Processing Data in Memory

The Stream API is probably the second most important feature added to Java SE 8, after the lambda expressions. In a nutshell, the Stream API is about providing an implementation of the well known map-filter-reduce algorithm to the JDK.

map-filter-reduce:

List<Sale> sales = ...; // this is the list of all the sales
int amountSoldInMarch = 0;
for (Sale sale: sales) {
    if (sale.getDate().getMonth() == Month.MARCH) {
        amountSoldInMarch += sale.getAmount();
    }
}
System.out.println("Amount sold in March: " + amountSoldInMarch);

map:通过get取值,将部分字段映射到新数据(select字段)

filter:根据if判断过滤部分数据(where条件)

reduce:聚合,求和(聚合函数)

简而言之,相当于写一段SQL:

select sum(amount)
from Sales
where extract(month from date) = 3;

看看是如何从原始代码转换为Stream API的:

List<City> cities = ...;

int sum = 0;
for (City city: cities) {
    int population = city.getPopulation();
    if (population > 100_000) {
        sum += population;
    }
}

System.out.println("Sum = " + sum);

假设Collection有这几个方法:

int sum = cities.map(city -> city.getPopulation())
                .filter(population -> population > 100_000)
                .sum();

为什么Collection不提供这些方法呢?拆分为每一步:

Collection<Integer> populations         = cities.map(city -> city.getPopulation());
Collection<Integer> filteredPopulations = populations.filter(population -> population > 100_000);
int sum                                 = filteredPopulations.sum();

假如有1000个city,那么中间数据也是Collection,就会产生很多冗余的中间数据。而for循环却不存在这个问题,因为它不会存储中间数据。虽然Collection提供方法能让代码看起来更好理解,但却会导致大量的冗余数据。所以不得不设计一套Stream API来支持map-filter-reduce。

Stream<City> streamOfCities         = cities.stream();
Stream<Integer> populations         = streamOfCities.map(city -> city.getPopulation());
Stream<Integer> filteredPopulations = populations.filter(population -> population > 100_000);
int sum = filteredPopulations.sum(); // in fact this code does not compile; we'll fix it later

The streams created in this code, streamOfCitiespopulations and filteredPopulations must all be empty objects.

It leads to a very important property of streams:

A stream is an object that does not store any data.

Using streams is about creating pipelines of operations. A pipeline is made of a series of method calls on a stream. Each call produces another stream. Then at some point, a last call produces a result.

Adding Intermediate Operations

collect()

Stream本身不会存储数据,通过collect存储为List:

List<String> strings = List.of("one", "two", "three", "four");
Function<String, Integer> toLength = String::length;
Stream<Integer> ints = strings.stream()
                              .map(toLength);
List<String> strings = List.of("one", "two", "three", "four");
List<Integer> lengths = strings.stream()
                               .map(String::length)
                               .collect(Collectors.toList());
System.out.println("lengths = " + lengths);
lengths = [3, 3, 5, 4]

一些方法

contact()

连接流

List<Integer> list0 = List.of(1, 2, 3);
List<Integer> list1 = List.of(4, 5, 6);
List<Integer> list2 = List.of(7, 8, 9);

// 1st pattern: concat
List<Integer> concat = 
    Stream.concat(list0.stream(), list1.stream())
          .collect(Collectors.toList());

// 2nd pattern: flatMap
List<Integer> flatMap =
    Stream.of(list0.stream(), list1.stream(), list2.stream())
          .flatMap(Function.identity())
          .collect(Collectors.toList());

System.out.println("concat  = " + concat);
System.out.println("flatMap = " + flatMap);
concat  = [1, 2, 3, 4, 5, 6]
flatMap = [1, 2, 3, 4, 5, 6, 7, 8, 9]

连接流,推荐使用flatMap()

With the flatmap pattern, you just create a single stream to hold all your streams and do the flatmap. The overhead is much lower.

concat produces a SIZED stream, whereas flatmap does not.

Creating Streams

前面我们看到Collection的stream()方法可以创建流,此外还有很多其他方式创建流:

  • a vararg argument;
  • a supplier;
  • a unary operator, that generates the next element from the previous one;
  • a builder;
  • the characters of a string;
  • the lines of a text file;
  • the elements created by splitting a string of characters with a regular expressions;
  • a random variable, that can create a stream of random numbers.
Iterator<String> iterator = ...;

long estimateSize = 10L;
int characteristics = 0;
Spliterator<String> spliterator = Spliterators.spliterator(strings.iterator(), estimateSize, characteristics);

boolean parallel = false;
Stream<String> stream = StreamSupport.stream(spliterator, parallel);

空流:

Stream<String> empty = Stream.empty();
List<String> strings = empty.collect(Collectors.toList());

System.out.println("strings = " + strings);

Creating a Stream from a Vararg or an Array

Stream<Integer> intStream = Stream.of(1, 2, 3);
List<Integer> ints = intStream.collect(Collectors.toList());

System.out.println("ints = " + ints);
String[] stringArray = {"one", "two", "three"};
Stream<String> stringStream = Arrays.stream(stringArray);
List<String> strings = stringStream.collect(Collectors.toList());

System.out.println("strings = " + strings);

Creating a Stream from a Supplier

Stream<String> generated = Stream.generate(() -> "+");
List<String> strings = 
        generated
           .limit(10L)
           .collect(Collectors.toList());

System.out.println("strings = " + strings);

Creating a Stream from a UnaryOperator and a Seed

Stream<String> iterated = Stream.iterate("+", s -> s + "+");
iterated.limit(5L).forEach(System.out::println);

Creating a Stream from a Range of Numbers

String[] letters = {"A", "B", "C", "D"};
List<String> listLetters =
    IntStream.range(0, 10)
             .mapToObj(index -> letters[index % letters.length])
             .collect(Collectors.toList());
System.out.println("listLetters = " + listLetters);

Creating a Stream of Random Numbers

Random random = new Random(314L);
List<Integer> randomInts = 
    random.ints(10, 1, 5)
          .boxed()
          .collect(Collectors.toList());
System.out.println("randomInts = " + randomInts);

Creating a Stream from the Characters of a String

Java SE 10

String sentence = "Hello Duke";
List<String> letters =
    sentence.chars()
            .mapToObj(codePoint -> (char)codePoint)
            .map(Object::toString)
            .collect(Collectors.toList());
System.out.println("letters = " + letters);

Creating a Stream from the Lines of a Text File

Path log = Path.of("/tmp/debug.log"); // adjust to fit your installation
try (Stream<String> lines = Files.lines(log)) {
    
    long warnings = 
        lines.filter(line -> line.contains("WARNING"))
             .count();
    System.out.println("Number of warnings = " + warnings);
    
} catch (IOException e) {
    // do something with the exception
}

Creating a Stream from a Regular Expression

String sentence = "For there is good news yet to hear and fine things to be seen";

Pattern pattern = Pattern.compile(" ");
Stream<String> stream = pattern.splitAsStream(sentence);
List<String> words = stream.collect(Collectors.toList());

System.out.println("words = " + words);

Creating a Stream with the Builder Pattern

Stream.Builder<String> builder = Stream.<String>builder();

builder.add("one")
       .add("two")
       .add("three")
       .add("four");

Stream<String> stream = builder.build();

List<String> list = stream.collect(Collectors.toList());
System.out.println("list = " + list);

Creating a Stream on an HTTP Source

// The URI of the file
URI uri = URI.create("https://www.gutenberg.org/files/98/98-0.txt");

// The code to open create an HTTP request
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder(uri).build();


// The sending of the request
HttpResponse<Stream<String>> response = client.send(request, HttpResponse.BodyHandlers.ofLines());
List<String> lines;
try (Stream<String> stream = response.body()) {
    lines = stream
        .dropWhile(line -> !line.equals("A TALE OF TWO CITIES"))
        .takeWhile(line -> !line.equals("*** END OF THE PROJECT GUTENBERG EBOOK A TALE OF TWO CITIES ***"))
        .collect(Collectors.toList());
}
System.out.println("# lines = " + lines.size());

Reducing a Stream

Compute a reduction by just providing a binary operator that operates on only two elements. This is how the reduce() method works in the Stream API.

Stream<Integer> ints = Stream.of(0, 0, 0, 0);

int sum = ints.reduce(10, (a, b) -> a + b);
System.out.println("sum = " + sum);

Adding a Terminal Operation

In fact, you should use this reduce() method as a last resort, only if you have no other solution.

要想reduce stream,还有其他更多方法,比如count()、sum()等。

count()

Collection<String> strings =
        List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");

long count =
        strings.stream()
                .filter(s -> s.length() == 3)
                .count();
System.out.println("count = " + count);

forEach()

Stream<String> strings = Stream.of("one", "two", "three", "four");
strings.filter(s -> s.length() == 3)
       .map(String::toUpperCase)
       .forEach(System.out::println);

collect()

Stream<String> strings = Stream.of("one", "two", "three", "four");

List<String> result = 
    strings.filter(s -> s.length() == 3)
           .map(String::toUpperCase)
           .collect(Collectors.toList());

max() min()

Stream<String> strings = Stream.of("one", "two", "three", "four");
String longest =
     strings.max(Comparator.comparing(String::length))
            .orElseThrow();
System.out.println("longest = " + longest);

findFirst() findAny()

Collection<String> strings =
        List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");

String first =
    strings.stream()
           // .unordered()
           // .parallel()
           .filter(s -> s.length() == 3)
           .findFirst()
           .orElseThrow();

System.out.println("first = " + first);

allMatch() anyMatch() noneMatch()

Collection<String> strings =
    List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten");

boolean noBlank  = 
        strings.stream()
               .allMatch(Predicate.not(String::isBlank));
boolean oneGT3   = 
        strings.stream()
               .anyMatch(s -> s.length() == 3);
boolean allLT10  = 
        strings.stream()
               .noneMatch(s -> s.length() > 10);
        
System.out.println("noBlank = " + noBlank);
System.out.println("oneGT3  = " + oneGT3);
System.out.println("allLT10 = " + allLT10);

Finding the Characteristics

ORDERED

The order in which the elements of the stream are processed matters.

DISTINCT

There are no doubles in the elements processed by that stream.

NONNULL

There are no null elements in that stream.

SORTED

The elements of that stream are sorted.

SIZED

The number of elements this stream processes is known.

SUBSIZED

Splitting this stream produces two SIZED streams.

Collection<String> stringCollection = List.of("one", "two", "two", "three", "four", "five");

Stream<String> strings = stringCollection.stream().sorted();
Stream<String> filteredStrings = strings.filtered(s -> s.length() < 5);
Stream<Integer> lengths = filteredStrings.map(String::length);
Collection<String> stringCollection = List.of("one", "two", "two", "three", "four", "five");

Stream<String> strings = stringCollection.stream().distinct();
Stream<String> filteredStrings = strings.filtered(s -> s.length() < 5);
Stream<Integer> lengths = filteredStrings.map(String::length);

Using a Collector

List<Integer> numbers =
IntStream.range(0, 10)
         .boxed()
         .collect(Collectors.toList());
System.out.println("numbers = " + numbers);
Set<Integer> evenNumbers =
IntStream.range(0, 10)
         .map(number -> number / 2)
         .boxed()
        .collect(Collectors.toSet());
System.out.println("evenNumbers = " + evenNumbers);
LinkedList<Integer> linkedList =
IntStream.range(0, 10)
         .boxed()
         .collect(Collectors.toCollection(LinkedList::new));
System.out.println("linked listS = " + linkedList);

couting

Collection<String> strings = List.of("one", "two", "three");

long count = strings.stream().count();
long countWithACollector = strings.stream().collect(Collectors.counting());

System.out.println("count = " + count);
System.out.println("countWithACollector = " + countWithACollector);

joining

String joined = 
    IntStream.range(0, 10)
             .boxed()
             .map(Object::toString)
             .collect(Collectors.joining(", "));

System.out.println("joined = " + joined);

partitioningBy

Collection<String> strings =
    List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
            "ten", "eleven", "twelve");

Map<Boolean, List<String>> map =
    strings.stream()
           .collect(Collectors.partitioningBy(s -> s.length() > 4));

map.forEach((key, value) -> System.out.println(key + " :: " + value));

groupingBy

Collection<String> strings =
    List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
            "ten", "eleven", "twelve");

Map<Integer, List<String>> map =
    strings.stream()
           .collect(Collectors.groupingBy(String::length));

map.forEach((key, value) -> System.out.println(key + " :: " + value));

groupingBy + counting

Collection<String> strings =
        List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
                "ten", "eleven", "twelve");

Map<Integer, Long> map =
    strings.stream()
           .collect(
               Collectors.groupingBy(
                   String::length, 
                   Collectors.counting()));

map.forEach((key, value) -> System.out.println(key + " :: " + value));
3 :: 4
4 :: 3
5 :: 3
6 :: 2

groupingBy + joining

Collection<String> strings =
        List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
                "ten", "eleven", "twelve");

Map<Integer, String> map =
        strings.stream()
                .collect(
                        Collectors.groupingBy(
                                String::length,
                                Collectors.joining(", ")));
map.forEach((key, value) -> System.out.println(key + " :: " + value));
3 :: one, two, six, ten
4 :: four, five, nine
5 :: three, seven, eight
6 :: eleven, twelve

toMap

Collection<String> strings =
    List.of("one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
            "ten", "eleven", "twelve");

Map<Integer, String> map =
    strings.stream()
            .collect(
                    Collectors.toMap(
                            element -> element.length(),
                            element -> element, 
                            (element1, element2) -> element1 + ", " + element2));

map.forEach((key, value) -> System.out.println(key + " :: " + value));
3 :: one, two, six, ten
4 :: four, five, nine
5 :: three, seven, eight
6 :: eleven, twelve
  1. element -> element.length() is the key mapper.
  2. element -> element is the value mapper.
  3. (element1, element2) -> element1 + ", " + element2) is the merge function, called with the two elements that have generated the same key.

Parallelizing Streams

int parallelSum = 
    IntStream.range(0, 10)
             .parallel()
             .sum();

参考资料:

The Stream API https://dev.java/learn/api/streams/