Java Stream API: Mastering Collectors

2024-09-20 skanto

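Whenever you use a Stream in Java, you eventually produce a final result, whether that is a single value, a Collection, or an array. A Stream is turned into some kind of container by passing through a Collector. The Java Stream API provides many kinds of Collectors, and this article walks through them one by one.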

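Converting a Stream into a Collection is a broad topic. There are many tools, and they can be used in many ways and in countless combinations, so it is unrealistic to understand every case of when and how to use each one. To make the topic more approachable, this article introduces the tools in a structured order.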

The easiest way to get an overview is probably a diagram. A diagram helps organize the various Stream constructs and later helps you pick the right tool for a given task. If you know a better approach, feel free to use it instead.

There are several ways to collect a Stream in Java. You can call methods provided directly by the Stream interface, or you can use the methods of the Collectors class, which return specialized Collectors. These Collectors can accumulate elements into objects that implement the Collection or Map interface, or even directly into a String.

Collectors become especially powerful when they collect data into a Map with operations such as partitioningBy and groupingBy. Downstream collectors can be combined with any kind of Collection, but they are most useful when collecting into a Map to perform more refined operations. Let's look at these collectors one by one.

Stream Methods

toList()

The simplest way to convert a Stream into a Collection is the toList method.

default List<T> toList() {...}

This method is available since Java 16.

List<Integer> example1 = Stream.of(1, 2, 3)
        .toList();
// [1, 2, 3]
// example1.add(4); // -- throws UnsupportedOperationException

The returned value is an unmodifiable list, so any attempt to change it, such as adding a new element, throws an UnsupportedOperationException.

The numeric stream interfaces IntStream, LongStream, and DoubleStream do not provide this method.
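Since the primitive streams do not expose toList(), one common workaround is to box the elements into a Stream<Integer> first. The following is a minimal sketch of that idea; the class and method names are illustrative and not part of the original article.

```java
import java.util.List;
import java.util.stream.IntStream;

public class BoxedToListExample {

    // Boxes each int into an Integer so that Stream.toList() (Java 16+) becomes available.
    static List<Integer> rangeAsList(int startInclusive, int endInclusive) {
        return IntStream.rangeClosed(startInclusive, endInclusive)
                .boxed()
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(rangeAsList(1, 3)); // [1, 2, 3]
    }
}
```

Boxing has a cost, so when a primitive array is sufficient, toArray() on the numeric stream is usually the lighter choice.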

toArray()

The toArray method collects the elements into an array.

Object[] toArray();

It is used as follows.

Object[] example2 = Stream.of(1, 2, 3)
        .toArray();
// [1, 2, 3]

Because generic arrays cannot be instantiated at runtime, this method returns an Object array (Object[]). There is also an overloaded version that takes a generator function as a parameter.

<A> A[] toArray(IntFunction<A[]> generator);

The generator function receives an integer, the required array size, and returns a new array of that size. An array constructor reference expresses this concisely.

Integer[] example3 = Stream.of(1, 2, 3)
        .toArray(Integer[]::new);
// [1, 2, 3]

Each numeric stream interface provides a toArray method that returns an array of the same primitive type as the stream.

int[] example4 = IntStream.of(1, 2, 3)
        .toArray();
// [1, 2, 3]

long[] example5 = LongStream.of(1L, 2L, 3L)
        .toArray();
// [1, 2, 3]

double[] example6 = DoubleStream.of(1.1, 2.2, 3.3)
        .toArray();
// [1.1, 2.2, 3.3]

Collect methods

The Stream interface provides two overloaded collect methods. The first has the following form.

<R, A> R collect(Collector<? super T, A, R> collector);

The second looks like this.

<R> R collect(Supplier<R> supplier,
              BiConsumer<R, ? super T> accumulator,
              BiConsumer<R, R> combiner);

The first method takes a Collector, whereas the second takes a Supplier and two BiConsumer objects. The first part of this article covers the first method; the second is explained later.

Collectors

The first collect method of the Stream interface takes a Collector as its parameter. To make this easy to use, the API provides the Collectors class, whose many static methods return ready-made Collectors.

Collectors.toCollection()

This method takes a parameter of type Supplier.

public static <T, C extends Collection<T>>
Collector<T, ?, C> toCollection(Supplier<C> collectionFactory) {...}

It can be used to collect a Stream into any collection type that implements the Collection interface.

ArrayList<Integer> example7 = Stream.of(1, 2, 3)
        .collect(Collectors.toCollection(ArrayList::new));
// [1, 2, 3]

Set<Integer> example8 = Stream.of(1, 2, 3)
        .collect(Collectors.toCollection(HashSet::new));
// [1, 2, 3]

ArrayList<Integer> example9List = new ArrayList<>();
example9List.add(0);
ArrayList<Integer> example9 = Stream.of(1, 2, 3)
        .collect(Collectors.toCollection(() -> example9List));
// [0, 1, 2, 3]
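Because the supplier decides the concrete collection type, the same collector can also produce a sorted, duplicate-free result. The sketch below assumes a TreeSet is the desired target; the class and method names are made up for illustration.

```java
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToCollectionTreeSetExample {

    // Collects into a TreeSet, which drops duplicates and keeps elements in natural order.
    static TreeSet<Integer> sortedDistinct(Stream<Integer> source) {
        return source.collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(sortedDistinct(Stream.of(3, 1, 2, 3))); // [1, 2, 3]
    }
}
```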

Collectors.toList()

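As the name suggests, this is a straightforward collector that gathers the elements of a Stream into a List. The API makes no guarantee about the type or mutability of the returned List.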

List<Integer> example10 = Stream.of(1, 2, 3)
        .collect(Collectors.toList());
// [1, 2, 3]

Collectors.toUnmodifiableList()

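Because it returns an unmodifiable list, it behaves much like calling toList() directly on the Stream. One difference is that this collector rejects null elements, whereas Stream.toList() allows them.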

List<Integer> example11 = Stream.of(1, 2, 3)
        .collect(Collectors.toUnmodifiableList());
// [1, 2, 3]
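The two unmodifiable variants treat null elements differently, which is easy to overlook. The following sketch contrasts them; the class and method names are hypothetical and exist only for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NullHandlingExample {

    // Stream.toList() (Java 16+) accepts null elements.
    static List<String> viaStreamToList() {
        return Stream.of("a", null, "c").toList();
    }

    // Collectors.toUnmodifiableList() (Java 10+) rejects null elements.
    static boolean collectorRejectsNull() {
        try {
            Stream.of("a", null, "c").collect(Collectors.toUnmodifiableList());
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(viaStreamToList());      // [a, null, c]
        System.out.println(collectorRejectsNull()); // true
    }
}
```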

Collectors.toSet() and Collectors.toUnmodifiableSet()

These methods behave like the two methods described above, except that the result is a Set rather than a List.

Set<Integer> example12 = Stream.of(1, 2, 3)
        .collect(Collectors.toSet());
// [1, 2, 3]

Set<Integer> example13 = Stream.of(1, 2, 3)
        .collect(Collectors.toUnmodifiableSet());
// [1, 2, 3]

Collectors.collectingAndThen()

collectingAndThen() is a special collector in the Java Stream API that performs a two-step collection. First it collects the elements with another collector such as toList() or toSet(); then it applies a finishing transformation function to the result. This is useful when you want to post-process the collected data right away, for example by wrapping it in an unmodifiable collection, converting it into a custom object, or running an additional computation. It is a convenient way to add a post-processing step to the collection logic.

List<Integer> example14 = Stream.of(1, 2, 3)
        .collect(Collectors.collectingAndThen(
                Collectors.toList(), Collections::unmodifiableList));
// [1, 2, 3]

class CustomClass {
    private List<Integer> streamOutput;
    CustomClass(List<Integer> streamOutput) {
        this.streamOutput = new ArrayList<>(streamOutput);
    }
    @Override
    public String toString() {
        return "CustomClass{" +
                "streamOutput=" + streamOutput +
                '}';
    }
}

CustomClass example15 = Stream.of(1, 2, 3)
        .collect(Collectors.collectingAndThen(
                Collectors.toList(), CustomClass::new));
// CustomClass{streamOutput=[1, 2, 3]}

Collectors.teeing()

Introduced in Java 12, this collector passes every element of the Stream to two independent downstream collectors and then combines their two results with a merging function. It is useful when you want to perform two reductions in a single pass, such as computing the sum and the average of a Stream at once. Folding several collection steps into one Stream operation avoids traversing the source more than once.

String example16 = Stream.of(1, 2, 3).collect(
        Collectors.teeing(
                Collectors.summingInt(Integer::intValue), // First collector: sums all elements
                Collectors.averagingInt(Integer::intValue), // Second collector: calculates the average
                (sum, average) -> 
                        String.format("Sum: %d, Avg: %.2f", sum, average) // Merge the results
        )
);
// Sum: 6, Avg: 2.00 (the decimal separator depends on the default locale)

Joining collectors

Many programming scenarios involve Strings. The Stream API provides Collectors designed specifically to concatenate input elements of type CharSequence into a single immutable String. The first of these takes no parameters.

public static Collector<CharSequence, ?, String> joining() {...}

This Collector concatenates several strings into one.

String example17 = Stream.of("String1", "String2", "String3")
        .collect(Collectors.joining());
// String1String2String3

Another variant takes a CharSequence argument that is used as the delimiter when the strings are joined.

Collector<CharSequence, ?, String> joining(CharSequence delimiter) {...}

It can be used as follows.

String example18 = Stream.of("String1", "String2", "String3")
        .collect(Collectors.joining("-delimiter-"));
// String1-delimiter-String2-delimiter-String3

The last variant takes three CharSequence parameters: a delimiter, a prefix, and a suffix.

public static Collector<CharSequence, ?, String> joining(CharSequence delimiter,
                                                         CharSequence prefix,
                                                         CharSequence suffix) {...}

As with the previous Collector, the first parameter is the delimiter, while the second and third are the prefix and suffix. The prefix is added at the start of the joined string and the suffix at the end.

String example19 = Stream.of("String1", "String2", "String3")
        .collect(Collectors.joining("-delimiter-", "prefix", "suffix"));
// prefixString1-delimiter-String2-delimiter-String3suffix

Take care with the parameters of these Collectors: passing null for any of them results in a NullPointerException.
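Because the joining collectors only accept CharSequence elements, a stream of other types has to be mapped to strings first. The sketch below shows one way to do that for integers; the class and method names are illustrative assumptions.

```java
import java.util.List;
import java.util.stream.Collectors;

public class JoiningNumbersExample {

    // Converts each number to a String before joining, since joining() requires CharSequence elements.
    static String toBracketedCsv(List<Integer> numbers) {
        return numbers.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(toBracketedCsv(List.of(1, 2, 3))); // [1, 2, 3]
        System.out.println(toBracketedCsv(List.of()));        // []
    }
}
```

Note that the prefix and suffix are still emitted for an empty stream.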

Collectors to map

Collectors that gather elements into a Map are generally more involved than those that collect into a standard Collection. The Collectors class provides three overloaded toMap methods for creating map collectors. The first has the following form.

public static <T, K, U>
Collector<T, ?, Map<K,U>> toMap(Function<? super T, ? extends K> keyMapper,
                                Function<? super T, ? extends U> valueMapper) {...}

This method takes two Function parameters: the first maps each element to a key and the second maps it to a value.

Map<Integer, Integer> example20 = Stream.of(1, 2, 3)
        .collect(Collectors.toMap(k -> k, v -> v * 10));
// .collect(Collectors.toMap(Function.identity(), v -> v * 10));
// {1=10, 2=20, 3=30}

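In the sample above, each element of the Stream is used as a key, and the corresponding value is that element multiplied by 10. Function.identity() can also be used as the key mapper; it returns a function that always returns its input, so the result is the same.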

Looking at the first toMap method, you may wonder what happens when the Stream contains duplicate elements. When two elements map to the same key, the collector throws an IllegalStateException to report the duplicate key.

Map<Integer, Integer> example21 = Stream.of(1, 1, 3)
        .collect(Collectors.toMap(k -> k, v -> v * 10));
// IllegalStateException: Duplicate key 1

To solve this, the next toMap overload adds a BinaryOperator parameter. It resolves duplicate keys by letting you specify how their values are merged.

public static <T, K, U>
Collector<T, ?, Map<K,U>> toMap(Function<? super T, ? extends K> keyMapper,
                                Function<? super T, ? extends U> valueMapper,
                                BinaryOperator<U> mergeFunction) {...}

For example, the values of duplicate keys can be summed as in the code below.

Map<Integer, Integer> example22 = Stream.of(1, 1, 3)
        .collect(Collectors.toMap(k -> k,
                v -> v * 10,
                (v1, v2) -> v1 + v2));
// {1=20, 3=30} {1=(1*10) + (1*10), 3=3*10}

The last overload adds a Supplier parameter, which lets you specify the exact Map implementation to collect into.

public static <T, K, U, M extends Map<K, U>>
Collector<T, ?, M> toMap(Function<? super T, ? extends K> keyMapper,
                         Function<? super T, ? extends U> valueMapper,
                         BinaryOperator<U> mergeFunction,
                         Supplier<M> mapFactory) {...}

For example, to collect into a LinkedHashMap you can write the following.

LinkedHashMap<Integer, Integer> example23 = Stream.of(1, 1, 3)
        .collect(Collectors.toMap(k -> k,
                v -> v * 10,
                (v1, v2) -> v1 + v2,
                LinkedHashMap::new));
// {1=20, 3=30} {1=(1*10) + (1*10), 3=3*10}

Partitioning by collectors

Collectors.partitioningBy is one of the more useful collectors because it splits a Stream into two partitions based on a predicate. It produces a Map<Boolean, List<T>> in which the true key holds the elements that match the predicate and the false key holds those that do not. It is handy whenever you want to sort elements into two separate groups according to a condition.

The method has two overloaded versions. The first takes a single parameter of type Predicate.

public static <T>
Collector<T, ?, Map<Boolean, List<T>>> partitioningBy(Predicate<? super T> predicate) {...}

For example, it can be used as follows to split a Stream according to whether each number is even or odd.

Map<Boolean, List<Integer>> example24 = Stream.of(1, 2, 3)
        .collect(Collectors.partitioningBy(el -> el % 2 == 0));
// {false=[1, 3], true=[2]}

In this sample the stream is split into two groups: a List of even numbers (the true key) and a List of odd numbers (the false key).

The second version takes an additional Collector parameter, which processes the elements of each partition. This downstream Collector can perform further aggregation or transformation on the elements in each partition; for example, it can collect them into a list or set, or even compute a summary value.

public static <T, D, A>
Collector<T, ?, Map<Boolean, D>> partitioningBy(Predicate<? super T> predicate,
                                                Collector<? super T, A, D> downstream) {...}

In this way the method not only splits the elements into two groups but also applies a further downstream collection or transformation to each group, which allows more elaborate partitioning.

Map<Boolean, Set<Integer>> example25 = Stream.of(1, 2, 3, 4, 5, 6)
        .collect(Collectors.partitioningBy(
                el -> el % 2 == 0,  // Predicate to partition by even or odd
                Collectors.toSet()   // Downstream collector to collect into a set
        ));
// {false=[1, 3, 5], true=[2, 4, 6]}

The sample below shows how to find the maximum element in each partition.

Map<Boolean, Optional<Integer>> example26 = Stream.of(1, 2, 3, 4, 5, 6)
        .collect(Collectors.partitioningBy(
                el -> el % 2 == 0,  
                Collectors.maxBy(Integer::compare)
        ));
// {false=Optional[5], true=Optional[6]}
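One property worth knowing is that the Map produced by partitioningBy always contains both the false and the true key, even when one partition is empty. The following sketch illustrates this with a counting downstream collector; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartitionCountExample {

    // Counts even numbers (true key) and odd numbers (false key) in a single pass.
    static Map<Boolean, Long> countEvenAndOdd(Stream<Integer> numbers) {
        return numbers.collect(Collectors.partitioningBy(
                n -> n % 2 == 0,
                Collectors.counting()));
    }

    public static void main(String[] args) {
        // No even numbers in the input, yet the true key is still present with a count of 0.
        System.out.println(countEvenAndOdd(Stream.of(1, 3, 5))); // {false=3, true=0}
    }
}
```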

Grouping By Collectors and Downstream Collectors

Downstream Collectors are used together with Collectors.groupingBy() or Collectors.partitioningBy() to perform additional operations on grouped or partitioned data. They let you customize how the grouped data is aggregated, transformed, or reduced.

The tutorial mentioned earlier provides a variety of samples that use such Collectors, including Collectors.filtering(), Collectors.mapping(), Collectors.flatMapping(), Collectors.counting(), Collectors.maxBy(), and Collectors.minBy(). Rather than repeating the same kind of samples here, I strongly recommend digging into that tutorial in more depth.
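For orientation, here is a minimal sketch of groupingBy combined with two of the downstream collectors named above, counting() and mapping(). The class, the method names, and the word list are assumptions made up for this illustration, not taken from the tutorial.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupingByExample {

    // Groups words by their length and counts how many words fall into each group.
    static Map<Integer, Long> countByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    // Groups words by length into a TreeMap (sorted keys) and upper-cases each word on the way in.
    static Map<Integer, List<String>> upperCaseByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        String::length,
                        TreeMap::new,
                        Collectors.mapping(String::toUpperCase, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> words = List.of("one", "two", "three", "four");
        System.out.println(countByLength(words));
        System.out.println(upperCaseByLength(words)); // {3=[ONE, TWO], 4=[FOUR], 5=[THREE]}
    }
}
```

Unlike partitioningBy, groupingBy only creates keys for groups that actually receive at least one element.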

As promised earlier, let's now return to the second collect method.

<R> R collect(Supplier<R> supplier,
              BiConsumer<R, ? super T> accumulator,
              BiConsumer<R, R> combiner);

This method is a mutable reduction operation that accumulates the elements of a Stream into a mutable container such as a List, Set, or Map.

The following sample shows how to use it to collect a Stream<Integer> into a List<Integer>.

List<Integer> example27 = Stream.of(1, 2, 3).collect(
        // Supplier: Provides a new ArrayList
        ArrayList::new,
        // Accumulator: Adds each element to the ArrayList
        List::add,
        // Combiner: Merges two lists (useful for parallel processing)
        List::addAll
);
// [1, 2, 3]

Here is another sample that uses this method. This time the elements are collected into a Map rather than a List.

Map<Integer, Integer> example28 = Stream.of(1, 2, 3).collect(
        // Supplier: Creates a new HashMap
        HashMap::new,
        // Accumulator: Adds each number and its square to the Map
        (map, number) -> map.put(number, number * number),
        // Combiner: Merges two maps (useful for parallel processing)
        Map::putAll
);
// {1=1, 2=4, 3=9}
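The combiner only comes into play when the stream runs in parallel, where each thread fills its own container and the partial results are merged afterwards. The sketch below runs the three-argument collect on a parallel stream; the class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class ParallelCollectExample {

    // Each worker accumulates into its own ArrayList; addAll merges the partial lists.
    // Encounter order is preserved because the source stream is ordered.
    static List<Integer> collectInParallel(int count) {
        return IntStream.rangeClosed(1, count)
                .boxed()
                .parallel()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<Integer> result = collectInParallel(1000);
        System.out.println(result.size());   // 1000
        System.out.println(result.get(0));   // 1
        System.out.println(result.get(999)); // 1000
    }
}
```

The supplier must return a fresh container on every call here, since several containers may be filled concurrently.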

Wrapping up

The Stream API offers more features than Java developers typically use, providing a broad and deep toolset for collecting and manipulating data streams. Keep in mind, though, that a powerful tool is only useful where it is actually needed. The aim of this article is to show what the Stream API can do so that you can apply it appropriately where you need it.
