Configure final reduce phase threads for heavy aggregation functions #14662
xiangfu0 merged 3 commits into apache:master
Conversation
Object[] values = record.getValues();
for (int i = 0; i < numAggregationFunctions; i++) {
  int colId = i + _numKeyColumns;
  values[colId] = _aggregationFunctions[i].extractFinalResult(values[colId]);
I think it'd make sense to either:
- put an upper limit on _numThreadsForFinalReduce (e.g. 2 or 3 * Runtime.getRuntime().availableProcessors()), or
- change the variable to a boolean flag enableParallelFinalReduce and use a sensible number of tasks

to prevent using an excessive number of futures or hitting various error modes, e.g. if _numThreadsForFinalReduce is Integer.MAX_VALUE then chunkSize is going to be negative.
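To illustrate the overflow concern, here is a standalone sketch (not Pinot code; `naiveChunkSize` and `safeChunkSize` are hypothetical names) showing how a ceiling division computed from an unbounded thread count wraps around, and how clamping the thread count first avoids it:

```java
public class ChunkSizeDemo {
  // Naive ceiling division: listSize + numThreads - 1 overflows int when
  // numThreads is huge (e.g. Integer.MAX_VALUE), yielding a non-positive chunk size.
  static int naiveChunkSize(int listSize, int numThreads) {
    return (listSize + numThreads - 1) / numThreads;
  }

  // Clamped variant: cap the thread count (here at 2 * available processors,
  // an assumed limit matching the review suggestion) before dividing.
  static int safeChunkSize(int listSize, int numThreads) {
    int maxThreads = Math.max(1, 2 * Runtime.getRuntime().availableProcessors());
    int threads = Math.max(1, Math.min(numThreads, maxThreads));
    return (listSize + threads - 1) / threads;
  }

  public static void main(String[] args) {
    // The naive formula overflows and produces a useless (non-positive) chunk size.
    System.out.println(naiveChunkSize(10_000, Integer.MAX_VALUE));
    // The clamped formula always yields a chunk size of at least 1.
    System.out.println(safeChunkSize(10_000, Integer.MAX_VALUE));
  }
}
```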
If the shared thread pool is overwhelmed by running tasks, it might be good to use the current thread not only to wait but also for task processing, stealing tasks until there's nothing left and only then waiting for the futures to finish.
> If the shared thread pool is overwhelmed by running tasks, it might be good to use the current thread not only to wait but also for task processing, stealing tasks until there's nothing left and only then waiting for the futures to finish.
Potentially, and this can be done transparently by configuring the executor's rejected execution handler to CallerRunsPolicy. However, beware: if the executor, which does non-blocking work, is sized to the number of available processors, then a thread pool that is overwhelmed means the available CPUs are overwhelmed too. Performing reductions on the caller thread would only lead to excessive context switching, and it might be better, from a global perspective, for the task to wait for capacity to become available.
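For reference, a minimal sketch (not from the PR) of the approach mentioned above: a bounded `ThreadPoolExecutor` configured with `CallerRunsPolicy`, so that tasks overflowing the queue run on the submitting thread instead of being rejected:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CallerRunsDemo {
  static int runTasks(int numTasks) throws InterruptedException {
    int cores = Math.max(1, Runtime.getRuntime().availableProcessors());
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
        cores, cores, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(4),                // deliberately small bounded queue
        new ThreadPoolExecutor.CallerRunsPolicy()); // overflow runs on the caller thread
    AtomicInteger completed = new AtomicInteger();
    for (int i = 0; i < numTasks; i++) {
      // execute() never throws RejectedExecutionException with CallerRunsPolicy
      executor.execute(completed::incrementAndGet);
    }
    executor.shutdown();
    executor.awaitTermination(10, TimeUnit.SECONDS);
    return completed.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runTasks(100)); // all tasks complete, none rejected
  }
}
```

As the comment notes, this trades rejection for caller-thread work, which is only a win when the caller would otherwise idle.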
@@ -232,6 +232,12 @@ public static Integer getGroupTrimThreshold(Map<String, String> queryOptions) {
  return uncheckedParseInt(QueryOptionKey.GROUP_TRIM_THRESHOLD, groupByTrimThreshold);
}
Would it be possible to show that final reduce is parallelized in the explain output?
Can we do this automatically if the number of keys > X and for specific aggregation functions like funnel etc.?
Put some heuristic logic here.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##             master   #14662       +/-   ##
============================================
+ Coverage     61.75%   63.73%    +1.98%
- Complexity      207     1469     +1262
============================================
  Files          2436     2708      +272
  Lines        133233   151490    +18257
  Branches      20636    23389     +2753
============================================
+ Hits          82274    96551    +14277
- Misses        44911    47683     +2772
- Partials       6048     7256     +1208
 */
@SuppressWarnings({"rawtypes", "unchecked"})
public abstract class IndexedTable extends BaseTable {
  private static final int THREAD_POOL_SIZE = Math.max(Runtime.getRuntime().availableProcessors(), 1);
(minor) Some constants are available in ResourceManager
This name is also confusing. It seems this is the upper bound used when _numThreadsForServerFinalReduce is not configured. Why not use the same upper bound?
True, reusing QueryMultiThreadingUtils.MAX_NUM_THREADS_PER_QUERY
ceac8f5 to
4d67c0b
Compare
Jackie-Jiang left a comment:
LGTM with minor comments
_trimThreshold = trimThreshold;
// NOTE: The upper limit of threads number for final reduce is set to 2 * number of available processors by default
_numThreadsExtractFinalResult = Math.min(queryContext.getNumThreadsExtractFinalResult(),
    Math.max(1, 2 * Runtime.getRuntime().availableProcessors()));
We should probably cap it at the number of CPU cores because this is a CPU-heavy operation.
for (int threadId = 0; threadId < numThreadsExtractFinalResult; threadId++) {
  int startIdx = threadId * chunkSize;
  int endIdx = Math.min(startIdx + chunkSize, topRecordsList.size());
  if (startIdx < endIdx) {
Not always the case in the test with a very small segment.
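The chunking pattern under discussion can be sketched end to end as follows (a simplified standalone version with hypothetical names, using a `long[]` and a doubling transform as a stand-in for extracting final aggregation results; not the actual Pinot classes). Note the `startIdx < endIdx` guard, which skips empty chunks when the record list is smaller than `numThreads * chunkSize`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFinalReduce {
  // Applies a per-row "final result" transform in parallel, one contiguous chunk per task.
  static void reduceInParallel(long[] rows, int numThreads) throws Exception {
    int chunkSize = (rows.length + numThreads - 1) / numThreads;  // ceiling division
    ExecutorService executor = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int threadId = 0; threadId < numThreads; threadId++) {
        int startIdx = threadId * chunkSize;
        int endIdx = Math.min(startIdx + chunkSize, rows.length);
        if (startIdx < endIdx) {  // guard: skip empty chunks for small inputs
          futures.add(executor.submit(() -> {
            for (int i = startIdx; i < endIdx; i++) {
              rows[i] = rows[i] * 2;  // stand-in for extractFinalResult(...)
            }
          }));
        }
      }
      for (Future<?> future : futures) {
        future.get();  // propagate any task failure
      }
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    long[] rows = {1, 2, 3, 4, 5};
    // More threads than rows: the guard skips the empty trailing chunks.
    reduceInParallel(rows, 8);
    System.out.println(java.util.Arrays.toString(rows));
  }
}
```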
public static final int DEFAULT_NUM_THREADS_FOR_FINAL_REDUCE = 1;
public static final int DEFAULT_PARALLEL_CHUNK_SIZE_FOR_FINAL_REDUCE = 10_000;
…pache#14662) * Configure final reduce phase threads for heavy aggregation functions * Address comments * Add tests with numThreadsForFinalReduce
Add a new query option numThreadsForFinalReduce to allow customizing the number of threads per aggregate/reduce call. This will significantly reduce the execution time of aggregation group-by queries where there are many groups and each group's final reduce is very costly, such as funnel functions.
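Assuming the option is exposed like other Pinot query options (the SET syntax below is standard Pinot; the specific value 4 is just an illustration), usage might look like:

```sql
-- Enable 4 threads for the final reduce phase of this query
SET numThreadsForFinalReduce = 4;
SELECT userCol, FUNNEL_COUNT(...)
FROM myTable
GROUP BY userCol;
```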