
Configure final reduce phase threads for heavy aggregation functions #14662

Merged: 3 commits merged into apache:master from reduce-phase-multi-thread on Jan 23, 2025

Conversation

@xiangfu0 (Contributor) commented Dec 16, 2024:

Add a new query option, numThreadsForFinalReduce, to allow customizing the number of threads per aggregate/reduce call.

This significantly reduces the execution time of aggregation group-by queries where there are many groups and each group's final reduce is very costly, as with funnel functions.
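
For illustration, a minimal sketch of the mechanism, assuming a shared executor; the record list, the extractFinalResult helper, and the thread count are illustrative stand-ins, not Pinot's exact code:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Minimal sketch: split the groups into chunks and run the costly
// per-group extractFinalResult calls on multiple threads.
void parallelFinalReduce(List<Object[]> records, ExecutorService executor, int numThreads)
    throws Exception {
  int chunkSize = (records.size() + numThreads - 1) / numThreads;  // ceiling division
  List<Future<?>> futures = new ArrayList<>();
  for (int threadId = 0; threadId < numThreads; threadId++) {
    int startIdx = threadId * chunkSize;
    int endIdx = Math.min(startIdx + chunkSize, records.size());
    if (startIdx < endIdx) {
      futures.add(executor.submit(() -> {
        for (int i = startIdx; i < endIdx; i++) {
          extractFinalResult(records.get(i));  // hypothetical costly final reduce
        }
      }));
    }
  }
  for (Future<?> future : futures) {
    future.get();  // wait for all chunks and propagate failures
  }
}

With a single thread this degenerates to the existing sequential loop; the win comes when each extractFinalResult call is expensive, as with funnel aggregations.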

@xiangfu0 added the enhancement, Configuration (config changes: addition/deletion/change in behavior), and query labels Dec 16, 2024
// Sequential final reduce: extract the final result for each aggregation column of the record.
Object[] values = record.getValues();
for (int i = 0; i < numAggregationFunctions; i++) {
  int colId = i + _numKeyColumns;
  values[colId] = _aggregationFunctions[i].extractFinalResult(values[colId]);
@bziobrowski (Collaborator) commented Dec 16, 2024:

I think it'd make sense to either:

  • put an upper limit on _numThreadsForFinalReduce (e.g. 2 or 3 * Runtime.getRuntime().availableProcessors()), or
  • change the variable to a boolean flag enableParallelFinalReduce and use a sensible number of tasks,
    to prevent creating an excessive number of futures or hitting various error modes, e.g.
    if _numThreadsForFinalReduce is Integer.MAX_VALUE then chunkSize is going to be negative.

If the shared thread pool is overwhelmed with running tasks, it might be good to use the current thread not only to wait but also to process tasks, stealing tasks until there's nothing left and only then waiting for the futures to finish.
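
To make the overflow concrete, a sketch (not the PR's code) of how an unbounded thread count breaks ceiling-division chunk sizing, and one way to cap it:

// Sketch: the intermediate sum overflows int when numThreads is huge,
// so chunkSize ends up zero or negative instead of a sensible value.
int size = 100_000;
int numThreads = Integer.MAX_VALUE;
int chunkSize = (size + numThreads - 1) / numThreads;  // overflowed arithmetic

// One possible cap, per the suggestion above:
int cappedThreads = Math.min(numThreads, 3 * Runtime.getRuntime().availableProcessors());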

A Member replied, quoting the suggestion above:

If shared thread pool is overwhelmed by running tasks it might be good to use current thread not only to wait but also task processing, stealing tasks until there's nothing left and only then waiting for futures to finish.

Potentially, and this can be done transparently by configuring the executor's rejected execution handler to CallerRunsPolicy. However, beware: if the executor, which does non-blocking work, is sized to the number of available processors, then an overwhelmed thread pool means the available CPUs are overwhelmed too. Performing reductions on the caller thread would only lead to excessive context switching, and it might be better, from a global perspective, for the task to wait for capacity to become available.
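
For reference, a minimal sketch of that transparent approach: a bounded ThreadPoolExecutor whose rejection handler runs overflow tasks on the submitting thread (the pool and queue sizes here are arbitrary placeholders, not Pinot's configuration):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

int cores = Runtime.getRuntime().availableProcessors();
ExecutorService executor = new ThreadPoolExecutor(
    cores, cores,                      // fixed-size pool sized to the CPUs
    0L, TimeUnit.MILLISECONDS,
    new ArrayBlockingQueue<>(1024),    // bounded queue; when full, the handler kicks in
    new ThreadPoolExecutor.CallerRunsPolicy());  // caller thread runs overflow tasks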

@@ -232,6 +232,12 @@ public static Integer getGroupTrimThreshold(Map<String, String> queryOptions) {
return uncheckedParseInt(QueryOptionKey.GROUP_TRIM_THRESHOLD, groupByTrimThreshold);
}
A Collaborator commented:

Would it be possible to show in the EXPLAIN output that the final reduce is parallelized?

@kishoreg (Member) commented:

Can we do this automatically when the number of keys > X, and for specific aggregation functions like funnel, etc.?

@xiangfu0 force-pushed the reduce-phase-multi-thread branch from 1f6e6b6 to 30d28c3 on January 1, 2025
@xiangfu0 (Contributor, Author) commented Jan 1, 2025:

can we do this automatically if the keys > X and for specific aggregation functions like funnel etc?

Added some heuristic logic for this.
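
A hypothetical sketch of such a heuristic; every name here is illustrative, and the actual condition in the PR may differ:

// Illustrative heuristic: only parallelize the final reduce when there are
// many groups AND at least one aggregation function has a costly
// extractFinalResult (e.g. funnel functions).
boolean parallelize = numGroups > MIN_GROUPS_FOR_PARALLEL_REDUCE
    && hasExpensiveExtractFinalResult(aggregationFunctions);
int numThreads = parallelize
    ? Math.min(maxThreadsPerQuery, Runtime.getRuntime().availableProcessors())
    : 1;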

@xiangfu0 force-pushed the reduce-phase-multi-thread branch 3 times, most recently from 51be961 to 36dbcca on January 2, 2025
@xiangfu0 force-pushed the reduce-phase-multi-thread branch 4 times, most recently from f189992 to fcb643d on January 18, 2025
@xiangfu0 requested a review from bziobrowski on January 18, 2025
@xiangfu0 force-pushed the reduce-phase-multi-thread branch 4 times, most recently from fb0b6f8 to ceac8f5 on January 18, 2025
@xiangfu0 requested a review from Jackie-Jiang on January 18, 2025
@apache deleted a comment from codecov-commenter on Jan 18, 2025
@codecov-commenter commented Jan 18, 2025:

Codecov Report

Attention: Patch coverage is 55.69620% with 35 lines in your changes missing coverage. Please review.

Project coverage is 63.73%. Comparing base (59551e4) to head (bcb01ca).
Report is 1617 commits behind head on master.

Files with missing lines Patch % Lines
...org/apache/pinot/core/data/table/IndexedTable.java 38.00% 28 Missing and 3 partials ⚠️
...pinot/core/plan/maker/InstancePlanMakerImplV2.java 55.55% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14662      +/-   ##
============================================
+ Coverage     61.75%   63.73%   +1.98%     
- Complexity      207     1469    +1262     
============================================
  Files          2436     2708     +272     
  Lines        133233   151490   +18257     
  Branches      20636    23389    +2753     
============================================
+ Hits          82274    96551   +14277     
- Misses        44911    47683    +2772     
- Partials       6048     7256    +1208     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.70% <55.69%> (+1.99%) ⬆️
java-21 63.63% <55.69%> (+2.00%) ⬆️
skip-bytebuffers-false 63.71% <55.69%> (+1.96%) ⬆️
skip-bytebuffers-true 63.61% <55.69%> (+35.88%) ⬆️
temurin 63.73% <55.69%> (+1.98%) ⬆️
unittests 63.73% <55.69%> (+1.98%) ⬆️
unittests1 56.31% <55.69%> (+9.42%) ⬆️
unittests2 34.01% <2.53%> (+6.28%) ⬆️

Flags with carried forward coverage won't be shown.



/**
* Base implementation of Map-based Table for indexed lookup
*/
@SuppressWarnings({"rawtypes", "unchecked"})
public abstract class IndexedTable extends BaseTable {
private static final int THREAD_POOL_SIZE = Math.max(Runtime.getRuntime().availableProcessors(), 1);
A Contributor commented:

(minor) Some constants are available in ResourceManager

Another Contributor comment:

This name is also confusing. It seems this is the upper bound used when _numThreadsForServerFinalReduce is not configured. Why not use the same upper bound?

Contributor (Author):

True; now reusing QueryMultiThreadingUtils.MAX_NUM_THREADS_PER_QUERY.

@xiangfu0 force-pushed the reduce-phase-multi-thread branch from ceac8f5 to 4d67c0b on January 22, 2025
@Jackie-Jiang (Contributor) left a review:

LGTM with minor comments.

@@ -84,6 +94,10 @@ protected IndexedTable(DataSchema dataSchema, boolean hasFinalInput, QueryContex
  assert _hasOrderBy || (trimSize == Integer.MAX_VALUE && trimThreshold == Integer.MAX_VALUE);
  _trimSize = trimSize;
  _trimThreshold = trimThreshold;
  // NOTE: The upper limit on the number of threads for final reduce defaults to 2 * the number of available processors
  _numThreadsExtractFinalResult = Math.min(queryContext.getNumThreadsExtractFinalResult(),
      Math.max(1, 2 * Runtime.getRuntime().availableProcessors()));
Contributor:

We should probably cap it at the number of CPU cores because this is a CPU-heavy operation.
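
That is, a sketch of the suggested change to the line above:

// Suggested cap (sketch): bound by the CPU core count, since extractFinalResult
// is CPU-bound work.
_numThreadsExtractFinalResult = Math.min(queryContext.getNumThreadsExtractFinalResult(),
    Math.max(1, Runtime.getRuntime().availableProcessors()));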

// Chunked dispatch: each thread handles a contiguous slice of the top records.
for (int threadId = 0; threadId < numThreadsExtractFinalResult; threadId++) {
  int startIdx = threadId * chunkSize;
  int endIdx = Math.min(startIdx + chunkSize, topRecordsList.size());
  if (startIdx < endIdx) {
Contributor:

Is this always true?

Contributor (Author):

Not always; it can be false in tests with very small segments.

Comment on lines 98 to 99
public static final int DEFAULT_NUM_THREADS_FOR_FINAL_REDUCE = 1;
public static final int DEFAULT_PARALLEL_CHUNK_SIZE_FOR_FINAL_REDUCE = 10_000;
Contributor:

Rename them

Contributor (Author):

Done.
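
For context on how the two constants in the snippet above (pre-rename names) could interact, a purely illustrative sketch; this arithmetic is an assumption, not necessarily the PR's exact logic:

// Hypothetical: one thread per chunk of groups, capped at a maximum
// (maxThreadsPerQuery is an illustrative stand-in).
int numGroups = topRecordsList.size();
long chunks = (numGroups + DEFAULT_PARALLEL_CHUNK_SIZE_FOR_FINAL_REDUCE - 1L)
    / DEFAULT_PARALLEL_CHUNK_SIZE_FOR_FINAL_REDUCE;
int numThreads = (int) Math.min(maxThreadsPerQuery, Math.max(1, chunks));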

@xiangfu0 force-pushed the reduce-phase-multi-thread branch from 4d67c0b to bcb01ca on January 23, 2025
@xiangfu0 merged commit 4967780 into apache:master on Jan 23, 2025 (21 checks passed)
@xiangfu0 deleted the reduce-phase-multi-thread branch on January 23, 2025
gortiz pushed a commit to gortiz/pinot that referenced this pull request on Jan 30, 2025:

Configure final reduce phase threads for heavy aggregation functions (apache#14662)

* Configure final reduce phase threads for heavy aggregation functions
* Address comments
* Add tests with numThreadsForFinalReduce
zeronerdzerogeekzerocool pushed a commit to zeronerdzerogeekzerocool/pinot that referenced this pull request on Feb 20, 2025:

Configure final reduce phase threads for heavy aggregation functions (apache#14662)

* Configure final reduce phase threads for heavy aggregation functions
* Address comments
* Add tests with numThreadsForFinalReduce
Labels: Configuration (config changes: addition/deletion/change in behavior), enhancement, query

6 participants