
Latency improvements to Multi Term Aggregations #14993

Merged
8 commits merged into main on Oct 7, 2024

Conversation

@expani (Contributor) commented Jul 29, 2024

Description

This PR introduces the following improvements:

  • Reduces the latency of multi-term aggregation queries by 7-10 seconds on the benchmarked workloads
  • Reduces the memory footprint of multi-term aggregation queries by cutting down allocations.

Testing was done on a c5.9xlarge instance with a 20 GB JVM heap and the store type set to mmapfs, so that EBS latency would not skew the results. This was verified by running lsof on the OS PID to confirm that all index files are memory-mapped (the type shows as mem in that case).

The numbers below are averages over 20 iterations of each type of query, with and without the changes.

| Workload | Field1 | Field2 | Without Changes | With Changes |
|----------|--------|--------|-----------------|--------------|
| big5 | agent_name | host_name | 236 secs | 226 secs |
| big5 | process_name | agent_id | 45 secs | 38 secs |
| nyc_taxi | store_and_fwd_flag | payment_type | 53 secs | 44 secs |
Sample Aggregation Query
```sh
curl -k -H 'Content-Type: application/json' https://localhost:9200/nyc_taxis/_search -u 'admin:xxx' -d '{
  "aggs": {
    "flag_and_payment_type": {
      "multi_terms": {
        "terms": [{
          "field": "store_and_fwd_flag"
        }, {
          "field": "payment_type"
        }]
      }
    }
  }
}'
```

Multi-term aggregation goes through all the docs emitted by the collector of the filter query (a MatchAllDocsQuery if no filter is present).

For every document emitted by the collector, it generates the cartesian product of the values of all the fields present in the aggregation here.
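
For illustration, here is a minimal sketch of that cartesian-product expansion. The class and method names are hypothetical, not the actual aggregator code:

```java
import java.util.ArrayList;
import java.util.List;

public class CompositeKeySketch {
    // perFieldValues.get(i) holds every value of field i for the current document.
    // Returns one composite key per element of the cartesian product.
    static List<List<Object>> compositeKeys(List<List<Object>> perFieldValues) {
        List<List<Object>> keys = new ArrayList<>();
        keys.add(new ArrayList<>()); // start with one empty partial key
        for (List<Object> fieldValues : perFieldValues) {
            List<List<Object>> expanded = new ArrayList<>(keys.size() * fieldValues.size());
            for (List<Object> partial : keys) {
                for (Object value : fieldValues) {
                    List<Object> next = new ArrayList<>(partial); // key count grows multiplicatively
                    next.add(value);
                    expanded.add(next);
                }
            }
            keys = expanded;
        }
        return keys;
    }
}
```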

A deep copy is made of every composite key here, and that key is then copied once more (only the first time it is added) when inserted into the bucket. This PR refactors the code to remove the need for a deep copy of every composite key.

We also perform a deep copy of the field values retrieved by Lucene here. This is only essential for fields with multiple values in a document and can be avoided for single-valued fields.
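
A minimal sketch of that distinction, assuming a hypothetical helper (BytesRef.deepCopyOf is a real Lucene API; the rest is illustrative):

```java
import org.apache.lucene.util.BytesRef;

final class ValueCopySketch {
    // 'raw' is the BytesRef Lucene returned for the current document. Its backing
    // bytes may be reused when the doc-values iterator advances, so iterating a
    // multi-valued field must copy each value before fetching the next one.
    // A single-valued field has no next value, so the copy can be skipped.
    static BytesRef valueForCompositeKey(BytesRef raw, boolean fieldIsMultiValued) {
        return fieldIsMultiValued ? BytesRef.deepCopyOf(raw) : raw;
    }
}
```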

Allocation Profiling

For the Big5 benchmark with process_name and agent_id (low cardinality):

Deep copies of the composite key, and of values for single-valued fields, account for around 25% of the overall allocations of a multi-term aggregation query.

[allocation profiling screenshot]

Collecting all the composite keys for every document into a list here also accounts for around 9% of the overall allocations.

[allocation profiling screenshot]

For the Big5 benchmark with agent_name and host_name (high cardinality):

About 19% of overall allocations are spent on deep copies of the composite key.

[allocation profiling screenshot]

Collecting all composite keys again accounts for around 9%, the same as before.

Also, the for-each loop over the field values of a document here contributes 17% of overall allocations because it creates a new Iterator on every invocation. I changed it to a regular indexed for loop (see the sketch below).

[allocation profiling screenshot]
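
A minimal sketch of the allocation difference (illustrative only, not the aggregator's actual code):

```java
import java.util.List;

final class LoopAllocationSketch {
    static long sumForEach(List<Long> values) {
        long sum = 0;
        for (Long v : values) { // compiles to values.iterator(): one Iterator allocation per call
            sum += v;
        }
        return sum;
    }

    static long sumIndexed(List<Long> values) {
        long sum = 0;
        for (int i = 0; i < values.size(); i++) { // no Iterator allocated
            sum += values.get(i);
        }
        return sum;
    }
}
```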

Testing

  • I verified that the query results were identical with and without my changes for aggregation queries over different fields of the Big5 dataset.
  • Testing was also done with concurrent search using different field combinations, and the results were the same.

I will go over the existing integration tests and UTs for multi-term aggregations to check whether any corner cases are left uncovered.

github-actions bot

❌ Gradle check result for fb84412: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bowenlan-amzn added the v2.17.0 and backport 2.x (Backport to 2.x branch) labels on Jul 31, 2024
@bowenlan-amzn (Member) left a comment

Looks good! I like how you documented the profiling results to explain this!

@sandeshkr419 (Contributor) commented Jul 31, 2024

Thanks @expani for the code changes and the detailed explanation on the PR. I have some minor comments, mainly on refactoring. Please add relevant comments/javadocs to help future developers understand the minor optimizations and the utilities for the various low-level operations.

Did we check performance on any data that has multi-valued fields as well? If not, let us propose a change in OSB to add multi-valued fields to some workloads, in case no such workloads exist.

Also, let us iterate CI to green and check that we have solid code coverage as well.

@expani (Contributor, Author) commented Jul 31, 2024

Thanks for taking the time to review, @bowenlan-amzn and @sandeshkr419.

I will add the required javadocs and comments as suggested.
I have explained the reasoning behind the code structure; let me know if you feel otherwise.

> Did we check performance on any data that has multi-valued fields as well? If not, let us propose a change in OSB to add multi-valued fields to some workloads, in case no such workloads exist.

I checked a few multi-valued fields, but only for correctness, not from a performance perspective.
I will check with the OSB team for any such existing workloads, as I need them for other possible optimisations as well.

The current CI seems to be failing due to 2 tests that multiple other folks have reported as flaky.

> Task :server:internalClusterTest

Tests with failures:
 - org.opensearch.action.admin.indices.create.RemoteSplitIndexIT.classMethod
 - org.opensearch.remotestore.RemoteStoreStatsIT.testDownloadStatsCorrectnessSinglePrimarySingleReplica

I will check the coverage of the existing tests and add more if required.

@expani (Contributor, Author) commented Jul 31, 2024

@sandeshkr419 @bowenlan-amzn Made changes based on the previous comments.

Verified that the existing UTs cover all the branches and code paths added in this PR.

Please have a look, thanks.


github-actions bot commented Oct 6, 2024

❌ Gradle check result for 31bbe46: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@expani closed this on Oct 7, 2024
@expani reopened this on Oct 7, 2024

github-actions bot commented Oct 7, 2024

❌ Gradle check result for 20f495d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


github-actions bot commented Oct 7, 2024

❌ Gradle check result for 734e927: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


github-actions bot commented Oct 7, 2024

❕ Gradle check result for 734e927: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@expani (Contributor, Author) commented Oct 7, 2024

@msfroh I have added some tests and replied to your comments.

The build is finally passing after multiple flaky-test failures. Please have a look.

@msfroh merged commit e885aa9 into opensearch-project:main on Oct 7, 2024
40 of 43 checks passed
@opensearch-trigger-bot

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

```sh
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-14993-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e885aa93279342c9ab219ecf612d887ff8de8af6
# Push it to GitHub
git push --set-upstream origin backport/backport-14993-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
```

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-14993-to-2.x.

Labels: backport 2.x (Backport to 2.x branch) · backport-failed · Performance (performance-related enhancements or bugs) · v2.18.0 (Issues and PRs related to version 2.18.0)
Projects: Status: Done
6 participants