Skip to content

Conversation

mkhludnev
Copy link
Contributor

Continuation of #19404.

Copy link
Contributor

❌ Gradle check result for f95c056: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@mkhludnev mkhludnev force-pushed the stream-agg-assert-single-list branch from f95c056 to 9b1eb64 Compare September 25, 2025 18:05
Copy link
Contributor

✅ Gradle check result for 9b1eb64: SUCCESS

Copy link

codecov bot commented Sep 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.08%. Comparing base (ac6dfa1) to head (2771ebc).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19416      +/-   ##
============================================
- Coverage     73.09%   73.08%   -0.02%     
+ Complexity    70553    70496      -57     
============================================
  Files          5716     5716              
  Lines        322926   322935       +9     
  Branches      46770    46770              
============================================
- Hits         236032   236004      -28     
+ Misses        67882    67873       -9     
- Partials      19012    19058      +46     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Dropping two RandomIndexWriters still cause flakiness ;( (before).

Signed-off-by: Mikhail Khludnev <[email protected]>
@mkhludnev
Copy link
Contributor Author

If the single segment invariant have place, I suppose it's ready for merge. 🙏🏼

Copy link
Contributor

❕ Gradle check result for 10130e0: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@andrross
Copy link
Member

@bowenlan-amzn Can you take a look?

@rishabhmaurya
Copy link
Contributor

If the single segment invariant have place, I suppose it's ready for merge. 🙏🏼

Thank you again @mkhludnev for taking care of it as a follow up. I don't want to call it invariant, but at least for now, per segment flushing is the right assumption, but might break in future once we start supporting other modes i.e. intra segment flush or multi-segment flush. I think then we can change this condition and the tests accordingly. So for now, this check sounds very reasonable. I will wait for @bowenlan-amzn to share his opinion.

@bowenlan-amzn
Copy link
Member

Spent some time understanding the context here, and this change makes sense to me.
It's a assertion saying each leaf (partition) collection needs to end with a reset before processing the next one.
So the unit test in StreamStringTermsAggregatorTests can only handle one segment case, because it uses IndexSearcher which doesn't contain the logic of ContextIndexSearcher to trigger reset.

if (searchContext.isStreamSearch()) {
logger.debug(
"Stream intermediate aggregation for segment [{}], shard [{}]",
ctx.ord,
searchContext.shardTarget().getShardId().id()
);
List<InternalAggregation> internalAggregation = searchContext.bucketCollectorProcessor().buildAggBatch(collector);
if (!internalAggregation.isEmpty()) {
sendBatch(internalAggregation);
}
}

@mkhludnev hope I'm understanding this correctly

@rishabhmaurya @harshavamsi I think this is testing problem but need to be careful in the following PRs so to not introduce flaky tests. If we can mock the ContextIndexSearcher for streaming aggregation unit test, that would solve this problem even better.

Copy link
Contributor

github-actions bot commented Oct 2, 2025

✅ Gradle check result for afdbdaf: SUCCESS

@mkhludnev
Copy link
Contributor Author

Hi @bowenlan-amzn
Here we go https://github.com/opensearch-project/OpenSearch/pull/19416/files#diff-f58e89ce59dd4987a50355d51c4201b744d38d58a204aeb4da3dbbef16d5d525R363
This test grabs per segment aggs and then reduces them to the total aggregation.

Copy link
Contributor

github-actions bot commented Oct 3, 2025

❌ Gradle check result for fab4553: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Mikhail Khludnev <[email protected]>
Copy link
Contributor

github-actions bot commented Oct 4, 2025

❌ Gradle check result for 6347a4a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@mkhludnev
Copy link
Contributor Author

mkhludnev commented Oct 4, 2025

aghh... adding assert into StreamSTAgg.getLeafCollector() reveals the second invocation by ProfilingAggregator. It causes the failure in SubAggregationIT/testStreamingAggregationUsed. It needs to be covered with this unit test.

at org.opensearch.search.aggregations.bucket.terms.StreamStringTermsAggregator.getLeafCollector(StreamStringTermsAggregator.java:94)
	at org.opensearch.search.aggregations.AggregatorBase.getLeafCollector(AggregatorBase.java:210)
	at org.opensearch.search.profile.aggregation.ProfilingAggregator.getLeafCollector(ProfilingAggregator.java:126)
	at org.opensearch.search.profile.aggregation.ProfilingAggregator.getLeafCollector(ProfilingAggregator.java:53)
	at org.apache.lucene.search.FilterCollector.getLeafCollector(FilterCollector.java:38)
	at org.opensearch.search.profile.query.ProfileCollector.getLeafCollector(ProfileCollector.java:85)
	at org.opensearch.search.profile.query.InternalProfileCollector.getLeafCollector(InternalProfileCollector.java:147)
	at org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:130)
	at org.apache.lucene.search.FilterCollector.getLeafCollector(FilterCollector.java:38)
	at org.opensearch.search.profile.query.ProfileCollector.getLeafCollector(ProfileCollector.java:85)
	at org.opensearch.search.profile.query.InternalProfileCollector.getLeafCollector(InternalProfileCollector.java:147)
	at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:351)
	at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:316)
	at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:280)
	at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:356)
	at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:493)
	at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:450)
	at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:433)
	at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:60)
	at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:283)
	at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:156)
	at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:733)

@mkhludnev mkhludnev marked this pull request as draft October 4, 2025 17:27
@mkhludnev mkhludnev marked this pull request as ready for review October 5, 2025 20:30
Copy link
Contributor

github-actions bot commented Oct 5, 2025

❕ Gradle check result for 2771ebc: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@mkhludnev mkhludnev force-pushed the stream-agg-assert-single-list branch from 2771ebc to 0f84b06 Compare October 6, 2025 05:06
Signed-off-by: m.khludnev <[email protected]>
Copy link
Contributor

github-actions bot commented Oct 6, 2025

❌ Gradle check result for 3c0350f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants