Get count from split metadata on simple time range query #5758

tontinton · 2025-04-19T17:05:17Z

I think I found a nice optimization here while starting to research the search query code. @rdettai

Count queries can return much faster when they skip downloading split files. I couldn't think of a good reason to download them for a time range query when the split's time range is fully contained in the search request's time range.

Split the PR into 2: #5759.
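To make the idea concrete, here is a minimal sketch of the decision, not the code from this PR. `CountRequest`, `SplitMeta`, `split_is_contained`, and `plan_count` are hypothetical, simplified stand-ins; the sketch assumes the request's `end_timestamp` is exclusive and that only match-all queries with `max_hits == 0` qualify.

```rust
use std::ops::RangeInclusive;

/// Hypothetical, simplified stand-in for the search request.
struct CountRequest {
    start_timestamp: Option<i64>, // inclusive lower bound
    end_timestamp: Option<i64>,   // exclusive upper bound (assumption)
    max_hits: u64,
    is_match_all: bool,
}

/// Hypothetical, simplified stand-in for the split metadata.
struct SplitMeta {
    split_id: &'static str,
    num_docs: u64,
    time_range: Option<RangeInclusive<i64>>,
}

/// True when every document of the split is guaranteed to fall inside the
/// request's time range.
fn split_is_contained(request: &CountRequest, split: &SplitMeta) -> bool {
    let Some(range) = split.time_range.as_ref() else {
        // Without a known time range, containment only holds for an unbounded request.
        return request.start_timestamp.is_none() && request.end_timestamp.is_none();
    };
    let start_ok = request.start_timestamp.map_or(true, |start| *range.start() >= start);
    let end_ok = request.end_timestamp.map_or(true, |end| *range.end() < end);
    start_ok && end_ok
}

/// Returns the count resolved from metadata and the ids of the splits that
/// still need a leaf search.
fn plan_count(request: &CountRequest, splits: &[SplitMeta]) -> (u64, Vec<&'static str>) {
    let metadata_eligible = request.max_hits == 0 && request.is_match_all;
    let mut count = 0;
    let mut to_search = Vec::new();
    for split in splits {
        if metadata_eligible && split_is_contained(request, split) {
            count += split.num_docs;
        } else {
            to_search.push(split.split_id);
        }
    }
    (count, to_search)
}
```

With a request over `[100, 200)`, a split covering `120..=150` would be counted from its `num_docs`, while a split covering `150..=200` would still go to the leaves because its last timestamp is not strictly below the exclusive end.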

rdettai

Nice optimization. I think it deserves a unit test, e.g. the following test scenario:

* The mock metastore returns 5 splits (1 outside the range, 1 overlapping the start, 1 overlapping the end, 1 overlapping the whole range, and 1 within the range).
* The mock search service expects only 1 call.

rdettai · 2025-04-22T08:02:58Z

quickwit/quickwit-search/src/root.rs

+        let start_time = split.time_range.as_ref().map(|x| x.start()).copied();
+        let end_time = split.time_range.as_ref().map(|x| x.end()).copied();
+        if is_metadata_count_request(search_request, start_time, end_time) {


Nit, to make search_partial_hits_phase a bit more readable, I would pass the time_range directly to is_metadata_count_request

rdettai · 2025-04-22T08:09:58Z

quickwit/quickwit-search/src/root.rs

+                jobs_to_leaf_request(search_request, indexes_metas_for_leaf_search, client_jobs)?;
+            leaf_request_tasks.push(cluster_client.leaf_search(leaf_request, client.clone()));
+        }
+        leaf_search_responses.extend(try_join_all(leaf_request_tasks).await?);


Nit, moving it to a separate line would help see a bit more clearly how errors are resolved.

Suggested change

-        leaf_search_responses.extend(try_join_all(leaf_request_tasks).await?);
+        let executed_leaf_search_responses = try_join_all(leaf_request_tasks).await?;
+        leaf_search_responses.extend(executed_leaf_search_responses);

tontinton · 2025-04-22T20:40:48Z

> Nice optimization. I think it deserves a unit test, e.g. the following test scenario:
>
> * The mock metastore returns 5 splits (1 outside the range, 1 overlapping the start, 1 overlapping the end, 1 overlapping the whole range, and 1 within the range).
> * The mock search service expects only 1 call.

Will add tests when I have a bit more time.

rdettai

Thanks for adding a test. Unfortunately I think it is flawed (so was my example of test earlier, I should have written "the mock search service expects to be called for 3 splits")

rdettai · 2025-04-30T07:51:23Z

quickwit/quickwit-search/src/root.rs

+        mock_search.expect_leaf_search().times(1).returning(|_req| {
+            Ok(quickwit_proto::search::LeafSearchResponse {
+                num_hits: 1,
+                partial_hits: vec![mock_partial_hit("split_inside", 1, 1)],
+                failed_splits: Vec::new(),
+                num_attempted_splits: 1,
+                ..Default::default()
+            })
+        });


asserting that _req contains the right splits is the important part of this test

the mock response should be all splits except "split_inside" which is the one we resolve at the root level 🙃

rdettai · 2025-04-30T07:53:30Z

quickwit/quickwit-search/src/root.rs

+        assert_eq!(resp.num_hits, 1);
+        assert_eq!(resp.hits.len(), 1);


This is not the response I would have expected here. num_hits should be 10 (the number of docs in split_inside) + whatever you decide leaf_search returns for the overlapping splits.
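The two review comments above (assert on what the leaf request contains, and expect 10 plus the leaf hits) can be sketched without the real mock types. Everything below is a hypothetical, hand-rolled stand-in: `LeafRequest`, `LeafResponse`, `MockLeafSearch`, and `root_count` are not Quickwit APIs, and the split ids other than `split_inside` are made up for illustration.

```rust
/// Hypothetical, simplified stand-in for the leaf request.
struct LeafRequest {
    split_ids: Vec<String>,
}

/// Hypothetical, simplified stand-in for the leaf response.
struct LeafResponse {
    num_hits: u64,
}

/// Hand-rolled mock of the leaf search service: it records the split ids it
/// receives and answers with a fixed hit count.
struct MockLeafSearch {
    received_split_ids: Vec<String>,
    canned_num_hits: u64,
}

impl MockLeafSearch {
    fn leaf_search(&mut self, request: LeafRequest) -> LeafResponse {
        self.received_split_ids.extend(request.split_ids);
        LeafResponse { num_hits: self.canned_num_hits }
    }
}

/// Root-level merge: hits counted from metadata plus hits reported by the leaves.
fn root_count(metadata_num_docs: u64, leaf: &mut MockLeafSearch, overlapping: &[&str]) -> u64 {
    let request = LeafRequest {
        split_ids: overlapping.iter().map(|id| id.to_string()).collect(),
    };
    metadata_num_docs + leaf.leaf_search(request).num_hits
}
```

In a test built this way, the interesting assertions are that `received_split_ids` holds the three overlapping splits and never `split_inside`, and that the total equals the 10 documents taken from metadata plus the canned leaf count. Picking a canned count smaller than the overlapping splits' `num_docs` makes it obvious which path produced which part of the total.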

rdettai · 2025-04-30T07:56:12Z

quickwit/quickwit-search/src/root.rs

+            end_timestamp: Some(129_000),
+            index_id_patterns: vec!["test-index".to_string()],
+            query_ast: qast_json_helper("test", &["body"]),
+            max_hits: 10,


your change is not expected to run if max_hits!=0

tontinton · 2025-04-30T19:22:19Z

> Thanks for adding a test. Unfortunately I think it is flawed (so was my example of test earlier, I should have written "the mock search service expects to be called for 3 splits")

You're totally right, I was supposed to be making a count query, fixed now.

rdettai · 2025-05-14T14:54:19Z

quickwit/quickwit-search/src/root.rs

+    if let Some(request_start_timestamp) = request.start_timestamp {
+        let Some(split_start_timestamp) = split_start_timestamp else {
+            return false;
+        };
+        if split_start_timestamp < request_start_timestamp {
+            return false;
+        }
+    }


nit, I wonder whether this wouldn't be easier to read:

Suggested change

-    if let Some(request_start_timestamp) = request.start_timestamp {
-        let Some(split_start_timestamp) = split_start_timestamp else {
-            return false;
-        };
-        if split_start_timestamp < request_start_timestamp {
-            return false;
-        }
-    }
+    match (request.start_timestamp, split_start_timestamp) {
+        (Some(request_start), Some(split_start)) if split_start >= request_start => {}
+        (Some(_), _) => return false,
+        (None, _) => {}
+    }
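As a quick sanity check that the suggested `match` keeps the same behavior, both forms can be wrapped in small functions and compared over every combination of present and missing bounds. The wrapper names `start_ok_if_let` and `start_ok_match` are hypothetical, and the trailing `true` stands in for whatever the real function does after this check.

```rust
/// The original form from the diff, reduced to the start-bound check.
fn start_ok_if_let(request_start: Option<i64>, split_start: Option<i64>) -> bool {
    if let Some(request_start_timestamp) = request_start {
        let Some(split_start_timestamp) = split_start else {
            return false;
        };
        if split_start_timestamp < request_start_timestamp {
            return false;
        }
    }
    true
}

/// The suggested `match` form.
fn start_ok_match(request_start: Option<i64>, split_start: Option<i64>) -> bool {
    match (request_start, split_start) {
        // Both bounds known and the split starts at or after the request start.
        (Some(request_start), Some(split_start)) if split_start >= request_start => {}
        // The request has a lower bound, but the split starts too early or has no known start.
        (Some(_), _) => return false,
        // No lower bound on the request: nothing to check.
        (None, _) => {}
    }
    true
}
```

The two functions agree for any pair of inputs: a missing request bound always passes, a missing split bound fails as soon as the request has one, and equal timestamps pass because the lower bound is inclusive.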

rdettai · 2025-05-14T15:15:04Z

quickwit/quickwit-search/src/root.rs

+                    num_hits: req.leaf_requests[0]
+                        .split_offsets
+                        .iter()
+                        .map(|s| s.num_docs)
+                        .sum(),


We could make this test more powerful by setting a smaller number here.

rdettai · 2025-05-14T15:24:13Z

quickwit/quickwit-search/src/root.rs

+        assert_eq!(resp.num_hits, 50);
+        assert_eq!(resp.hits.len(), 0);


it is missing from the other tests, but would be nice to also assert resp.num_successful_splits here

rdettai · 2025-05-14T15:42:46Z

last minute thought, you could also add some tests in https://github.com/tontinton/quickwit/blob/926a2f33d7b35a5cee064adc457c4503f96dc725/quickwit/rest-api-tests/scenarii/qw_search_api/0001_ts_range.yaml to double check the limit conditions, e.g:

endpoint: simple/search
params:
  query: "*"
  start_timestamp: 1684993000
  end_timestamp: 1684993004
expected:
  num_hits: 3

should confirm that the upper bound exclusion condition is correct and stays that way.
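The boundary being tested is the half-open interval: the start is inclusive and the end is exclusive. A tiny sketch of that rule follows; `in_request_range` and `count_hits` are hypothetical helpers, and the document timestamps used in the note below are made up, not the contents of the REST test index.

```rust
/// Half-open membership test: `start` is inclusive, `end` is exclusive.
fn in_request_range(timestamp: i64, start: i64, end: i64) -> bool {
    start <= timestamp && timestamp < end
}

/// Counts the timestamps that a `[start, end)` request would match.
fn count_hits(timestamps: &[i64], start: i64, end: i64) -> usize {
    timestamps
        .iter()
        .filter(|&&ts| in_request_range(ts, start, end))
        .count()
}
```

For hypothetical documents at 1684993001, 1684993002, 1684993003, and 1684993004, a request over 1684993000 to 1684993004 matches the first three only. By the same rule, a split whose last timestamp equals the request's `end_timestamp` is not fully contained and should not be counted from metadata.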

tontinton mentioned this pull request Apr 19, 2025

Optimize simple time ranged search queries #5759

Open

tontinton changed the title ~~Get count from split metadata on simple query with timerange~~ Get count from split metadata on simple query with time range Apr 19, 2025

tontinton changed the title ~~Get count from split metadata on simple query with time range~~ Get count from split metadata on simple time range query Apr 19, 2025

rdettai reviewed Apr 22, 2025

tontinton force-pushed the optimize-timestamp-range-count branch from de8dde6 to f3ab102 Compare April 22, 2025 20:40

tontinton requested a review from rdettai April 29, 2025 17:38

rdettai requested changes Apr 30, 2025

tontinton force-pushed the optimize-timestamp-range-count branch from c535af6 to 89f26f1 Compare April 30, 2025 19:21

tontinton requested a review from rdettai April 30, 2025 19:22

tontinton commented Apr 30, 2025

tontinton force-pushed the optimize-timestamp-range-count branch from 89f26f1 to 926a2f3 Compare April 30, 2025 20:48

rdettai approved these changes May 14, 2025

tontinton mentioned this pull request May 14, 2025

Remove timerange root search #5760

Open

tontinton force-pushed the optimize-timestamp-range-count branch from 926a2f3 to 2a6a590 Compare May 14, 2025 18:49

tontinton requested a review from rdettai May 14, 2025 18:49

tontinton force-pushed the optimize-timestamp-range-count branch 4 times, most recently from 4f1812e to 8c4d360 Compare May 15, 2025 12:09

trinity-1686a and others added 8 commits September 8, 2025 17:01

don't ignore indefinitely failing seeds

1941bfa

fix tests

174e5e7

Remove quickwit-lambda package (quickwit-oss#5884)

340a365

* Remove `quickwit-lambda` package * Fix warning

Remove search stream endpoint (quickwit-oss#5886)

4a1d9cd

Upgrade Warp (quickwit-oss#5870)

344a19c

Add object storage metrics for gcs (quickwit-oss#5889)

4fb95a3

add !include functionality to integration tests (quickwit-oss#5891)

5c9cd01

fulmicoton and others added 5 commits September 12, 2025 16:25

fix coverage (quickwit-oss#5893)

ef6b319

Clean up remaining AWS Lambda references

d41540d

Follow-up for quickwit-oss#5884

Get count from split metadata on simple query with timerange

b82d084

Add tests to search splits contained in time range

1ea3723

tontinton force-pushed the optimize-timestamp-range-count branch from 8c4d360 to 1ea3723 Compare September 13, 2025 12:51

fulmicoton-dd force-pushed the main branch from af0b2dc to 73205fc Compare September 23, 2025 13:16
