Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Star Tree] [Search] Support for metric aggregations with/without term query #15289
base: main
Are you sure you want to change the base?
[Star Tree] [Search] Support for metric aggregations with/without term query #15289
Changes from all commits
19e2811
62f4920
52d3450
cf2ec5b
416570f
67b5bb3
0c3a938
e41ea0e
ca6407c
f7caeb1
c91ae0a
bb3c510
24ed013
193dd89
1900450
bc80495
e2060ae
01d1b8f
a051bce
c227b74
2d10bdd
6fb4794
2d19d8a
e86ab94
e76aa53
3887a62
ae4393f
326400a
89c845d
c7b70b0
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: Would it make sense to push this
if
down intogetStarTreePredicates
. I guess we can move it whenever we add support for more query types. That said, I think we should be a little opinionated about where the query-matching logic will live.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wanted to avoid duplicate checks actually. From
getStarTreePredicates
it would be difficult to classify whether the query shape is something that is supported or not and that is not feasible from just thequeryMap
returned fromgetStarTreePredicates
- I mean we could do a distinction between null and empty-map but that might not be very readable - so delegating responsibility of getting queryMap as a separate utility.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
tq.value()
isn't a number, this will throw an exception, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It fails at a very early step itself when the original (default) query is constructed.
Validated with getting same response with star tree settings enabled & disabled both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a check to make sure we're querying a numeric field? If the field has a
keyword
type, then a string will parse successfully.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you kind of have a check on the indexing side of things, since we currently only support dimensions defined over numeric fields, I think. So, the field wouldn't be present in
supportedDimensions
if it's not numeric. Eventually, we will want to support non-string values, though, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now numeric fields are only supported. Once keyword fields are introduced, will have to take care of it then I guess.
@bharath-techie - how are keyword fields ingested - their values will be ingested be converting to some numeric fields only, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey keyword support will be available quite soon as code changes are complete and tests are being added.
No the user input will be a string, first you need to convert to bytesRef, then you need to look up the ord for the bytesRef in the associated dimension starTreeDocValues field. [ SortedSetDocValues field for keyword fields compared to SortedNumericDocValues of numeric fields ]
Something like below:
Poc code reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method might get called from multiple threads if we use concurrent segment search. I suspect that you would get a
ConcurrentModificationException
with enough segments.You could avoid that by initializing an array of
FixedBitSet
(of sizecontext.searcher().getLeafContexts().size()
and usingctx.ord
as the key.Since Lucene 10 introduces intra-segment concurrency, I think we'll eventually need a way to guarantee that only one thread per segment initializes this value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored the value cache map to startree context. Defining it as
volatile
should ensure thread visibility and initializing it with ConcurrentHashMap to avoidConcurrentModificationException
.When upgrading to Lucene 10, I think that will warrant some work to resolve intra-segment concurrency. I think that calls out for a broader discussion in general. Most likely, only thread per [group of threads operating on same shard] will have to compute the aggregation result while the remaining will just do nothing in this case since the star-tree index presently won't be able to split as the default index in its current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we should be able to avoid any atomics (including
ConcurrentHashMap
). That's why I suggested preallocating an array whose size corresponds to the number of segments. If only one thread operates on each segment (which is likely also what we would want once we have intra-segment concurrency), then you can safely let each thread overwrite anull
value in the array without worrying aboutConcurrentModificationException
, since no two threads will try touching the same value.