Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Star Tree Request/Response structure #227

Closed
wants to merge 3 commits into from

Conversation

sandeshkr419
Copy link
Owner

@sandeshkr419 sandeshkr419 commented Aug 11, 2024

Description

For an index supporting star-tree composite index, this changes tries to achieve resolving a metric aggregation with/without a numeric terms query with the help of star-tree.

This PR in present state primarily focuses on flow of information request to response. Therefore, the code pieces around calculating the correct response values are still inaccurate and are in WIP.

This PR depends on opensearch-project#14809, therefore the depending unmerged changes have been utilized for now in my private fork to discuss the changes in parallel.

Approach

A new StarTreeQuery is introduced which helps resolve to star-tree documents. This star-tree query is formed (if it can be) at the shard level, this is not done at coordinator level to avoid node to node transportation. Also, all the information is present at shard level and OpenSearch does majority of query rewrite at shard level itself. This star tree query is encapsulated in an OriginalOrStarTreeQuery which helps preserve the original query alongwith the new star tree query. This encapsulation is done so as to preserve both the queries and decision whether to use which query can be taken at a segment level.

Example query shape:

Request:

{
    "query": {
        "term": {
            "status": 200
        }
    },
    "size": 0,
    "aggs": {
                        "sum_status": {
                            "sum": {
                                "field": "size"
                            }
                        }
                    }
}

Response:

{
    "took": 4038,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 42,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "sum_status": {
            "value": 24745.0
        }
    }
}

Star Tree Flow Response:

{
    "took": 21120,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "sum_status": {
            "value": 24745.0
        }
    }
}

Approach:

  1. The query shape is identified at the shard level (SearchService.java) and the query/aggregation (if can be resolved by star-tree) is parsed to a star-tree query.
  2. The star-tree query is wrapped around OriginalOrStarTreeQuery to preserve the original query - this is because the decision to decide which implementation (default/startree) to use can be taken for a segment level.
  3. If star-tree can be utilized to answer the query, the star-tree document set is then collected by the relevant aggregator/collector. In this POC, I have made changes to SumAggregator to demonstrate the flow of changes.

Related Issues

opensearch-project#15257

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

// Check if the query builder is an instance of TermQueryBuilder
if (queryBuilder instanceof TermQueryBuilder) {
TermQueryBuilder tq = (TermQueryBuilder) queryBuilder;
String field = tq.fieldName();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be converted to dimension field name of star tree?

long inputQueryVal = Long.parseLong(tq.value().toString());

// Get or create the list of predicates for the given field
List<Predicate<Long>> predicates = predicateMap.getOrDefault(field, new ArrayList<>());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use predicates, we can't use binary search during star tree traversal.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't really put much thought into this yet, let me revisit the part on how to better store information to use filters. For request/response parsing I had just inspired changes from previous POCs.

@Override
public void collect(int doc, long bucket) throws IOException {
// TODO: Fix the response for collecting star tree sum
sums = bigArrays.grow(sums, bucket + 1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extract out the default implementation getDefaultLeafCollector and reuse the same logic.

I really like the approach of reusing the existing aggregators.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll try and do that, I am inclined to on refactoring & re-using same implementations wherever possible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final SortedNumericDoubleValues values = valuesSource.doubleValues(ctx);

we can check if we are able to get sortedNumericDoubleValues , otherwise we need to convert to double for each doc


protected StarTreeValues getStarTreeValues(LeafReaderContext ctx, CompositeIndexFieldInfo starTree) throws IOException {
SegmentReader reader = Lucene.segmentReader(ctx.reader());
if (!(reader.getDocValuesReader() instanceof CompositeIndexReader)) return null;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to see if its better to load them as doubleValuesSource similar to how existing fields are loaded. And that too load the specific fields requested instead of loading the entire star tree values. ( for example in sum aggregator, we can fetch the doubleFieldData of a particular field of star tree metric , for eg : sum_status_metric can be loaded in )

Then you don't need to worry about conversion either.

// Can be marked false for majority cases for which star-tree cannot be used
// Will save checking the criteria later and we can have a limit on what search requests are supported
// As we increment the cases where star-tree can be used, this can be set back to true
boolean canUseStarTree = context.mapperService().isCompositeIndexPresent();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be based on dimensions and metric rather ?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just initialization, will toggle this off for multiple cases. (TODO-in this PR iteself)

@sandeshkr419 sandeshkr419 changed the title doc values file format Star Tree Request/Response structure Aug 13, 2024
if (queryBuilder instanceof TermQueryBuilder) {
TermQueryBuilder tq = (TermQueryBuilder) queryBuilder;
String field = tq.fieldName();
long inputQueryVal = Long.parseLong(tq.value().toString());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert to sortable long

Signed-off-by: Sandesh Kumar <[email protected]>
kahanSummation.reset(sum, compensation);

for (int i = 0; i < valuesCount; i++) {
double value = Double.longBitsToDouble(dv.nextValue());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we do this ?

 public static Double sortableLongtoDouble(Long value) {
        return NumericUtils.sortableLongToDouble(value);
    }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also lets see if we can get double sorted numeric dv ahead

@Override
        public SortedNumericDoubleValues getDoubleValues() {
            try {
                SortedNumericDocValues raw = DocValues.getSortedNumeric(reader, field);
                return FieldData.sortableLongBitsToDoubles(raw);
            } catch (IOException e) {
                throw new IllegalStateException("Cannot load doc values", e);
            }
        }

@sandeshkr419
Copy link
Owner Author

Closing this PR in lieu of opensearch-project#15289

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants