Description
Problem Definition
We need filter aggregations for our analytics queries - stuff like "get me the average price overall, plus the average price for just t-shirts, plus the count of electronics items" in a single query.
Right now there's no way to do this in Tantivy. We tried using MultiCollector, but it doesn't work: every collector receives the same set of documents from the one query, so you can't apply a different filter to each collector.
So we're stuck running separate queries for each filter, which is slow and defeats the purpose.
For example, this common Elasticsearch query pattern is not supported:
```json
{
  "aggs": {
    "avg_price": { "avg": { "field": "price" } },
    "t_shirts": {
      "filter": { "term": { "type": "t-shirt" } },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}
```
Proposed Solution
Add filter aggregation support to Tantivy. Basically, implement what ES has: a bucket aggregation that creates a single bucket containing the documents that match a query, with other aggregations nested inside it.
Would be great if it:
- Uses the same JSON format as Elasticsearch (for compatibility)
- Works with sub-aggregations like the other bucket types
- Handles basic queries like term, range, bool
- Returns the same JSON structure as Elasticsearch
This would provide Elasticsearch compatibility for filter aggregations.
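To make the intended semantics concrete, here is a minimal, std-only Rust model of the example query above, plus the "count of electronics items" case from the problem statement. It is not Tantivy's API: `Doc`, `Filter`, `AvgAgg`, and `FilterBucket` are hypothetical names, and `Filter` only supports a term match on a single field. The point it illustrates is that every document is visited once, and each filter bucket checks its own filter before feeding its nested sub-aggregation.

```rust
// Hypothetical model of filter-aggregation semantics; NOT Tantivy's API.

/// A toy document with a keyword field and a numeric field.
struct Doc {
    doc_type: &'static str,
    price: f64,
}

/// Stand-in for a query; only a term match on `doc_type` is modeled.
enum Filter {
    Term(&'static str),
}

impl Filter {
    fn matches(&self, doc: &Doc) -> bool {
        match self {
            Filter::Term(t) => doc.doc_type == *t,
        }
    }
}

/// Running state for an `avg` metric aggregation on `price`.
#[derive(Default)]
struct AvgAgg {
    sum: f64,
    count: u64,
}

impl AvgAgg {
    fn collect(&mut self, doc: &Doc) {
        self.sum += doc.price;
        self.count += 1;
    }

    /// `None` when no documents were collected (no average exists).
    fn value(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

/// A single bucket holding the docs that match `filter`,
/// with a nested `avg_price` sub-aggregation.
struct FilterBucket {
    filter: Filter,
    doc_count: u64,
    avg_price: AvgAgg,
}

impl FilterBucket {
    fn new(filter: Filter) -> Self {
        FilterBucket { filter, doc_count: 0, avg_price: AvgAgg::default() }
    }

    /// Only matching docs reach the sub-aggregation.
    fn collect(&mut self, doc: &Doc) {
        if self.filter.matches(doc) {
            self.doc_count += 1;
            self.avg_price.collect(doc);
        }
    }
}

fn main() {
    let docs = vec![
        Doc { doc_type: "t-shirt", price: 10.0 },
        Doc { doc_type: "t-shirt", price: 20.0 },
        Doc { doc_type: "electronics", price: 300.0 },
    ];

    // Unfiltered top-level avg plus two filter buckets, all fed in one pass.
    let mut avg_price = AvgAgg::default();
    let mut t_shirts = FilterBucket::new(Filter::Term("t-shirt"));
    let mut electronics = FilterBucket::new(Filter::Term("electronics"));
    for doc in &docs {
        avg_price.collect(doc);
        t_shirts.collect(doc);
        electronics.collect(doc);
    }

    println!("avg_price: {:?}", avg_price.value()); // Some(110.0)
    println!("t_shirts.doc_count: {}", t_shirts.doc_count); // 2
    println!("t_shirts.avg_price: {:?}", t_shirts.avg_price.value()); // Some(15.0)
    println!("electronics.doc_count: {}", electronics.doc_count); // 1
}
```

In ES terms, `t_shirts` would be reported as a bucket with a `doc_count` and a nested `avg_price` value, which is the response shape this request asks Tantivy to match.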
Alternatives Considered
We looked at a few other approaches:
- Custom multi-collector: Build something that routes docs to different collectors based on filters. Seems overly complex and would need big changes to the collector system.
- Add filters to existing aggregations: Like making every aggregation accept an optional filter param. Would work, but you'd have to modify every single aggregation type.
- Just live with separate queries: What we're doing now. Works, but it's slow and feels hacky.
The filter aggregation approach seems cleanest since it fits into the existing bucket aggregation pattern and gives us ES compatibility.
Related Issues:
- Aggregation Feature Parity with Elasticsearch #1690 - Multi-collector support for filtered aggregations