Add Elasticsearch Filter Aggregation Support #2706

@mdashti

Problem Definition

We need filter aggregations for our analytics queries - stuff like "get me the average price overall, plus the average price for just t-shirts, plus the count of electronics items" in a single query.

Right now there's no way to do this in Tantivy. We tried using MultiCollector, but it doesn't work for this: every collector receives the same set of matching documents, so you can't apply a different filter for each one, and trying to filter at the collector level didn't work for us.

So we're stuck running separate queries for each filter, which is slow and defeats the purpose.
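To make the limitation concrete, here is a toy model of the fan-out behaviour described above. It is only an illustration in Python: `AvgCollector` and `fan_out` are hypothetical names and are not Tantivy's API.

```python
# Toy model only: these names are hypothetical and are NOT Tantivy's API.
class AvgCollector:
    """Accumulates the average of one numeric field."""

    def __init__(self, field):
        self.field = field
        self.total = 0.0
        self.count = 0

    def collect(self, doc):
        self.total += doc[self.field]
        self.count += 1

    def result(self):
        return self.total / self.count if self.count else None


def fan_out(matching_docs, collectors):
    # Every collector is handed every matching document; there is no
    # hook here for giving each collector its own filter.
    for doc in matching_docs:
        for collector in collectors:
            collector.collect(doc)


docs = [
    {"type": "t-shirt", "price": 10.0},
    {"type": "t-shirt", "price": 20.0},
    {"type": "electronics", "price": 300.0},
]

overall = AvgCollector("price")
t_shirts = AvgCollector("price")  # meant to cover t-shirts only
fan_out(docs, [overall, t_shirts])

# Both collectors saw all three documents, so both averages are identical.
print(overall.result(), t_shirts.result())
```

In this model the "t-shirts" average also includes the electronics item, which is the behaviour we want to avoid.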

For example, this common Elasticsearch query pattern is not supported:

{
  "aggs": {
    "avg_price": { "avg": { "field": "price" } },
    "t_shirts": {
      "filter": { "term": { "type": "t-shirt" } },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}

Proposed Solution

Add filter aggregation support to Tantivy. Basically implement the same thing ES has - a bucket aggregation that creates a single bucket with documents matching a query, and you can nest other aggregations inside it.

Would be great if it:

  • Uses the same JSON format as Elasticsearch (for compatibility)
  • Works with sub-aggregations like the other bucket types
  • Handles basic queries like term, range, bool
  • Returns the same JSON structure as Elasticsearch

This would provide Elasticsearch compatibility for filter aggregations.
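As a rough sketch of the semantics we're after (not a proposed implementation), the Python below evaluates the request format in a single pass over some in-memory documents. Everything in it is an assumption made for illustration: the `Avg`, `Filter`, `build`, and `run` names are hypothetical, only the `term` query and the `avg` metric are handled, and the sample documents are made up. The output shape (a `doc_count` per filter bucket, with sub-aggregation results nested next to it) follows what Elasticsearch returns for filter aggregations.

```python
def matches(filter_spec, doc):
    # Only the "term" query is sketched; range and bool are left out.
    (kind, body), = filter_spec.items()
    if kind != "term":
        raise NotImplementedError(kind)
    (field, value), = body.items()
    return doc.get(field) == value


class Avg:
    def __init__(self, field):
        self.field, self.total, self.count = field, 0.0, 0

    def collect(self, doc):
        if self.field in doc:
            self.total += doc[self.field]
            self.count += 1

    def result(self):
        return {"value": self.total / self.count if self.count else None}


class Filter:
    """Single-bucket aggregation: forwards only matching docs to its sub-aggregations."""

    def __init__(self, filter_spec, sub_aggs):
        self.filter_spec = filter_spec
        self.subs = build(sub_aggs)
        self.doc_count = 0

    def collect(self, doc):
        if matches(self.filter_spec, doc):
            self.doc_count += 1
            for sub in self.subs.values():
                sub.collect(doc)

    def result(self):
        out = {"doc_count": self.doc_count}
        out.update({name: sub.result() for name, sub in self.subs.items()})
        return out


def build(aggs):
    built = {}
    for name, spec in aggs.items():
        if "avg" in spec:
            built[name] = Avg(spec["avg"]["field"])
        elif "filter" in spec:
            built[name] = Filter(spec["filter"], spec.get("aggs", {}))
        else:
            raise NotImplementedError(name)
    return built


def run(request, docs):
    aggs = build(request["aggs"])
    for doc in docs:  # one pass; each filter bucket decides per document
        for agg in aggs.values():
            agg.collect(doc)
    return {"aggregations": {name: agg.result() for name, agg in aggs.items()}}


request = {
    "aggs": {
        "avg_price": {"avg": {"field": "price"}},
        "t_shirts": {
            "filter": {"term": {"type": "t-shirt"}},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        },
        "electronics": {"filter": {"term": {"type": "electronics"}}},
    }
}
docs = [
    {"type": "t-shirt", "price": 10.0},
    {"type": "t-shirt", "price": 20.0},
    {"type": "electronics", "price": 300.0},
]
response = run(request, docs)
print(response)
```

The `electronics` bucket has no sub-aggregations, so its `doc_count` alone gives the count of electronics items from the example in the problem definition.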

Alternatives Considered

We looked at a few other approaches:

  1. Custom multi-collector: Build something that routes docs to different collectors based on filters. Seems overly complex and would need big changes to the collector system.

  2. Add filters to existing aggregations: Like making every aggregation accept an optional filter param. Would work but you'd have to modify every single aggregation type.

  3. Just live with separate queries: What we're doing now. Works but it's slow and feels hacky.

The filter aggregation approach seems cleanest since it fits into the existing bucket aggregation pattern and gives us ES compatibility.
