Skip to content

perf: optimize EventCounter SQL #2102

@nnnkkk7

Description

@nnnkkk7

Describe the feature you'd like

It takes longer to execute GetExperimentGoalCount method and GetExperimentEvaluationCount method sometimes.

GetExperimentGoalCount (pkg/eventcounter/storage/v2/sql/goal_count.sql)

Root Causes:

  • Redundant DISTINCT operation: The CTE grouped_by_user_evaluation already groups by (user_id, variation_id), making each row unique. The outer query's COUNT(DISTINCT user_id) is recalculating uniqueness unnecessarily - should use COUNT(*) instead
  • Unnecessary ORDER BY clause: The query contains ORDER BY variation_id which forces BigQuery to sort results before returning, but the application code already handles ordering based on req.VariationIds - this ORDER BY should be removed
  • Missing variation_id filter: The query aggregates ALL variations then filters in application code. Should add WHERE variation_id IN UNNEST(@variationIDs) to reduce data scanned
  • BigQuery shuffle overhead: The combination of DISTINCT + ORDER BY forces expensive data redistribution across nodes - removing both will eliminate shuffle operations

GetExperimentEvaluationCount (pkg/eventcounter/storage/v2/sql/evaluation_count.sql)

Root Causes:

  • Direct COUNT(DISTINCT) on raw data: The query runs COUNT(DISTINCT user_id) directly on the full dataset without pre-aggregation. Should use CTE with GROUP BY user_id, variation_id first, then COUNT(*) on grouped results
  • No staged aggregation: Single-pass query computes both COUNT(DISTINCT user_id) and COUNT(id) simultaneously on raw data. Should separate into CTE for pre-grouping, then aggregate the grouped data
  • Missing variation_id filter: Like goal_count.sql, processes all variations without filtering. Should add WHERE variation_id IN UNNEST(@variationIDs) to reduce scan scope
  • Inconsistent pattern with goal_count.sql: Uses different aggregation approach than goal_count.sql. Should align to use same CTE-based pattern for consistency and performance

Background

No response

Alternative solutions

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions