Add bounded unique count aggregation #781

Open
wants to merge 1 commit into
base: main
from

Conversation

jbrooks-stripe
Collaborator

@jbrooks-stripe jbrooks-stripe commented Jul 2, 2024

Summary

Adds a BOUNDED_UNIQUE_COUNT aggregation. It computes exact unique (distinct) counts, but caps the count at a given value so that memory usage stays bounded.
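
The capping behavior described in the summary can be sketched roughly as follows. This is an illustrative Python sketch only, not the implementation in this PR: the class name `BoundedUniqueCount`, its methods, and the default `bound` are all hypothetical. The idea is to keep an exact set of seen values and stop admitting new ones once the set reaches the cap, so the result is the true distinct count up to the cap and the cap itself beyond it.

```python
from typing import Hashable, Set


class BoundedUniqueCount:
    """Hypothetical sketch: exact distinct count, capped at `bound`."""

    def __init__(self, bound: int = 8) -> None:
        if bound < 1:
            raise ValueError("bound must be positive")
        self.bound = bound
        self.seen: Set[Hashable] = set()

    def update(self, value: Hashable) -> None:
        # Admit new values only while the set is below the cap,
        # so the set never holds more than `bound` elements.
        if len(self.seen) < self.bound:
            self.seen.add(value)

    def merge(self, other: "BoundedUniqueCount") -> None:
        # Union two partial results, stopping as soon as the cap is hit.
        for value in other.seen:
            if len(self.seen) >= self.bound:
                break
            self.seen.add(value)

    def result(self) -> int:
        return len(self.seen)


agg = BoundedUniqueCount(bound=3)
for v in ["a", "b", "a", "c", "d", "e"]:
    agg.update(v)
print(agg.result())  # 3: five distinct values were seen, but the count is capped
```

Below the cap the count is exact; at the cap it only tells you that there were at least `bound` distinct values. That is the trade-off for a fixed upper limit on the state kept per key.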

Why / Goal

We have use cases where we'd prefer an exact result over the approximate equivalents, but want protections in place so that memory usage doesn't become an issue.

Test Plan

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested

Checklist

  • Documentation update

Reviewers

@jbrooks-stripe jbrooks-stripe force-pushed the jbrooks-bounded-unique-count branch 5 times, most recently from b74e8ff to 685e0b9 Compare July 2, 2024 21:41
@jbrooks-stripe jbrooks-stripe marked this pull request as ready for review July 3, 2024 19:31
Collaborator

@pengyu-hou pengyu-hou left a comment

@jbrooks-stripe the change looks good. Could you please also update the operation in group_by.py at https://github.com/airbnb/chronon/blob/main/api/py/ai/chronon/group_by.py#L56

Could you follow the same style as HISTOGRAM and HISTOGRAM_K? Thanks!!

3 participants