Pick correct denominator to make quantile aggregate function's result deterministic #6661

Open
onkar opened this issue Dec 12, 2024 · 0 comments

onkar commented Dec 12, 2024

Environment

Snuba SaaS

Steps to Reproduce

Some customers report that their p50, p75 and p95 values are non-deterministic. ClickHouse provides two aggregate functions that are used in this case, quantile and quantileDeterministic. Both use reservoir sampling to compute approximate quantiles. The only difference is that quantile uses a random number generator to pick the samples, which makes the results non-deterministic, while quantileDeterministic requires the caller to pass in a determinator (the ClickHouse name for the argument this issue calls the denominator), which is used to pick the samples instead.

The main task is to pick a correct denominator that the customer can pass in. The lack of determinism comes from the random seed, so a non-random denominator should give the result we want. The ClickHouse documentation for quantileDeterministic is useful in this context. One sentence in it is unclear, though:

If the same determinator value occurs too often, the function works incorrectly.
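One possible reading of that sentence, sketched in Python below: if sampling decisions are driven only by a hash of the per-row key, then all rows sharing a key are kept or dropped together, so a heavily repeated key makes the sample all-or-nothing instead of a representative subset. This is an interpretation, not a statement of how ClickHouse behaves internally; `passes`, `sample`, the SHA-256 hash, and the bit-mask filter are assumptions made for illustration.

```python
import hashlib


def passes(key, skip_degree):
    """Keep a row when the low `skip_degree` bits of hash(key) are all zero.
    The hash choice (SHA-256 prefix) is an assumption for this sketch."""
    digest = hashlib.sha256(str(key).encode()).digest()
    h = int.from_bytes(digest[:8], "big")
    return h & ((1 << skip_degree) - 1) == 0


def sample(rows, skip_degree):
    """rows: iterable of (value, key). Returns the values that survive the filter."""
    return [value for value, key in rows if passes(key, skip_degree)]


# Distinct key per row (for example an event id): roughly 1 in 8 rows survive.
distinct_rows = [(i, i) for i in range(1000)]
distinct_sample = sample(distinct_rows, skip_degree=3)

# One key shared by every row: the filter keeps all rows or none of them.
shared_rows = [(i, 42) for i in range(1000)]
shared_sample = sample(shared_rows, skip_degree=3)

print(len(distinct_sample), len(shared_sample))
```

Under this reading, a good candidate is a value that is deterministic and close to unique per row. The ClickHouse documentation itself suggests a user id or an event id as examples.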

Expected Result

The p50, p75 and p95 values do not fluctuate.

Actual Result

p50, p75 and p95 values are non-deterministic.
