Skip to content
This repository has been archived by the owner on Jan 20, 2023. It is now read-only.

Possible bug in quantiles? #9

Open
goller opened this issue Feb 7, 2018 · 11 comments
Open

Possible bug in quantiles? #9

goller opened this issue Feb 7, 2018 · 11 comments

Comments

@goller
Copy link

goller commented Feb 7, 2018

Hey there, I'm using

	td := tdigest.NewWithCompression(1000)
	for _, x := range []float64{1, 2, 3, 4, 5, 5, 4, 3, 2, 1} {
		td.Add(x, 1)
	}
	fmt.Println("Quantile Result")
	for _, q := range []float64{0.1, 0.2, 0.5, 0.75, 0.9, 0.99, 0.999} {
		fmt.Println(q, td.Quantile(q))
	}

The results are:

Quantile Result
0.1  1
0.2  1.5
0.5  3.071428571428571
0.75 4.142857142857142
0.9  4.785714285714285
0.99 5.171428571428571
0.999 5.209999999999999

However, I think the results should be:

0.1  1
0.2  1.5
0.5  3
0.75 4
0.9  5
0.99 5
0.999 5
@spenczar
Copy link
Owner

spenczar commented Feb 7, 2018

Very possible. This implementation uses an old, buggy algorithm from an early version of the t-digest paper; that algorithm suffers instabilities under high compression (see tdunning/t-digest#78). The new "merging" t-digest algorithm is supposed to fix this, but I haven't gotten around to implementing it.

@goller
Copy link
Author

goller commented Feb 7, 2018

@spenczar

@nathanielc and I have been working on a different go client based on the merging here:
https://github.com/influxdata/tdigest

Is this useful to you?

@spenczar
Copy link
Owner

spenczar commented Feb 7, 2018

That looks excellent!

How stable and tested is your implementation? I should take a look at the source, but if you have a merging implementation already, I'd happily point people towards it, if you think it's reliable enough for heavy use.

Thank you for making the API very similar to that of github.com/spenczar/tdigest, it should make it simple for people to migrate.

@goller
Copy link
Author

goller commented Feb 7, 2018

Right on!

We'd like to get a lot more testing in first. Specifically, we want to use similar tests to the java implementation (influxdata/tdigest#5)

We are adding it to our next-generation query language here: https://github.com/influxdata/ifql/pull/234

This will allow us to stress it greatly in the coming weeks.

@spenczar
Copy link
Owner

spenczar commented Feb 7, 2018

Sounds great - and especially happy to hear you will replicate the Java implementation's test suite.

Once you feel comfortable, please check back in here. Then, I'll put a loud notice on the README that there's a better thing out there with a similar API.

@goller
Copy link
Author

goller commented Feb 7, 2018

Great... btw, LOVE twirp!

@Dieterbe
Copy link

Dieterbe commented Sep 4, 2019

what's the latest?

@nathanielc
Copy link

@Dieterbe we haven't swapped out the Java tests or anything yet, but we have been using https://github.com/influxdata/tdigest for a year now without issue.

@jinq0123
Copy link

jinq0123 commented Nov 25, 2020

func TestQuantile100(t *testing.T) {
	d := New()
	d.Add(1, 98)
	d.Add(100, 2)
	if d.Quantile(0.99) != 100 {
		t.Errorf("TDigest.quantile wrong, have=%.3f, want=%.3f", d.Quantile(0.99), 100.0)
	}
}

Fails: TDigest.quantile wrong, have=100.960, want=100.000

@eric-burel
Copy link

Hello from the future, would you recommend using influxdata version at this point?

@spenczar
Copy link
Owner

@eric-burel Yeah, I probably would. They're using it in production and for a pretty well-supported product. I'll archive this repo and point people that way.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants