Reproducible benchmark code? #3

Open
craigbarnes opened this issue Jul 16, 2019 · 6 comments

Comments

@craigbarnes

craigbarnes commented Jul 16, 2019

Is the code used to produce the numbers in the readme available somewhere? Based on the methodology described, it seems likely that triehash is benefiting significantly from (unrealistically favourable) branch prediction:

... each hash function was run 1,000,000 times for each word
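For illustration, the loop shape implied by the quoted methodology might look roughly like the C sketch below. It is a hypothetical reconstruction, not the project's actual benchmark code. `words`, `lookup_word()` (a linear-scan stand-in for a triehash-generated function) and `bench_same_word()` are made-up names. Because the same word is passed on every iteration, each branch inside the lookup goes the same way every time, which is exactly the kind of pattern a branch predictor learns quickly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical word list; not taken from the triehash repository. */
static const char *const words[] = {"Package", "Version", "Depends", "Section"};

/* Hypothetical stand-in for a triehash-generated lookup function:
 * returns the index of the word, or -1 if it is unknown. */
static int lookup_word(const char *s, size_t len)
{
    for (size_t i = 0; i < ARRAY_SIZE(words); i++) {
        if (strlen(words[i]) == len && memcmp(words[i], s, len) == 0)
            return (int)i;
    }
    return -1;
}

/* Time `reps` back-to-back lookups of the SAME word and return the average
 * cost per call in nanoseconds. The input never changes, so every branch in
 * lookup_word() resolves identically on each call. */
static double bench_same_word(const char *word, unsigned long reps, long *checksum)
{
    assert(reps > 0);
    /* Reading the input through a volatile pointer stops the compiler from
     * hoisting the loop-invariant call out of the loop. */
    const char *volatile input = word;
    size_t len = strlen(word);
    long sum = 0;
    clock_t start = clock();
    for (unsigned long r = 0; r < reps; r++)
        sum += lookup_word(input, len);
    clock_t end = clock();
    *checksum = sum; /* use the results so the loop is not dead code */
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)reps;
}
```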

@julian-klode
Owner

Running the entire thing needs access to the Debian porter boxes, and probably some updates for changes there, but it can also be run locally (`make run`).

@julian-klode
Owner

Not sure how you'd test that differently, though; you do need to run the function a lot to get usable results.

@craigbarnes
Author

craigbarnes commented Jul 16, 2019

Thanks for the pointers. I just wanted to see how unpredictable inputs would change the results.

To make it fair and avoid adding too much overhead to the loop, I was thinking the best way might be to pre-compute an array of randomly selected (known) words and use `words[iteration % ARRAY_SIZE(words)]` as the input. Otherwise the branch predictor gets perfect conditions for most of the 1,000,000 iterations, when in practice most of the branches are probably highly unpredictable.
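A minimal C sketch of that proposal, assuming a hypothetical word list and a linear-scan `lookup_word()` standing in for a triehash-generated function (neither is from the triehash repository). `build_inputs()`, `bench_lookup()` and `NUM_INPUTS` are likewise illustrative names. The random selection happens once, outside the timed loop, so the loop itself only adds an index calculation and an array load per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical key set; a real benchmark would use the words triehash was given. */
static const char *const known_words[] = {
    "Package", "Version", "Depends", "Section", "Priority", "Architecture",
};

/* Hypothetical stand-in for a triehash-generated lookup function:
 * returns the index of the word, or -1 if it is unknown. */
static int lookup_word(const char *s, size_t len)
{
    for (size_t i = 0; i < ARRAY_SIZE(known_words); i++) {
        if (strlen(known_words[i]) == len && memcmp(known_words[i], s, len) == 0)
            return (int)i;
    }
    return -1;
}

#define NUM_INPUTS 256 /* small, to limit the extra data-cache footprint */

static const char *inputs[NUM_INPUTS];
static size_t input_lens[NUM_INPUTS];

/* Pre-compute a randomly ordered array of known words (and their lengths)
 * before timing starts. */
static void build_inputs(unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < NUM_INPUTS; i++) {
        inputs[i] = known_words[(size_t)rand() % ARRAY_SIZE(known_words)];
        input_lens[i] = strlen(inputs[i]);
    }
}

/* Time `iterations` lookups, cycling through the pre-computed inputs with
 * inputs[it % ARRAY_SIZE(inputs)], and return the average cost per call in
 * nanoseconds. *checksum receives the sum of the results so they are used. */
static double bench_lookup(unsigned long iterations, long *checksum)
{
    assert(iterations > 0);
    long sum = 0;
    clock_t start = clock();
    for (unsigned long it = 0; it < iterations; it++) {
        size_t idx = it % ARRAY_SIZE(inputs);
        sum += lookup_word(inputs[idx], input_lens[idx]);
    }
    clock_t end = clock();
    *checksum = sum;
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iterations;
}
```

Calling `build_inputs(42)` once and then `bench_lookup(1000000, &checksum)` would keep the 1,000,000-iteration count while the word changes from call to call. With 256 entries, the two input arrays take 256 × (8 + 8) = 4096 bytes on a typical 64-bit (LP64) system, so the added cache footprint stays small.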

@julian-klode
Owner

Won't that be affected too much by data caches?

@craigbarnes
Author

craigbarnes commented Jul 16, 2019

It'll increase cache pressure, I guess, but not by much for a small array. It'd certainly be much closer to a real-world scenario than repeating the same input a million times.
