API for Batch Input Creation #86
No API for that, because measurement quality could be very low if the timer has to be restarted often. Instead, please do 2 measurements: first measure the runtime of just the setup (e.g. creating & shuffling), then benchmark everything, and subtract the first.
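A minimal sketch of that two-measurement approach against nanobench's existing `Bench::run` API, with a hypothetical `generate_shuffled_vector()` standing in for the setup step:

```cpp
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical setup step whose cost we want to exclude.
std::vector<int> generate_shuffled_vector() {
    std::vector<int> v(10000);
    std::iota(v.begin(), v.end(), 0);
    std::shuffle(v.begin(), v.end(), std::mt19937{42});
    return v;
}

int main() {
    ankerl::nanobench::Bench bench;

    // 1) Measure the setup alone.
    bench.run("setup only", [&] {
        ankerl::nanobench::doNotOptimizeAway(generate_shuffled_vector());
    });

    // 2) Measure setup + the operation of interest.
    bench.run("setup + sort", [&] {
        auto v = generate_shuffled_vector();
        std::sort(v.begin(), v.end());
        ankerl::nanobench::doNotOptimizeAway(v);
    });

    // The sort's cost is approximately ("setup + sort") minus ("setup only"),
    // read off the printed table.
}
```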
Couldn't this be easily solved by having all setup done first (for x amount of iterations)? From the docs it seems like something along these lines could work:

```cpp
std::vector<int> generate_shuffled_vector();

ankerl::nanobench::BenchBatch<std::vector<int>>()
    .generateBatch([]() -> std::vector<int> {
        // The returned batch can just be stored in whatever structure you
        // see fit (e.g. in this case perhaps `std::vector<std::vector<int>>`).
        return generate_shuffled_vector();
    })
    .run("amazing sorter", [](std::vector<int>& batch) {
        // `batch` is just the next entry, and next, and next...
        my_amazing_sorter(batch);
    });
```

As long as the generated batch is either copy- or moveable, all should be fine... Obviously there is some overhead here, but it should be fairly stable (depending on what your "batch" is). Also, I want to get away from google-bench, but need this. 🙃
To add to what @helmesjo said, I looked into how Criterion implemented it, and I saw the current implementation of …
I've reopened the issue and linked all similar issues I previously closed to this one. I guess public interest is too large for me to brush it away... What I absolutely do not want is start/stop of the timer in the tight measurement loop (which seems to be what Criterion is doing). That would make it impossible to benchmark something that's fast, especially because I also need to start/stop the Linux performance counters. I need to contemplate more on how to best do it, but currently don't have too much free time on my hands for this.
I would love to have this feature as well. Maybe it would be easiest to have a different entry point for benchmarks that require setup (and teardown!) to be excluded from the measurements, say …
I've landed here because I think I need this feature to benchmark 'something fast': Protobuf encode+decode. Basically, I have an encode+decode step which benchmarks out as 100ns. If I break this apart into two separate functions, one bench for encode and one for decode, I get 30ns for encode and 1ns for decode. These results don't really make sense: naively, I'd expect X+Y=Z, i.e. 30+something=100. I suspect either something more is being optimized out (hard to tell because I'm no good at disassembly 😃), or this might be an artifact of running the two things in separate loops so that the branching/caching get along better. I just don't expect such a huge true difference, because the message I'm encoding/decoding is less than a cache line in size. So far, the only examples I've seen of carrying state larger than a word across runs while ensuring things won't be optimized out is the gtest approach of redoing the setup within the hot loop and starting and stopping a timer for the part you're interested in. I understand this might add some overhead, but I'm not that concerned with overhead of that size, especially because when my code is embedded in another process I will be far less certain exactly what the cost of each line of code is, due to different cache conditions/inlinability vs the microbench.
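Not an answer to the feature request, but one way to probe the "optimized out" suspicion with the existing API is to make sure the decode result is actually consumed via `ankerl::nanobench::doNotOptimizeAway`. A rough sketch, with `MyMsg`, `encode_to_string`, and `decode_from_string` as stand-ins for the real Protobuf message and calls:

```cpp
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

#include <string>

// Stand-ins for the real Protobuf message and its encode/decode calls.
struct MyMsg {
    int id = 42;
    std::string payload = "hello";
};
std::string encode_to_string(MyMsg const& m) { return m.payload + '#' + std::to_string(m.id); }
MyMsg decode_from_string(std::string const& bytes) { return MyMsg{static_cast<int>(bytes.size()), bytes}; }

int main() {
    MyMsg msg;
    std::string const bytes = encode_to_string(msg);  // encode once, outside the measured loop

    ankerl::nanobench::Bench().run("decode only", [&] {
        MyMsg decoded = decode_from_string(bytes);
        // Consume the result so the decode cannot be eliminated as dead code;
        // a ~1 ns result often means the work was optimized away entirely.
        ankerl::nanobench::doNotOptimizeAway(decoded);
    });
}
```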
I understand why you don't want to pause the timer between epoch iterations, and I agree with you about that, but at the very least we really need fixtures between epochs. It's probably not ideal, but here's an example of a possible implementation. And my use case:
`Connect` is a function with side effects, since it appends the slot to a vector, so using a fixed iteration number is better here in order to have a consistent result between each run. The teardown function serves to reset the signal between epochs. Without this, subsequent epochs would cause the vector to keep growing, leading to severe variations in the timing measurements.
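The snippets from that comment aren't reproduced here, but purely as an illustration of the kind of per-epoch fixture being asked for, a hypothetical `epochSetup`/`epochTeardown` pair (not part of nanobench's current API) might be used like this:

```cpp
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

#include <functional>
#include <vector>

// Minimal stand-in for the signal/slot types in the use case above.
struct Signal {
    std::vector<std::function<void()>> slots;
    void connect(std::function<void()> slot) { slots.push_back(std::move(slot)); }
    void reset() { slots.clear(); }
};

int main() {
    Signal sig;

    ankerl::nanobench::Bench()
        .minEpochIterations(1000)
        // Hypothetical fixture hooks, run once per epoch and excluded from timing:
        //   .epochSetup([&] { /* prepare per-epoch state */ })
        //   .epochTeardown([&] { sig.reset(); })  // stop the slot vector from growing
        .run("connect", [&] {
            sig.connect([] {});
        });
}
```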
Couldn't we just provide an API to do exactly this? You provide code for a "setup", then you provide code for the setup + operation, and then the library runs the tight loops on both and does the subtraction.
Something like this?
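For what it's worth, something close to that subtraction can already be scripted on top of the existing API. A rough sketch, assuming `Bench::results()` and `Result::median(Result::Measure::elapsed)` behave as in current nanobench releases; the copy-then-sort workload is only an example:

```cpp
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    // One pre-built unsorted source vector; the per-iteration "setup" is copying it.
    std::vector<int> source(10000);
    std::mt19937 rng{123};
    std::generate(source.begin(), source.end(), rng);

    ankerl::nanobench::Bench setupOnly;
    setupOnly.run("copy only", [&] {
        auto v = source;  // setup: fresh unsorted input
        ankerl::nanobench::doNotOptimizeAway(v);
    });

    ankerl::nanobench::Bench setupPlusOp;
    setupPlusOp.run("copy + sort", [&] {
        auto v = source;
        std::sort(v.begin(), v.end());
        ankerl::nanobench::doNotOptimizeAway(v);
    });

    // Median elapsed seconds per iteration for each measurement.
    auto median = [](ankerl::nanobench::Bench const& b) {
        return b.results().back().median(ankerl::nanobench::Result::Measure::elapsed);
    };

    double const sortSeconds = median(setupPlusOp) - median(setupOnly);
    std::printf("sort alone: ~%.1f ns per iteration\n", sortSeconds * 1e9);
}
```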
I'm looking for a way to create input for each batch. As an example, if I wanted to benchmark something like sorting algorithms, I need a new unsorted vector for each batch, either by copying one already created or by shuffling the one already used. In both cases the setup isn't negligible, so I'm wondering if nanobench has an API which helps with that? I was thinking of something like Google Benchmark's `PauseTiming` and `ResumeTiming`, or something like Rust's Criterion `iter_batched` and `iter_batched_ref`?
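For reference, a minimal sketch of the Google Benchmark pattern mentioned above (the `make_shuffled_vector()` helper is hypothetical); this is exactly the per-iteration pause/resume style the maintainer wants to keep out of nanobench's tight loop:

```cpp
#include <benchmark/benchmark.h>

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical setup helper producing a fresh unsorted input.
static std::vector<int> make_shuffled_vector() {
    std::vector<int> v(10000);
    std::iota(v.begin(), v.end(), 0);
    std::shuffle(v.begin(), v.end(), std::mt19937{42});
    return v;
}

static void BM_Sort(benchmark::State& state) {
    for (auto _ : state) {
        state.PauseTiming();              // exclude the setup...
        auto v = make_shuffled_vector();
        state.ResumeTiming();             // ...and time only the sort
        std::sort(v.begin(), v.end());
        benchmark::DoNotOptimize(v.data());
    }
}
BENCHMARK(BM_Sort);
BENCHMARK_MAIN();
```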