Consider benchmarking real-world scenarios #151
Comments
I would suggest something like this:

    template <typename T>
    using predicate = std::integral_constant<bool, (T::value == middle_value)>;

    template <typename... Ts>
    typename std::enable_if<all<list<Ts...>, predicate>::value, int>::type f(Ts... args) {
        return 1;
    }

vs.

    template <typename... Ts>
    int f(Ts... args) {
        return 1;
    }

when calling f with packs of different sizes. I'll bet metal could SFINAE away earlier, plus we can test short-circuiting of all and support for packs as inputs all in one go. For my microcontroller work this is certainly a common real-world case, and this kind of thing shows up in a lot of code. I think it could push the argument for SFINAE friendliness a bit.
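For reference, a self-contained sketch of that comparison; `all`, `list`, and `middle_value` here are assumptions standing in for whatever the benchmarked library provides (the `all` below is a deliberately naive fold, not any particular library's API), and the two overloads are given distinct names so both fit in one translation unit:

```cpp
#include <type_traits>

// Hypothetical stand-ins for the library under test.
template <typename... Ts>
struct list {};

constexpr int middle_value = 50;

// Naive all-of over a type list (C++17 fold expression); a real TMP
// library would be expected to short-circuit and scale better than this.
template <typename List, template <typename> class Pred>
struct all;

template <template <typename> class Pred, typename... Ts>
struct all<list<Ts...>, Pred>
    : std::integral_constant<bool, (Pred<Ts>::value && ...)> {};

// Predicate from the snippet above.
template <typename T>
using predicate = std::integral_constant<bool, (T::value == middle_value)>;

// SFINAE-constrained overload: viable only when every argument satisfies the predicate.
template <typename... Ts>
typename std::enable_if<all<list<Ts...>, predicate>::value, int>::type
f_constrained(Ts...) {
    return 1;
}

// Unconstrained baseline.
template <typename... Ts>
int f_unconstrained(Ts...) {
    return 1;
}

int main() {
    using fifty = std::integral_constant<int, 50>;
    return f_constrained(fifty{}, fifty{}) + f_unconstrained(fifty{}, fifty{}, fifty{});
}
```

A benchmark would instantiate each overload with packs of increasing size and compare compile times.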
Isn't this just the same as our
Another improvement I would suggest is testing a range rather than a specific number of inputs. We are doing 10 runs anyway, so why not test 96, 97, 98, 99, 100, 101, 102, 103, 104, 105 rather than 100 ten times? This should take the "saw tooth" out of the benchmarks of some libraries, which occurs because of perfect-match fast-tracking as well as memoization effects.
The benchmark at N is supposed to measure how long the algorithm takes for an input of size N. We run it M times and then divide the timings by M so as to reduce baseline noise, but M is entirely unrelated to N. Now, if we made the input size a function of both M and N, plotting results along N wouldn't make sense anymore.
I guess so. If one were to plot every N, things would be clearer as well, because the saw tooth would be apparent. As it is, you just see some effects of the sampling period of N lining up with the period of the saw tooth.
I think the oscillation caused by fast tracking is a separate issue. What you describe is actually a poor man's low-pass filter, and I think it would be best implemented in JS so that the user could toggle it on and off.

Another thing is benchmarking real-world scenarios, which I believe would be best tackled by independent benchmarks that do something more elaborate than running algorithms with bogus metafunctions or predictable predicates. It's not clear to me what those would look like, though; we could think of something elaborate that cascades several algorithms, but it would still be unrealistic if not associated with any real-world use case.
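Purely to illustrate the kind of smoothing being described here (which, as suggested, would more naturally live in the JS front end as a user-toggleable option), a minimal centered moving-average sketch over per-N timings; the window radius and data layout are assumptions:

```cpp
#include <cstddef>
#include <vector>

// Centered moving average: a simple low-pass filter that would flatten
// the saw tooth caused by fast-tracked input sizes and memoization.
std::vector<double> smooth(const std::vector<double>& timings, std::size_t radius) {
    std::vector<double> out(timings.size());
    for (std::size_t i = 0; i < timings.size(); ++i) {
        std::size_t lo = (i > radius) ? i - radius : 0;
        std::size_t hi = (i + radius < timings.size()) ? i + radius : timings.size() - 1;
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j)
            sum += timings[j];
        out[i] = sum / static_cast<double>(hi - lo + 1);
    }
    return out;
}
```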
A third issue at play here is the resolution of our benchmarks with respect to N. We could improve it by sampling at smaller intervals of N, but then we would put more pressure on Travis, which already struggles with timeouts.
Although with #179, we can go almost as crazy as we want. Poor Travis.
Awesome!
Forking #124 to discuss real-world examples more precisely. The idea is that in addition to the microbenchmarks we're already providing (and which should be made more precise by #148), we could also benchmark implementations of real-world scenarios. This would give an idea of the actual time taken by each library for each scenario, which might be useful for end-users (but probably less for TMP library developers). Some ideas:
common_type
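To make the common_type idea concrete, one possible shape for such a scenario, sketched here library-agnostically on top of the standard library (a participating TMP library would substitute its own fold or algorithm):

```cpp
#include <type_traits>

// Real-world-ish scenario: the common type of a large parameter pack,
// computed by plain recursion over std::common_type as a baseline.
template <typename... Ts>
struct common_type_of;

template <typename T>
struct common_type_of<T> {
    using type = T;
};

template <typename T, typename U, typename... Rest>
struct common_type_of<T, U, Rest...>
    : common_type_of<typename std::common_type<T, U>::type, Rest...> {};

static_assert(
    std::is_same<typename common_type_of<char, short, int, long>::type, long>::value,
    "common_type over a pack");
```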
First, we should determine whether such benchmarks would be relevant, and what their target audience would be. Secondly, we should weigh the value of such benchmarks against the difficulty of implementing and maintaining them. Finally, if we decide to provide such benchmarks, I believe we should also have a way to present the source code for these solutions (perhaps as suggested in #118), since that would allow end-users to balance the expressiveness of each library against its compile-time performance.