
Consider benchmarking real-world scenarios #151

Open
ldionne opened this issue Oct 16, 2016 · 9 comments

@ldionne
Owner

ldionne commented Oct 16, 2016

Forking #124 to discuss real-world examples more precisely. The idea is that in addition to the microbenchmarks we're already providing (and which should be made more precise by #148), we could also benchmark implementations of real-world scenarios. This would give an idea of the actual time taken by each library for each scenario, which might be useful for end-users (though probably less so for TMP library developers). Some ideas:

  • implementing common_type
  • parsing raw literals

First, we should determine whether such benchmarks would be relevant, and who their target audience would be. Second, we should weigh the value of such benchmarks against the difficulty of implementing and maintaining them. Finally, if we decide to provide such benchmarks, I believe we should also have a way to present the source code for these solutions (perhaps as suggested in #118), since that would allow end-users to balance the expressiveness of each library against its compile-time performance.
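
For concreteness, the common_type scenario could look roughly like the following hand-rolled sketch (the names here, such as my_common_type, are purely illustrative and not taken from any benchmarked library); a benchmark would then measure how long each library takes to express the equivalent fold over a growing pack of types:

#include <type_traits>

// Illustrative only: a naive common_type computed by folding
// std::common_type over a parameter pack, left to right.
template <typename... Ts>
struct my_common_type;

template <typename T>
struct my_common_type<T> { using type = T; };

template <typename T, typename U, typename... Rest>
struct my_common_type<T, U, Rest...>
    : my_common_type<typename std::common_type<T, U>::type, Rest...> {};

static_assert(std::is_same<typename my_common_type<char, short, int>::type, int>::value,
              "sanity check");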

@odinthenerd
Contributor

odinthenerd commented Mar 24, 2017

I would suggest something like this:

// `all`, `list`, and `middle_value` are assumed to come from the library under test
template <typename T>
using predicate = std::integral_constant<bool, (T::value == middle_value)>;

template <typename... Ts>
typename std::enable_if<all<list<Ts...>, predicate>::value, int>::type f(Ts... args) {
    return 1;
}

vs.

template <typename... Ts>
int f(Ts... args) {
    return 1;
}

when calling f with packs of different sizes. I'd bet metal could SFINAE away earlier, and we could test short-circuiting of all and support for parameter packs as inputs all in one go. For my microcontroller work this is certainly a common real-world case, and this kind of thing shows up in a lot of code. I think it could strengthen the argument for SFINAE friendliness a bit.
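
To make that concrete, a driver for the SFINAE-constrained overload might look roughly like this (a sketch only: it assumes middle_value == 0 was defined before predicate above, and int_c is a hypothetical helper, not part of any library here):

#include <type_traits>

// Hypothetical helper for building packs of integral constants.
template <int N>
using int_c = std::integral_constant<int, N>;

int main() {
    // Packs of growing size whose elements all satisfy the predicate
    // (assuming middle_value == 0), so the enable_if constraint is met;
    // the benchmark would time how quickly each library's `all` resolves
    // these calls as the pack grows.
    f(int_c<0>{});
    f(int_c<0>{}, int_c<0>{});
    f(int_c<0>{}, int_c<0>{}, int_c<0>{});
}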

@brunocodutra
Collaborator

Isn't this just the same as our all benchmark?

@odinthenerd
Contributor

Isn't this just the same as our all benchmark?

It is similar, but there are many ways to solve it.

Another improvement I would suggest is testing a range rather than a single input size. We are doing 10 runs anyway, so why not test 96, 97, 98, 99, 100, 101, 102, 103, 104, 105 rather than 100 ten times? This should take the "sawtooth" out of the benchmarks of some libraries, which occurs because of perfect-match fast-tracking as well as memoization effects.

@brunocodutra
Collaborator

brunocodutra commented Mar 28, 2017

why not test 96, 97, 98, 99, 100, 101, 102, 103, 104, 105 rather than 100 ten times

The benchmark at N is supposed to measure how long the algorithm takes for an input of size N. We run it M times and then divide timings by M so that we reduce the baseline noise, but M is entirely unrelated to N. Now if we make the input size a function of M and N, then plotting results along N wouldn't make sense anymore.

@odinthenerd
Contributor

I guess so. If one were to plot every N, things would be clearer as well, because the sawtooth would be apparent. As it is, you just see some effects of the sampling period of N lining up with the period of the sawtooth.

@brunocodutra
Collaborator

brunocodutra commented Mar 28, 2017

I think the oscillation caused by fast-tracking is a separate issue. What you describe is essentially a poor man's low-pass filter, and I think it would be best implemented in JS so that the user could toggle it on and off.
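
As a rough illustration of that filter (sketched in C++ here only to show the arithmetic; the real thing would live in the JS that renders the charts), the smoothing amounts to averaging each timing with its neighbours in a small window:

#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of the "poor man's low-pass filter": replace each timing with the
// mean of its neighbours within a small window, which is roughly what
// sampling 96..105 instead of 100 ten times would do implicitly.
std::vector<double> smooth(const std::vector<double>& timings, std::size_t radius) {
    std::vector<double> out(timings.size());
    for (std::size_t i = 0; i < timings.size(); ++i) {
        std::size_t lo = i > radius ? i - radius : 0;
        std::size_t hi = std::min(timings.size() - 1, i + radius);
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j)
            sum += timings[j];
        out[i] = sum / static_cast<double>(hi - lo + 1);
    }
    return out;
}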

Another thing is benchmarking real-world scenarios, which I believe would be best tackled by independent benchmarks that do something more elaborate than running algorithms with bogus metafunctions or predictable predicates.

It's not clear to me, though, what those would look like. We could think of something elaborate that cascades several algorithms, but it would still be unrealistic if not tied to an actual real-world use case.

@brunocodutra
Collaborator

A third issue at play here is the resolution of our benchmarks with respect to N. We could improve it by sampling at smaller intervals of N, but then we would increase the pressure on Travis timeout issues.

@ldionne
Owner Author

ldionne commented Mar 28, 2017

but then we would increase the pressure on Travis timeout issues.

Although with #179, we can go almost as crazy as we want. Poor Travis.

@brunocodutra
Collaborator

Awesome!
