Consider benchmarking real-world scenarios #151
Comments
I would suggest something like this:

    template <typename T>
    using predicate = std::integral_constant<bool, (T::value == middle_value)>;

    template <typename... Ts>
    typename std::enable_if<all<list<Ts...>, predicate>::value, int>::type f(Ts... args) {
        return 1;
    }

vs.

    template <typename... Ts>
    int f(Ts... args) {
        return 1;
    }

when calling f with packs of different sizes. I'll bet metal could SFINAE away earlier, plus we can test short-circuiting of all and support for packs as inputs all in one go. For my microcontroller work this is certainly a common real-world case, and this kind of thing shows up in a lot of code. I think it could push the argument for SFINAE friendliness a bit.
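For reference, a self-contained sketch of that comparison; `all`, `list`, and `middle_value` here are assumptions standing in for whatever the benchmarked library provides (the `all` below is a deliberately naive fold, not any particular library's API), and the two overloads are given distinct names so both fit in one translation unit:

```cpp
#include <type_traits>

// Hypothetical stand-ins for the library under test.
template <typename... Ts>
struct list {};

constexpr int middle_value = 50;

// Naive all-of over a type list (C++17 fold expression); a real TMP
// library would be expected to short-circuit and scale better than this.
template <typename List, template <typename> class Pred>
struct all;

template <template <typename> class Pred, typename... Ts>
struct all<list<Ts...>, Pred>
    : std::integral_constant<bool, (Pred<Ts>::value && ...)> {};

// Predicate from the snippet above.
template <typename T>
using predicate = std::integral_constant<bool, (T::value == middle_value)>;

// SFINAE-constrained overload: viable only when every argument satisfies the predicate.
template <typename... Ts>
typename std::enable_if<all<list<Ts...>, predicate>::value, int>::type
f_constrained(Ts...) {
    return 1;
}

// Unconstrained baseline.
template <typename... Ts>
int f_unconstrained(Ts...) {
    return 1;
}

int main() {
    using fifty = std::integral_constant<int, 50>;
    return f_constrained(fifty{}, fifty{}) + f_unconstrained(fifty{}, fifty{}, fifty{});
}
```

A benchmark would instantiate each overload with packs of increasing size and compare compile times.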
Isn't this just the same as our
Another improvement I would suggest is testing a range rather than a specific number of inputs. We are doing 10 runs anyway, so why not test 96, 97, 98, 99, 100, 101, 102, 103, 104, 105 rather than 100 ten times? This should take the "saw tooth" out of the benchmarks of some libraries, which occurs because of perfect-match fast-tracking as well as memoization effects.
The benchmark at N is supposed to measure how long the algorithm takes for an input of size N. We run it M times and then divide the timings by M so as to reduce baseline noise, but M is entirely unrelated to N. Now, if we made the input size a function of both M and N, plotting results along N wouldn't make sense anymore.
I guess so. If one were to plot every N, things would be clearer as well, because the saw tooth would be apparent. As it is, you just see some effects of the sampling period of N lining up with the period of the saw tooth.
I think the oscillation caused by fast tracking is a separate issue. What you describe is actually a poor man's low-pass filter, and I think it would be best implemented in JS so that the user could toggle it on and off.

Another thing is benchmarking real-world scenarios, which I believe would be best tackled by independent benchmarks that do something more elaborate than running algorithms with bogus metafunctions or predictable predicates. It's not clear to me what those would look like, though; we could think of something elaborate that cascades several algorithms, but it would still be unrealistic if not associated with any real-world use case.
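Purely to illustrate the kind of smoothing being described here (which, as suggested, would more naturally live in the JS front end as a user-toggleable option), a minimal centered moving-average sketch over per-N timings; the window radius and data layout are assumptions:

```cpp
#include <cstddef>
#include <vector>

// Centered moving average: a simple low-pass filter that would flatten
// the saw tooth caused by fast-tracked input sizes and memoization.
std::vector<double> smooth(const std::vector<double>& timings, std::size_t radius) {
    std::vector<double> out(timings.size());
    for (std::size_t i = 0; i < timings.size(); ++i) {
        std::size_t lo = (i > radius) ? i - radius : 0;
        std::size_t hi = (i + radius < timings.size()) ? i + radius : timings.size() - 1;
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j)
            sum += timings[j];
        out[i] = sum / static_cast<double>(hi - lo + 1);
    }
    return out;
}
```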
A third issue at play here is the resolution of our benchmarks with respect to N. We could improve it by sampling at smaller intervals of N, but then we would put more pressure on Travis, which already struggles with timeouts.
Although with #179, we can go almost as crazy as we want. Poor Travis.
Awesome!
Forking #124 to discuss real-world examples more precisely. The idea is that in addition to the microbenchmarks we're already providing (and which should be made more precise by #148), we could also benchmark implementations of real-world scenarios. This would give an idea of the actual time taken by each library for each scenario, which might be useful for end-users (but probably less for TMP library developers). Some ideas:
common_type
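To make the common_type idea concrete, one possible shape for such a scenario, sketched here library-agnostically on top of the standard library (a participating TMP library would substitute its own fold or algorithm):

```cpp
#include <type_traits>

// Real-world-ish scenario: the common type of a large parameter pack,
// computed by plain recursion over std::common_type as a baseline.
template <typename... Ts>
struct common_type_of;

template <typename T>
struct common_type_of<T> {
    using type = T;
};

template <typename T, typename U, typename... Rest>
struct common_type_of<T, U, Rest...>
    : common_type_of<typename std::common_type<T, U>::type, Rest...> {};

static_assert(
    std::is_same<typename common_type_of<char, short, int, long>::type, long>::value,
    "common_type over a pack");
```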
First, we should determine whether such benchmarks would be relevant, and what their target audience would be. Secondly, we should weigh the value of such benchmarks against the difficulty of implementing and maintaining them. Finally, if we decide to provide such benchmarks, I believe we should also have a way to present the source code for these solutions (perhaps as suggested in #118), since that would allow end-users to balance the expressiveness of each library against its compile-time performance.