Metabench is measuring the wrong thing #124
I'm not sure I follow your idea. So you propose that, for a given N, we do the following:
For each N, instantiate the algorithm with K different lists to avoid memoization. Increase K until CPU time > 5s. Then divide the CPU time by K to get an accurate idea of the cost of each instantiation at that N.
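To make the proposal above concrete, here is a minimal sketch, assuming placeholder names (`list`, `item`, and `sort_` are made up and not Metabench's actual harness), of what "K different lists of the same size N" could look like inside a single translation unit, with each list made unique by an extra index so the compiler cannot reuse a memoized instantiation:

```cpp
// Sketch only: placeholder types standing in for a real metaprogramming library.
template <typename...> struct list {};
template <int K, int I> struct item {};  // item<K, I> is distinct for every K

template <typename List> struct sort_ { using type = List; };  // "algorithm" under test

// One instantiation per K, each on a distinct list of length N = 3,
// so every repetition forces the compiler to do fresh work.
template <int K>
using one_run = typename sort_<list<item<K, 0>, item<K, 1>, item<K, 2>>>::type;

// A real harness would emit instantiations like these until the total CPU time
// exceeds ~5s, then report (total time) / K as the per-instantiation cost at this N.
using run0 = one_run<0>;
using run1 = one_run<1>;
using run2 = one_run<2>;
```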
Right. I see two problems with this approach as is:
I think the best we could do would be to define some static … Does that seem reasonable?
@ericniebler I think you have a point that horizontal benchmarks (with many small instantiations) have value and that we're not measuring that right now. However, saying that the current benchmarks are largely meaningless is wrong in my experience, as many "real world" metaprograms will start from a small list but generate a larger one from it, and then call algorithms on that larger list. In any case, I don't think that's a problem with the Metabench module itself, but simply an observation that the benchmark suite is incomplete. What I would do is basically what @brunocodutra says above (except I don't understand the need for …). What do you think?
Hmm, that's a bit different from what I proposed and I'm not sure I got the idea. Do you expect the timings to vary across instances of a given algorithm for lists of the same size? My idea was simply to increase the total compilation time for small N.
I don't expect the timings to vary across instances of the same algorithm given inputs of the same size, but what we'd be measuring is the total compiler overhead when instantiating the algorithm multiple times. The compiler has internal structures that it needs to maintain and build as the compilation progresses. The 100th instantiation of the same algorithm (on a different input of the same size) could be different from the 1st instantiation due to these structures. So I think your idea (and Eric's) is different from mine in the following sense. What you want is a precise timing of algorithms on small inputs. What I'm suggesting is an imprecise (well, not too precise) timing of calling the same algorithm many times on different inputs of the same size. What I'm proposing requires less fiddling with the data, and I think it is even more useful because that's what we're interested in at the end of the day. If we try to measure the time of a single algorithm on a small input, we'll have to measure the algorithm on many small inputs and then divide, as you suggest. However, that requires assuming that the time for the 100th instantiation is roughly the same as for the 1st instantiation, which I don't think is necessarily true.
I see now, makes sense.
Interesting reasoning. Personally I wouldn't expect the compiler overhead to play an important role here, but that's indeed something we should check.
Following up from #129, I think it would be a great idea to provide benchmarks for complete solutions to a couple of classical TMP problems, as a way to assess the performance of our libraries as perceived by the end user. So, any ideas for a classical TMP problem? @porkybrain @edouarda @jonathanpoelen @gnzlbg
Note: this is related to, but not exactly the same as, this issue. This issue is about horizontal microbenchmarks, while what @brunocodutra is talking about is macro benchmarks. I think both would be useful, of course, but the question is what we want to invest effort in as a first step.
Micro benchmarks are a good indicator of what to expect at the macro level. Macro benchmarks may result in libraries being optimized for the cases of the macro benchmark.
Fair point. Then perhaps we want to focus on higher-precision micro benchmarks as proposed above. Basically it would be about trying to get a precise measurement of the algorithm for small inputs, which would require executing the algorithm with many different lists of small size.
Right, the idea was to group all related discussions under this generic issue, which is called "Metabench is measuring the wrong thing".
I didn't mean to replace existing microbenchmarks, but rather to provide macro benchmarks in addition to the benchmarks we provide today. Microbenchmarks are for us metaprogramming developers who understand what they mean, while macro benchmarks would be targeted at users who just want a general idea of the performance of the different libraries under various complex scenarios. Finer-grained benchmarks would be just a tab away anyways, so I don't see how macro benchmarks could do any harm. I also think it would be quite fun to benchmark fairly complex scenarios.
To me it makes sense to focus on micro-benchmarks when experimenting with optimizations. Still, it might make sense to have some micro-benchmarks that are derived from typical use cases of meta-programming rather than from "I think I can optimize this, let's make up a micro-benchmark to measure it". If somebody is looking for ideas, I would suggest looking at how meta-programming is used to implement tuple, tuple algorithms, variant, ... and the … There are obviously tons of libraries that use meta-programming and that people do use, but I don't know how impactful trying to improve some e.g. Boost.Spirit patterns would be as opposed to …
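As one concrete illustration of the kind of use-case-derived microbenchmark suggested here (a sketch only; `index_of` is a made-up name, not something Metabench currently measures), consider the "index of a type in a pack" query that tuple- and variant-like code relies on:

```cpp
#include <cstddef>

// index_of<T, Ts...>::value is the position of the first occurrence of T in Ts...
template <typename T, typename... Ts>
struct index_of;  // primary template left undefined: T must appear in Ts...

template <typename T, typename... Ts>
struct index_of<T, T, Ts...> { static constexpr std::size_t value = 0; };

template <typename T, typename U, typename... Ts>
struct index_of<T, U, Ts...> {
    static constexpr std::size_t value = 1 + index_of<T, Ts...>::value;
};

static_assert(index_of<char, int, long, char, float>::value == 2, "");
```

Scaling the pack length and the position of the searched-for type would yield a microbenchmark that tracks what variant-style code actually pays for.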
That's a great idea for another microbenchmark. As for macrobenchmarks, I was thinking of something more involved, such as parsing raw literals. This example in particular would be tricky to benchmark, though, because the number of digits is quite limited by the …
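For reference, a minimal sketch of the raw-literal-parsing idea (the `_bin` suffix and the `parse_bin` helper are hypothetical, purely for illustration): a literal operator template receives the digits as a character pack, so the entire parse happens at compile time and its cost grows with the number of digits:

```cpp
// Accumulate binary digits left to right at compile time.
template <unsigned long long Acc, char... Cs>
struct parse_bin { static constexpr unsigned long long value = Acc; };

template <unsigned long long Acc, char C, char... Cs>
struct parse_bin<Acc, C, Cs...> : parse_bin<Acc * 2 + (C - '0'), Cs...> {};

// Raw literal operator template: the digits of the literal arrive as a char pack.
template <char... Cs>
constexpr unsigned long long operator"" _bin() {
    return parse_bin<0ull, Cs...>::value;
}

static_assert(101_bin == 5, "parsing happens entirely during compilation");
```

As noted above, the number of digits one can feed such a parser is naturally bounded, which is part of what makes turning this into a macro benchmark tricky.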
Sorry for being way late to the party. I think there are examples of the use of huge lists; kvasir.io is one of them. There is only one "device initialization", which can have a few thousand elements and is optimized using a sort and a fold. Currently it crashes above 500 elements or so because the sort is still not powerful enough.
@porkybrain Sounds interesting. What are some of the more involved MP problems that must be addressed by kvasir.io?
@ldionne Any ideas on how we could go about this?
Very few people are manipulating type lists of more than a few dozen elements. I find the current tests to be largely meaningless for real-world metaprograms. It would be more interesting to run the tests up to, say, 100, but repeatedly and with lots of different types to force the number of instantiations up.
In addition, benchmarks are meaningless if the total CPU time is less than about 5s. So, for each N in [0,100] each test should keep adding unique instantiations of the algorithm being measured until the CPU time is high enough and then divide by the number of instantiations to get an accurate idea of the performance profile of that algorithm at that N.
That way, we can really start the work of optimizing our libs for the real world.