-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement min_element
(benchmark linked in description)
#99
Conversation
min_element
including benchmarksmin_element
(benchmark linked in description)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you've mastered Odin's technique to recurse on template alias.
I'm actually surprised by the fact the performance of the one-liner based on higher-order composition turns out not to be that bad at all. Maintaining a custom implementation would only pay off if it outperforms the one-liner by a considerable margin. I think we can get there with fast-tracking, I suggest you have a look at metal::fold_left
and metal::fold_right
.
It may also be worth it to get some inspiration from metal::at
, in particular the mutual recursion between detail::grouper
and detail::_at
, which builds a recursive tree-like structure that is then unrolled by evaluating the nested ::type
. I think it's similar to the idea discussed in that blog post by Odin, in fact I think I was inspired by it at that time.
Either way, I'd be fine with the one-liner if you think it's not worth it to invest time in the fast-tracked version.
template<class...> class comp, class v0, class v1, | ||
class... tail> | ||
using type = typename _min_element_worker<!sizeof...( | ||
tail)>::template type<comp, if_<comp<v1, v0>, v1, v0>, tail...>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminds me of the way metal::fold_left
and metal::fold_right
are implemented, we can probably leverage the same strategy for fast tracking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I thought about that. In the ideal case, we find a SFINAE-friendly version of combiner
and feed that into fold_left
directly.
If we can not find a SFINAE-friendly combiner
then the fast-tracking is tedious: Both arguments are mentioned twice (if_<comp<v1, v0>, v1, v0>
). Do you have an idea how to solve that elegantly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a way, but it is slow on GCC9. I suspect that the aliases are recursively expanded twice for both arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n the ideal case, we find a SFINAE-friendly version of combiner and feed that into fold_left directly.
A SFINAE-friendly version of combiner
is called metal::bind
. If there were a better way to implement it, then we could just define min_element
as the beautiful one-liner.
If we can not find a SFINAE-friendly combiner then the fast-tracking is tedious.
Specially so, because it's an almost exact copy of metal::fold_left
. Do you think we could extract the internal implementation and make the c
alias as you defined a parameter such that one could implement both fold_left and min_element (and perhaps other algorithms) without any code duplication?
it is slow on GCC9.
There's a trick you could try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is slow on GCC9.
There's a trick you could try.
Yeah, good idea. Let me call that v9.foldlike_tempsave. It does help, but it is still slower than the one-liner. Weirdly, an even faster workaround is given by v10.foldlike_indirection. It is unbelievably fast on GCC 9, and it is one of the three equally fast winners on Clang 8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we could extract the internal implementation and make the
c
alias as you defined a parameter such that one could implement both fold_left and min_element (and perhaps other algorithms) without any code duplication?
Currently, the fastest min_element
implementation on Clang 8 and GCC 9 is v10.foldlike_indirection. If I remember correctly then you should be able to change the combiner (c
) from
template<template<typename...> class comp, class x, class y>
using _min_element_combiner = if_<comp<y, x>, y, x>;
to
template<template<typename...> class expr, class x, class y>
using _fold_left_combiner = expr<x, y>;
in order to obtain an implementation of metal::fold_left
. I suspect that it should be equally fast, but I hesitate to touch such a fundamental well tested algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unbelievably fast on GCC 9, and it is one of the three equally fast winners on Clang 8.
Wow amazing!
If I remember correctly then you should be able to change the combiner (c) from
You're definitely onto something. It's possible this indirection would trip MSVC, but worst case we can implement something analogous to detail::forward
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly then you should be able to change the combiner [...]
It seems to work. See #100. I am not sure if it is a good idea, though.
I do not know how to apply fast-tracking here because both arguments are mentioned twice. I tried this (v6), which is fast on clang 8, but does not compile on GCC 9.
Yes, those are interesting. I will definitely have a look there because I want to learn the techniques. However, I am not sure if it is worth to introduce more complexity here if the one-liner does the job well.
I believe that the one-liner is a good starting point. We could implement it, provide docs, and create tests. When more performance is required, it can be improved. |
I keep on reading the following two sources again and again, and I can not decide if I should trust that technique or not:
After some more benchmarking, I am convinced that the one-liner (v4, higher-order metafunction) is the best I can achieve currently: It is fastest on GCC 9, and it is not too bad on Clang 8. For reference, you can find all implementations in ecrypa@40b2acc (should I add commits to this PR?). |
Agreed.
It does work, but only when a dependent nested lookup is involved, that's why that "silly" |
See #73.
Performance of Clang 8 and GCC 9 is similar (v5 refers to this version; v4 is based on
metal::bind
andmetal::arg
).I suspect that the most expensive operation in this version is the recursive call of
typename _min_element_worker<false>::template type
. The recursion processes one element per step (no fast-tracking), and each call uses O(N) template arguments. It should be more efficient to cut the list into smaller pieces, calculate the minima of the pieces, and then calculate the minima of the minima. I guess you know the tree-based lookup ofkvasir
as explained in this blog post from Odin Holmes?