implement `min_element` (benchmark linked in description) #99

ecrypa · 2019-10-05T14:03:07Z

See #73.

Performance of Clang 8 and GCC 9 is similar (v5 refers to this version; v4 is based on metal::bind and metal::arg).

I suspect that the most expensive operation in this version is the recursive call of typename _min_element_worker<false>::template type. The recursion processes one element per step (no fast-tracking), and each call uses O(N) template arguments. It should be more efficient to cut the list into smaller pieces, calculate the minima of the pieces, and then calculate the minima of the minima. I guess you know the tree-based lookup of kvasir as explained in this blog post from Odin Holmes?

brunocodutra

Looks like you've mastered Odin's technique to recurse on template alias.

I'm actually surprised by the fact the performance of the one-liner based on higher-order composition turns out not to be that bad at all. Maintaining a custom implementation would only pay off if it outperforms the one-liner by a considerable margin. I think we can get there with fast-tracking, I suggest you have a look at metal::fold_left and metal::fold_right.

It may also be worth it to get some inspiration from metal::at, in particular the mutual recursion between detail::grouper and detail::_at, which builds a recursive tree-like structure that is then unrolled by evaluating the nested ::type. I think it's similar to the idea discussed in that blog post by Odin, in fact I think I was inspired by it at that time.

Either way, I'd be fine with the one-liner if you think it's not worth it to invest time in the fast-tracked version.

include/metal/list/min_element.hpp

brunocodutra · 2019-10-05T15:40:21Z

include/metal/list/min_element.hpp

+                template<class...> class comp, class v0, class v1,
+                class... tail>
+            using type = typename _min_element_worker<!sizeof...(
+                tail)>::template type<comp, if_<comp<v1, v0>, v1, v0>, tail...>;


This reminds me of the way metal::fold_left and metal::fold_right are implemented, we can probably leverage the same strategy for fast tracking.

Yes, I thought about that. In the ideal case, we find a SFINAE-friendly version of combiner and feed that into fold_left directly.

If we can not find a SFINAE-friendly combiner then the fast-tracking is tedious: Both arguments are mentioned twice (if_<comp<v1, v0>, v1, v0>). Do you have an idea how to solve that elegantly?

I found a way, but it is slow on GCC9. I suspect that the aliases are recursively expanded twice for both arguments.

n the ideal case, we find a SFINAE-friendly version of combiner and feed that into fold_left directly.

A SFINAE-friendly version of combiner is called metal::bind. If there were a better way to implement it, then we could just define min_element as the beautiful one-liner.

If we can not find a SFINAE-friendly combiner then the fast-tracking is tedious.

Specially so, because it's an almost exact copy of metal::fold_left. Do you think we could extract the internal implementation and make the c alias as you defined a parameter such that one could implement both fold_left and min_element (and perhaps other algorithms) without any code duplication?

it is slow on GCC9.

There's a trick you could try.

it is slow on GCC9.

There's a trick you could try.

Yeah, good idea. Let me call that v9.foldlike_tempsave. It does help, but it is still slower than the one-liner. Weirdly, an even faster workaround is given by v10.foldlike_indirection. It is unbelievably fast on GCC 9, and it is one of the three equally fast winners on Clang 8.

Do you think we could extract the internal implementation and make the c alias as you defined a parameter such that one could implement both fold_left and min_element (and perhaps other algorithms) without any code duplication?

Currently, the fastest min_element implementation on Clang 8 and GCC 9 is v10.foldlike_indirection. If I remember correctly then you should be able to change the combiner (c) from

template<template<typename...> class comp, class x, class y> using _min_element_combiner = if_<comp<y, x>, y, x>;

to

template<template<typename...> class expr, class x, class y> using _fold_left_combiner = expr<x, y>;

in order to obtain an implementation of metal::fold_left. I suspect that it should be equally fast, but I hesitate to touch such a fundamental well tested algorithm.

It is unbelievably fast on GCC 9, and it is one of the three equally fast winners on Clang 8.

Wow amazing!

If I remember correctly then you should be able to change the combiner (c) from

You're definitely onto something. It's possible this indirection would trip MSVC, but worst case we can implement something analogous to detail::forward.

If I remember correctly then you should be able to change the combiner [...]

It seems to work. See #100. I am not sure if it is a good idea, though.

ecrypa · 2019-10-05T16:47:12Z

I'm actually surprised by the fact the performance of the one-liner based on higher-order composition turns out not to be that bad at all. Maintaining a custom implementation would only pay off if it outperforms the one-liner by a considerable margin. I think we can get there with fast-tracking, I suggest you have a look at metal::fold_left and metal::fold_right.

I do not know how to apply fast-tracking here because both arguments are mentioned twice. I tried this (v6), which is fast on clang 8, but does not compile on GCC 9.

It may also be worth it to get some inspiration from metal::at, in particular the mutual recursion between detail::grouper and detail::_at, which builds a recursive tree-like structure that is then unrolled by evaluating the nested ::type. I think it's similar to the idea discussed in that blog post by Odin, in fact I think I was inspired by it at that time.

Yes, those are interesting. I will definitely have a look there because I want to learn the techniques. However, I am not sure if it is worth to introduce more complexity here if the one-liner does the job well.

Either way, I'd be fine with the one-liner if you think it's not worth it to invest time in the fast-tracked version.

I believe that the one-liner is a good starting point. We could implement it, provide docs, and create tests. When more performance is required, it can be improved.

ecrypa · 2019-10-06T10:29:19Z

Looks like you've mastered Odin's technique to recurse on template alias.

I keep on reading the following two sources again and again, and I can not decide if I should trust that technique or not:

I believe that the one-liner is a good starting point. We could implement it, provide docs, and create tests. When more performance is required, it can be improved.

After some more benchmarking, I am convinced that the one-liner (v4, higher-order metafunction) is the best I can achieve currently: It is fastest on GCC 9, and it is not too bad on Clang 8. For reference, you can find all implementations in ecrypa@40b2acc (should I add commits to this PR?).

brunocodutra · 2019-10-06T12:03:53Z

Yes, those are interesting. I will definitely have a look there because I want to learn the techniques. However, I am not sure if it is worth to introduce more complexity here if the one-liner does the job well.

Agreed.

I keep on reading the following two sources again and again, and I can not decide if I should trust that technique or not:

It does work, but only when a dependent nested lookup is involved, that's why that "silly" (sizeof...(Ts)<10000000) is there. If we replace it by true it fails with the same error you see in that bug report even on Clang 9 with C++20 enabled. The fact all three major compilers consistently agree on this tells me there must be something buried deep in the language specification that implies this surprising behavior, but I wouldn't be able to tell you exactly what that is.

implement min_element

96461c7

ecrypa changed the title ~~implement min_element including benchmarks~~ implement min_element (benchmark linked in description) Oct 5, 2019

brunocodutra reviewed Oct 5, 2019

View reviewed changes

ecrypa referenced this pull request in ecrypa/metal Oct 6, 2019

implement min_element v4...v8

40b2acc

ecrypa mentioned this pull request Oct 6, 2019

implement min_element #100

Open

ecrypa closed this Oct 13, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

implement `min_element` (benchmark linked in description) #99

implement `min_element` (benchmark linked in description) #99

ecrypa commented Oct 5, 2019

brunocodutra left a comment

brunocodutra Oct 5, 2019

ecrypa Oct 5, 2019

ecrypa Oct 6, 2019

brunocodutra Oct 6, 2019

ecrypa Oct 6, 2019

ecrypa Oct 6, 2019

brunocodutra Oct 6, 2019

ecrypa Oct 6, 2019

ecrypa commented Oct 5, 2019

ecrypa commented Oct 6, 2019

brunocodutra commented Oct 6, 2019

implement min_element (benchmark linked in description) #99

implement min_element (benchmark linked in description) #99

Conversation

ecrypa commented Oct 5, 2019

brunocodutra left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ecrypa commented Oct 5, 2019

ecrypa commented Oct 6, 2019

brunocodutra commented Oct 6, 2019

implement `min_element` (benchmark linked in description) #99

implement `min_element` (benchmark linked in description) #99