implement min_element #100
base: master
Summary
This generalization of
Why I dislike the approach of this PR
Semantically, a fold can be fully specified by a single
How to compose user-provided ordering and min_element logic
I can think of three options to consider:
I believe that (1) diminishes the utility of
Improving the performance of (2) is highly desirable, because user code can be expected to benefit from efficient composition, too. However, I expect the optimization to be difficult (impossible?) because it is already better than you (@brunocodutra) expected.
When @brunocodutra pointed me at the SFINAE issues of (3), I was scared. I surely do not fully understand them yet, but I have gained some optimism that they can be mitigated. Therefore, I would like to understand what kinds of failures we have to consider.
Failures to be considered by approach (3) to
|
Agree 100% that this is not something we would like to do, considering Metal provides first-class higher-order composition that could just be reused. The only reason why we would want to do something like this would be to get better benchmarks, but frankly it may not be worth it, especially if we think we're paying for a performance improvement no one needs.
A very pleasant surprise I should add :)
Unfortunately, you can't do that in general.
Correct, more generally, any error that originates in the definition as opposed to declaration of a type template is unrecoverable through SFINAE and considered a user error.
I'm not sure I understand this statement, could you elaborate?
I'm not sure I understand, how would
Perhaps I should clarify what I mean by SFINAE-friendliness.
template<template<class...> class tmpl, class... args>
using is_sfinae_friendly = metal::is_invocable<metal::lambda<tmpl>, args...>;
For any
Metal's unit tests systematically check its own SFINAE-friendliness by attempting to evaluate algorithms on all sorts of ill-defined types. You'll see this test data has been carefully crafted to cover various edge cases and, in particular, attempts to trip
Checking whether some template expression is SFINAE-friendly is in general very complex and requires a deep understanding of the standard that I do not claim to have myself. Put simply, a substitution failure is only recoverable if it's triggered as part of the declaration of a type due to its own instantiation or the instantiation of one of its immediate type arguments. A substitution error that gets triggered as part of the definition of a type is never recoverable. For the purpose of this definition, a template alias is considered a declaration.
If we turn back to the original definition of
template<template<typename...> class expr>
struct _min_element_val<lambda<expr>> {
    template<typename x, typename y>
    using combiner = if_<expr<x, y>, x, y>;
    // ...
};
I claim
You can test this by attempting to instantiate
I hope this helps. |
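The declaration-versus-definition rule described above can be illustrated without Metal. The following is a minimal sketch in plain C++11; every name in it (`sketch::is_valid`, `value_type_of`, `holder`, `unfriendly`) is hypothetical and only stands in for the role that `metal::is_invocable` plays in the comment above.

```cpp
#include <type_traits>

namespace sketch {
    // C++11 stand-in for std::void_t (hypothetical helper, not part of Metal).
    template<class...> struct make_void { using type = void; };
    template<class... ts> using void_t = typename make_void<ts...>::type;

    // Detection idiom: true iff tmpl<args...> is well-formed in the immediate context.
    template<class, template<class...> class tmpl, class... args>
    struct is_valid_impl : std::false_type {};

    template<template<class...> class tmpl, class... args>
    struct is_valid_impl<void_t<tmpl<args...>>, tmpl, args...> : std::true_type {};

    template<template<class...> class tmpl, class... args>
    using is_valid = is_valid_impl<void, tmpl, args...>;

    // The failure happens while substituting into an alias (a declaration),
    // so it is a recoverable substitution failure.
    template<class t>
    using value_type_of = typename t::value_type;

    // The failure happens inside the definition of holder<t>,
    // so it is a hard error that the detector cannot recover from.
    template<class t>
    struct holder {
        using type = typename t::value_type;
    };

    template<class t>
    using unfriendly = typename holder<t>::type;

    struct with_value_type { using value_type = int; };
}

static_assert(sketch::is_valid<sketch::value_type_of, sketch::with_value_type>::value, "");
static_assert(!sketch::is_valid<sketch::value_type_of, int>::value, "");
static_assert(sketch::is_valid<sketch::unfriendly, sketch::with_value_type>::value, "");
// Uncommenting the next line does not yield false; it stops compilation,
// because the error originates in the body of holder<int>:
// static_assert(!sketch::is_valid<sketch::unfriendly, int>::value, "");
```

Both aliases compute the same type for well-formed input; they differ only in where the ill-formed expression is evaluated, which is what decides whether the failure is recoverable.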
Wow! Thanks a lot for your excellent explanations. I very much appreciate the time you are spending to teach me these things. While your explanations are extremely useful for me, I suspect that it is most efficient (for both of us) if I do not respond to each of your points individually. Instead, I propose to respond with (hopefully) improved code. Should you be more interested in my thoughts/explanations then please let me know. |
0defd35
to
7e1f955
Compare
I am starting to feel more confident about applying these metaprogramming techniques. I am looking forward to your comments on this latest approach. By the way: should I close and reopen, or is it fine to continue here? |
What inconsistent compiler behavior! Clang matches nothing, MSVC matches everything, and GCC/ICC need C++17 in order to match in the way we want them to (C++14 is not enough). I cannot tell whether and where to submit bug reports. |
Benchmarks are promising (Clang 8, GCC 9). The winner metal::v14::min_element (similar to the version of this PR) is based on the new helper dcall. If there are no SFINAE troubles then this is my favorite version. |
I'm fine either way, but I'll try to summarize all my thoughts and answer all your questions in this single comment, to hopefully make it easier for both of us to make sense of all the discussions we have in flight.
It's totally fine to keep updating this one. If you're wondering, force-pushing is fine too.
Whoa, I didn't actually check other compilers. You know you've pushed too far when compilers stop agreeing with each other. I don't know which, if any, is correct here, but experience tells me clang is usually right.
Impressive! This may be it! Just wondering, what happens if we get rid of
I think
template<bool>
struct _dcaller {};
template<>
struct _dcaller<true> {
    template<template<class...> class expr, class... vals>
    using type = expr<vals...>;
};
template<template<class...> class expr, class... vals>
using dcall = typename _dcaller<sizeof...(vals) >= 0>::template type<expr, vals...>;
Whether |
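To make the purpose of the `dcall` helper above more concrete, here is a minimal usage sketch. It repeats the helper inside a hypothetical `sketch` namespace (with parentheses added around the comparison for readability) and applies it to a hypothetical two-parameter alias, `first_of`. The assumption being illustrated is that some compilers reject expanding a pack directly into a fixed-arity alias template, whereas routing the expansion through a dependent member alias postpones the check until the arguments are known.

```cpp
#include <type_traits>

namespace sketch {
    template<bool>
    struct _dcaller {};

    template<>
    struct _dcaller<true> {
        template<template<class...> class expr, class... vals>
        using type = expr<vals...>;
    };

    // The condition is always true, but it depends on vals, so the lookup
    // of ::type is deferred until dcall is actually instantiated.
    template<template<class...> class expr, class... vals>
    using dcall = typename _dcaller<(sizeof...(vals) >= 0)>::template type<expr, vals...>;

    // Hypothetical fixed-arity alias, used only for illustration.
    template<class x, class y>
    using first_of = x;

    // Writing `first_of<vals...>` here directly is rejected by some compilers,
    // because a pack is expanded into an alias with non-pack parameters.
    template<class... vals>
    using first_of_pack = dcall<first_of, vals...>;
}

static_assert(std::is_same<sketch::first_of_pack<int, char>, int>::value, "");
```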
I had once introduced that pack in order to improve the performance by reducing the number of class-template instantiations. I wanted a single memoized instantiation of
However, a benchmark has shown no significant differences in terms of compile time. That is why I have removed that empty pack. The new version (current version of this PR) uses the plain old technique that is used all over the place in metal: The
I suspect that the memory footprint is better with the empty-pack trick, but I did not have any issues with lists of size 500+. Without a measurable advantage, I classified that empty-pack trick as premature optimization.
Although I have discarded that empty-pack trick, I will keep that specialization technique in mind! Is it important to provide a definition of the unspecialized class template, or can I also omit the
I observed that the
Thanks for the hint to make use of existing unit tests.
When optimizing my approaches, I used some hand-crafted tests in the examples, which you also discovered. I did see plenty of other attempts failing, so this one should not be super unfriendly to SFINAE :-). |
Just to back this up: here are the benchmarks using Clang 9 (see v11, v12, v13, and v14 in particular). Meanwhile, I have customized metabench to measure the "number of types" as reported by Clang with
|
Amazing work! Do consider contributing that to |
I have opened an issue there in order to share that idea. |
I like how this is boiling down to a two-liner (after some crazy intermediate attempts). Latest benchmarks of selected versions: |
I missed one of your questions last time.
Without the |
Benchmarks look pretty sweet! Those extra 30 ms on a list with 1000 elements definitely don't seem worth the pile of code that fast tracking requires. I wonder if redefining |
The most efficient solutions are the neat ones. One exception is the bulky solution v7.alias_fasttrack, which is efficient on clang, but inefficient on GCC.
By that kind of optimization we can tweak the efficiency in the range of v11...v15. Fastest (and hardest to use) are v11 and v14, but that advantage is small and Clang-only. GCC 9 is equally fast with v11..v15. |
I just benchmarked two different approaches to obtain the index (instead of the value). The winner (currently in this PR) is v17.ind.find. That was surprising to me, because my favorite was v16.ind.accumulate. I should analyze if the huge difference originates from
See benchmark results for Clang 9 and GCC 9. Any other ideas on how to obtain the index? |
Have you tried
template<class lbd, class is_empty>
struct _min_element {};
template<class lbd>
struct _min_element<lbd, true_> {
    template<class>
    using type = number<0>;
};
template<class lbd>
struct _min_element<lbd, false_> {
    template<class state, class next>
    using custom_min = if_<apply<lbd, back<transpose<next, state>>>, next, state>;
    template<class seq>
    using type = front<apply<lambda<custom_min>, transpose<pair<indices<seq>, seq>>>>;
};
Note: not tested. |
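The idea of pairing every value with its index and folding over the pairs can be sketched in plain C++14 without Metal. This is only an illustration of the technique under discussion, not any of the benchmarked versions; all names (`sketch::indexed`, `sketch::fold_left_impl`, `sketch::min_index`) are hypothetical, and ties are resolved in favor of the first occurrence.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

namespace sketch {
    template<class... vals> struct list {};
    template<int v> using number = std::integral_constant<int, v>;

    // Default ordering on types that expose a ::value.
    template<class x, class y>
    using less = std::integral_constant<bool, (x::value < y::value)>;

    // A value tagged with its position in the list.
    template<std::size_t i, class val>
    struct indexed {
        static constexpr std::size_t index = i;
        using element = val;
    };

    // Plain recursive left fold: f<...f<f<state, x0>, x1>..., xn>.
    template<template<class...> class f, class state, class... vals>
    struct fold_left_impl { using type = state; };

    template<template<class...> class f, class state, class head, class... tail>
    struct fold_left_impl<f, state, head, tail...>
        : fold_left_impl<f, f<state, head>, tail...> {};

    template<class seq, class indices, template<class...> class expr>
    struct min_index_of {};

    template<class head, class... tail, std::size_t i0, std::size_t... is,
             template<class...> class expr>
    struct min_index_of<list<head, tail...>, std::index_sequence<i0, is...>, expr> {
        // Keep `next` only if its element is strictly smaller than the current one.
        template<class state, class next>
        using custom_min = typename std::conditional<
            expr<typename next::element, typename state::element>::value,
            next, state>::type;

        using winner = typename fold_left_impl<
            custom_min, indexed<i0, head>, indexed<is, tail>...>::type;
        using type = std::integral_constant<std::size_t, winner::index>;
    };

    template<class seq> struct size_of;
    template<class... vals>
    struct size_of<list<vals...>> : std::integral_constant<std::size_t, sizeof...(vals)> {};

    template<class seq, template<class...> class expr = less>
    using min_index = typename min_index_of<
        seq, std::make_index_sequence<size_of<seq>::value>, expr>::type;
}

static_assert(sketch::min_index<sketch::list<
    sketch::number<3>, sketch::number<1>, sketch::number<2>, sketch::number<1>>>::value == 1, "");
```

Because the fold state already carries the index, no second pass over the list is needed to recover it; the extra cost is one `indexed` instantiation per element.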
Could you stack this against a version that returns the value itself, so we can see the cost of recovering the index? |
You can estimate the cost of recovering the index from the latest benchmarks. v11 is the fastest version, and it returns the value. v15 is my favorite version, and it returns the value, too. Index return is implemented in v16.ind, v17.ind, v18.ind, and v19.ind only. The latter (v16..v19) are based on v15 (using
Do we need better resolution for small lists? |
I missed the fact v11 was returning the value, thanks.
I don't think it's worth it, small lists are certainly not the bottleneck of a metaprogram. |
5d2f4e6
to
658bb5b
Compare
658bb5b
to
eb057bd
Compare
Apologies for taking so long to review your changes, I've been really short on time lately.
I really appreciate the amount of work you have put into this so far, I'd be ready to merge if it were not for GCC 4.7 being silly.
This is a shot in the dark, but have you tried the following pattern instead? We follow it in most places, and for whatever reason it seems to work well for GCC 4.7, probably thanks to the extra indirections through detail::call. One of the examples for sort even uses the same predicate based on sizeof, which seems to be the reason why GCC 4.7 has a hard time with the example for min_element.
template<class seq, class lbd = metal::lambda<metal::less>>
using min_element = call<detail::_min_element<seq>::template type, lbd>;
// Folds the values with a binary "keep the smaller one" combiner built from lbd.
template<class lbd>
struct _min_element_impl {
    template<class x, class y>
    using custom_min = if_<invoke<lbd, y, x>, y, x>;
    template<class... vals>
    using type = fold_left<lambda<custom_min>, vals...>;
};
// Primary template: provides no nested type for non-list arguments.
template<class seq>
struct _min_element {};
// Unpacks the list and forwards its values through call.
template<class... vals>
struct _min_element<list<vals...>> {
    template<typename lbd>
    using type = call<_min_element_impl<lbd>::template type, vals...>;
};
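For readers without Metal at hand, the fold-based pattern suggested above can be sketched in plain C++11. This is a simplified standalone illustration, not Metal's implementation: it takes the ordering as a template template parameter instead of a `metal::lambda`, uses `std::conditional` in place of `if_`, and all names (`sketch::fold_left_impl`, `sketch::min_element_of`, `sketch::smaller`) are hypothetical.

```cpp
#include <type_traits>

namespace sketch {
    template<class... vals> struct list {};
    template<int v> using number = std::integral_constant<int, v>;

    // Default ordering on types that expose a ::value.
    template<class x, class y>
    using less = std::integral_constant<bool, (x::value < y::value)>;

    // Ordering by size, similar in spirit to the sizeof-based predicate mentioned above.
    template<class x, class y>
    using smaller = std::integral_constant<bool, (sizeof(x) < sizeof(y))>;

    // Plain recursive left fold: f<...f<f<state, x0>, x1>..., xn>.
    template<template<class...> class f, class state, class... vals>
    struct fold_left_impl { using type = state; };

    template<template<class...> class f, class state, class head, class... tail>
    struct fold_left_impl<f, state, head, tail...>
        : fold_left_impl<f, f<state, head>, tail...> {};

    // Primary template: no nested type for empty lists or non-list arguments.
    template<class seq, template<class...> class expr>
    struct min_element_of {};

    template<class head, class... tail, template<class...> class expr>
    struct min_element_of<list<head, tail...>, expr> {
        // Same shape as custom_min above: take y only if it orders before x.
        template<class x, class y>
        using custom_min = typename std::conditional<expr<y, x>::value, y, x>::type;

        using type = typename fold_left_impl<custom_min, head, tail...>::type;
    };

    template<class seq, template<class...> class expr = less>
    using min_element = typename min_element_of<seq, expr>::type;
}

static_assert(std::is_same<
    sketch::min_element<sketch::list<sketch::number<3>, sketch::number<1>, sketch::number<2>>>,
    sketch::number<1>>::value, "");
static_assert(std::is_same<
    sketch::min_element<sketch::list<char[4], char[1], char[2]>, sketch::smaller>,
    char[1]>::value, "");
```

The first element seeds the fold, so the combiner is evaluated once per remaining element.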
include/metal/list/min_element.hpp
Outdated
/// \see min, sort
#if !defined(METAL_WORKAROUND)
template<class seq, class lbd = metal::lambda<metal::less>>
using min_element = apply<
A shot in the dark, but have you tried the following pattern instead? We follow it in most places, and for whatever reason it seems to work well for GCC 4.7. One of the examples for sort even uses the same smaller predicate based on sizeof.
template<class seq, class lbd = metal::lambda<metal::less>>
using min_element = call<detail::_min_element<seq>::template type, lbd>;
template<class lbd>
struct _min_element_impl {
    template<class x, class y>
    using custom_min = if_<invoke<lbd, y, x>, y, x>;
    template<class... vals>
    using type = fold_left<lambda<custom_min>, vals...>;
};
template<class seq>
struct _min_element {};
template<class... vals>
struct _min_element<list<vals...>> {
    template<typename lbd>
    using type = call<_min_element_impl<lbd>::template type, vals...>;
};
CHECK((metal::min_element<metal::list<ENUM(INC(N), NUMBERS FIX(INC(M)))>>), (metal::number<0>)); \
/**/

GEN(MATRIX)
Nice test coverage!
The test is derived from the test of metal::sort.
It bothers me that the test is passing while the example fails on GCC 4.7.
|
Never mind, I was looking at MSVC out of habit. GCC 4.7 still fails to compile the example. 👎 |
For some reason, GCC 4.7 chooses the overload
Can you give me any advice on how to overcome that issue?
#define WORKAROUND_THAT_BREAKS_GCC_4_7
// removing the line above makes GCC 4.7 happy
template<
    template<template<class...> class...> class tmpl,
    template<class...> class... exprs>
struct _forwarder {
    using type = tmpl<exprs...>;
};
template<
    template<template<class...> class...> class tmpl,
    template<class...> class... exprs>
#if defined(WORKAROUND_THAT_BREAKS_GCC_4_7)
using forward = typename _forwarder<tmpl, exprs...>::type;
#else
using forward = tmpl<exprs...>;
#endif
struct _min_element_impl {
    template<template<class...> class>
    using type = long;
};
template<template<class...> class expr>
struct _min_element {
    using type = forward<_min_element_impl::template type, expr>;
};
template<class T>
using some_expr = T;
using some_val = _min_element<some_expr>::type;
some_val foo = 4; |
If GCC 4.7 is causing so much trouble, it may be time we let it go. |
39a5465
to
24cc0d6
Compare
Continuation of #99 (comment).
I copied the code of fold_left because I want to benchmark old and new versions. Preliminary results can be found in the section "Change of fold left" at https://ecrypa.github.io/metal-benchmark/.