-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrations memory bugfixes #23556
Migrations memory bugfixes #23556
Conversation
new failures in https://buildkite.com/redpanda/redpanda/builds/55426#0192428c-958e-42ed-afa3-cd32bcbd9924:
new failures in https://buildkite.com/redpanda/redpanda/builds/55631#01924cca-eee3-41a6-980c-35bef00a6724:
|
src/v/ssx/async_algorithm.h
Outdated
@@ -106,6 +106,10 @@ template<std::random_access_iterator I, typename Fn> | |||
iter_size<I> for_each_limit(const I begin, const I end, ssize_t limit, Fn f) { | |||
auto chunk_size = std::min(limit, end - begin); | |||
I chunk_end = begin + chunk_size; | |||
// std::for_each discards [[nodiscard]] callable result, check explicitly | |||
if (false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to check if function passed to for each limit returns void by using a cpp concept ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is another flavor of this function which isn't validated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to check if function passed to for each limit returns void by using a cpp concept?
Probably, but I think this would be too strict. One may want to use a function that actually returns a value which is by design okay to discard.
there is another flavor of this function which isn't validated
The one below? It is not affected by the problem, as f
is called explicitly. If f
or its return type is marked [[nodiscard]]
the call will emit warning (as error) on compilation, I have checked. future<...>
class is declared [[nodiscard]]
in Seastar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this even work?
It looks like the moment you convert the function to a function pointer the no discard attribute gets lost https://gcc.godbolt.org/z/jWbKo59r8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it does work. Nodiscard attribute gets lost from a function if you convert it to a pointer indeed, as it's stored neither in the pointer nor in the pointer type. It is not, however, lost if the return type itself is marked as nodiscard: https://gcc.godbolt.org/z/bGq1ordvj. In this case the function pointer type retains the information about the function return type, which is known to be nodiscard.
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55426#0192428d-54c4-46ad-8b66-41ece58a7436 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55522#0192479c-be73-41b3-8cde-5210398ab363 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55522#0192479f-050e-4fd6-aaba-aa70f5c92b4f ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55573#01924966-16db-48ed-8607-ba552116e3fa |
_topic_migration_map.emplace(nt, metadata.id); | ||
return reconcile_topic( | ||
nt, tstate, metadata.id, mrstate.scope, true); | ||
return reconcile_topic(metadata.id, idx, nt, mrstate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix tstate lifetime
in the first commit, i have a guess about what the memory lifetime issue is that is being fixed, but i'm not certain. please state in the commit message what the issue is and how it's being fixed.
[this, &metadata, &mrstate](const auto& migration) mutable { | ||
[this, migration_id = metadata.id, &mrstate]( | ||
const auto& migration) mutable { | ||
return ss::do_with( | ||
migration.topic_nts(), | ||
[this, &metadata, &mrstate](const auto& nts) { | ||
// poor man's `nts | std::views::enumerate` | ||
auto enumerated_nts = std::views::transform( | ||
nts, [index = -1](const auto& nt) mutable { | ||
return std::forward_as_tuple(++index, nt); | ||
}); | ||
return ssx::async_for_each( | ||
// poor man's `migration.topic_nts() | std::views::enumerate` | ||
std::views::transform( | ||
migration.topic_nts(), | ||
[index = -1](const auto& nt) mutable { | ||
return std::forward_as_tuple(++index, nt); | ||
}), | ||
[this, migration_id, &mrstate](auto& enumerated_nts) { | ||
return ss::do_for_each( | ||
enumerated_nts, | ||
[this, &metadata, &mrstate](const auto& idx_nt) { | ||
[this, migration_id, &mrstate](const auto& idx_nt) { | ||
auto& [idx, nt] = idx_nt; | ||
return reconcile_topic(metadata.id, idx, nt, mrstate); | ||
return reconcile_topic(migration_id, idx, nt, mrstate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe split out the reformatting? it's kinda hard to tell what is changing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried playing with formatting, but it did not make things any cleaner. I'd suggest switching to side-by-side mode for this change.
Essentially, since ss::do_for_each
does not extend the container's lifetime, we should pass exactly its container argument via ss::do_with
, that's what the change is about. I can create ssx::do_for_each_with
if you think it has potential for reuse.
src/v/ssx/async_algorithm.h
Outdated
// std::for_each discards [[nodiscard]] callable result, check explicitly | ||
if (false) { | ||
f(*begin); | ||
} | ||
std::for_each(begin, chunk_end, std::move(f)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm slightly confused about what is going on. reading the commit message, it seems like it would be sufficient to disallow this function from being used for an f
that returns a future, which would seem like the most common mistake?
then the other thing you could do is
f(*begin);
std::for_each(++begin, chunk_end, std::move(f));
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original problem is indeed mistakenly supplying a future-returning function to ssx::async_for_each
or one of its flavours. But I see no reason to limit the fix to future-returning functions specifically, and rather do it for any functions returning with [[nodiscard]]
. It fixes the original problem and any other potential mistakes where the function result is silently lost but is not meant to.
Moreover, I don't think it would be unreasonable to create an improved version of std::for_each
(with this static check) in our base library and use it everywhere instead.
As for your second proposal, is it because you are worried the call is optimized out before the warning is emitted? I would imagine the warning is emitted by the frontend, while any optimization, even that simple, is in the later stages. Your suggestion will also UB on an empty chunk (I'm not sure if it's called at all in this case but I'd rather not break it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for your second proposal, is it because you are worried the call is optimized out before the warning is emitted? I would imagine the warning is emitted by the frontend, while any optimization, even that simple, is in the later stages. Your suggestion will also UB on an empty chunk (I'm not sure if it's called at all in this case but I'd rather not break it).
nah, just that if (false)
looks weird. fine i guess if it works, just wondering if there were a better option.
200eab0
to
7d13f36
Compare
7d13f36
to
c4a12d3
Compare
Hi all, the last commit seems to have attracted most comments, so I reworded it. @travisdowns was there any particular reason for using
? |
It was shorter than the one of the coroutine using it.
copy migration definition from migrations table, as it may disappear
- do not use ssx::async_for_each for coroutines - extend container lifetime for ss::do_for_each using ss::do_with
c4a12d3
to
ec5538d
Compare
I've removed the controversial change and will rework it. This PR remains to only fix migrations problems. |
please update release notes |
failure: https://redpandadata.atlassian.net/browse/CORE-6906 |
A few fixes where objects lifetime was not taken care of.
Backports Required
Release Notes