Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrations memory bugfixes #23556

Merged

Conversation

bashtanov
Copy link
Contributor

@bashtanov bashtanov commented Sep 30, 2024

A few fixes where objects lifetime was not taken care of.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

  • none

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Sep 30, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/55426#0192428c-958e-42ed-afa3-cd32bcbd9924:

"rptest.tests.delete_records_test.DeleteRecordsTest.test_delete_records_concurrent_truncations.cloud_storage_enabled=True.truncate_point=start_offset"

new failures in https://buildkite.com/redpanda/redpanda/builds/55631#01924cca-eee3-41a6-980c-35bef00a6724:

"rptest.tests.partition_force_reconfiguration_test.PartitionForceReconfigurationTest.test_basic_reconfiguration.acks=-1.restart=False.controller_snapshots=False"

@@ -106,6 +106,10 @@ template<std::random_access_iterator I, typename Fn>
iter_size<I> for_each_limit(const I begin, const I end, ssize_t limit, Fn f) {
auto chunk_size = std::min(limit, end - begin);
I chunk_end = begin + chunk_size;
// std::for_each discards [[nodiscard]] callable result, check explicitly
if (false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to check if function passed to for each limit returns void by using a cpp concept ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is another flavor of this function which isn't validated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to check if function passed to for each limit returns void by using a cpp concept?

Probably, but I think this would be too strict. One may want to use a function that actually returns a value which is by design okay to discard.

there is another flavor of this function which isn't validated

The one below? It is not affected by the problem, as f is called explicitly. If f or its return type is marked [[nodiscard]] the call will emit warning (as error) on compilation, I have checked. future<...> class is declared [[nodiscard]] in Seastar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this even work?

It looks like the moment you convert the function to a function pointer the no discard attribute gets lost https://gcc.godbolt.org/z/jWbKo59r8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does work. Nodiscard attribute gets lost from a function if you convert it to a pointer indeed, as it's stored neither in the pointer nor in the pointer type. It is not, however, lost if the return type itself is marked as nodiscard: https://gcc.godbolt.org/z/bGq1ordvj. In this case the function pointer type retains the information about the function return type, which is known to be nodiscard.

_topic_migration_map.emplace(nt, metadata.id);
return reconcile_topic(
nt, tstate, metadata.id, mrstate.scope, true);
return reconcile_topic(metadata.id, idx, nt, mrstate);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix tstate lifetime

in the first commit, i have a guess about what the memory lifetime issue is that is being fixed, but i'm not certain. please state in the commit message what the issue is and how it's being fixed.

Comment on lines 1254 to 1277
[this, &metadata, &mrstate](const auto& migration) mutable {
[this, migration_id = metadata.id, &mrstate](
const auto& migration) mutable {
return ss::do_with(
migration.topic_nts(),
[this, &metadata, &mrstate](const auto& nts) {
// poor man's `nts | std::views::enumerate`
auto enumerated_nts = std::views::transform(
nts, [index = -1](const auto& nt) mutable {
return std::forward_as_tuple(++index, nt);
});
return ssx::async_for_each(
// poor man's `migration.topic_nts() | std::views::enumerate`
std::views::transform(
migration.topic_nts(),
[index = -1](const auto& nt) mutable {
return std::forward_as_tuple(++index, nt);
}),
[this, migration_id, &mrstate](auto& enumerated_nts) {
return ss::do_for_each(
enumerated_nts,
[this, &metadata, &mrstate](const auto& idx_nt) {
[this, migration_id, &mrstate](const auto& idx_nt) {
auto& [idx, nt] = idx_nt;
return reconcile_topic(metadata.id, idx, nt, mrstate);
return reconcile_topic(migration_id, idx, nt, mrstate);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe split out the reformatting? it's kinda hard to tell what is changing here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried playing with formatting, but it did not make things any cleaner. I'd suggest switching to side-by-side mode for this change.

Essentially, since ss::do_for_each does not extend the container's lifetime, we should pass exactly its container argument via ss::do_with, that's what the change is about. I can create ssx::do_for_each_with if you think it has potential for reuse.

// std::for_each discards [[nodiscard]] callable result, check explicitly
if (false) {
f(*begin);
}
std::for_each(begin, chunk_end, std::move(f));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm slightly confused about what is going on. reading the commit message, it seems like it would be sufficient to disallow this function from being used for an f that returns a future, which would seem like the most common mistake?

then the other thing you could do is

f(*begin);
std::for_each(++begin, chunk_end, std::move(f));

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original problem is indeed mistakenly supplying a future-returning function to ssx::async_for_each or one of its flavours. But I see no reason to limit the fix to future-returning functions specifically, and rather do it for any functions returning with [[nodiscard]]. It fixes the original problem and any other potential mistakes where the function result is silently lost but is not meant to.

Moreover, I don't think it would be unreasonable to create an improved version of std::for_each (with this static check) in our base library and use it everywhere instead.

As for your second proposal, is it because you are worried the call is optimized out before the warning is emitted? I would imagine the warning is emitted by the frontend, while any optimization, even that simple, is in the later stages. Your suggestion will also UB on an empty chunk (I'm not sure if it's called at all in this case but I'd rather not break it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for your second proposal, is it because you are worried the call is optimized out before the warning is emitted? I would imagine the warning is emitted by the frontend, while any optimization, even that simple, is in the later stages. Your suggestion will also UB on an empty chunk (I'm not sure if it's called at all in this case but I'd rather not break it).

nah, just that if (false) looks weird. fine i guess if it works, just wondering if there were a better option.

@bashtanov bashtanov force-pushed the migrations-memory-bugfixes branch from 200eab0 to 7d13f36 Compare October 1, 2024 09:15
@bashtanov bashtanov force-pushed the migrations-memory-bugfixes branch from 7d13f36 to c4a12d3 Compare October 1, 2024 17:07
@bashtanov
Copy link
Contributor Author

Hi all, the last commit seems to have attracted most comments, so I reworded it.

@travisdowns was there any particular reason for using std::for_each instead of a for loop with iterators? Would you mind if I changed it to a loop similar to its implementation in the standard library:

  for (; __first != __last; ++__first)
    __f(*__first);

?

It was shorter than the one of the coroutine using it.
copy migration definition from migrations table, as it may disappear
- do not use ssx::async_for_each for coroutines
- extend container lifetime for ss::do_for_each using ss::do_with
@bashtanov
Copy link
Contributor Author

I've removed the controversial change and will rework it. This PR remains to only fix migrations problems.

@mmaslankaprv
Copy link
Member

please update release notes

@bashtanov
Copy link
Contributor Author

failure: https://redpandadata.atlassian.net/browse/CORE-6906
@mmaslankaprv could you merge please?

@mmaslankaprv mmaslankaprv merged commit 97b9e0d into redpanda-data:dev Oct 2, 2024
15 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants