Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cluster): support migrate slot range #2389

Merged
merged 18 commits into from
Jul 14, 2024

Conversation

PokIsemaine
Copy link
Contributor

@PokIsemaine PokIsemaine commented Jul 1, 2024

issues: #2355

This draft PR demonstrates how to support migrating slot ranges.

What I Did

1 Migration job - 1 slot range:

  • I encapsulated a SlotRange structure and changed the migration-related class members from a single slot to a slot range.
  • Reference: [NEW] Support slot-based data migration #412. Slot migration includes the following phases: start migration, migrate existing data, migrate incremental data, and end migration. In each modified phase, the entire slot range must be completed before moving to the next phase.

TODO

TODO represents the items I hope to discuss.

  1. Support multiple slot ranges:

    • Current situation: Only one migration job can be performed at a time, and each migration job corresponds to a slot range.
    • Possible modification: Allow multiple migration jobs, migrating sequentially or in parallel.
  2. Perform multiple migrations consecutively but do not immediately use setslot to update the topology, referring to the example in TestSlotRangeMigrate:

	t.Run("MIGRATE - Repeat migration cases, but does not immediately update the topology via setslot", func(t *testing.T) {
		// Disjoint
		require.Equal(t, "OK", rdb0.Do(ctx, "clusterx", "migrate", "114-116", id1).Val())
		waitForMigrateSlotRangeState(t, rdb0, "114-116", SlotMigrationStateSuccess)
		require.Equal(t, "OK", rdb0.Do(ctx, "clusterx", "migrate", "117-118", id1).Val())
		waitForMigrateSlotRangeState(t, rdb0, "117-118", SlotMigrationStateSuccess)
		require.Equal(t, "OK", rdb0.Do(ctx, "clusterx", "migrate", "112-113", id1).Val())
		waitForMigrateSlotRangeState(t, rdb0, "112-113", SlotMigrationStateSuccess)

		errMsg := "Can't migrate slot which has been migrated"
		// TODO: Migrating 112-113, but 114-118 is covered and cannot be detected.
		// require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "114-116", id1).Err(), errMsg)
		// require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "117-118", id1).Err(), errMsg)

		// Intersection
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "112", id1).Err(), errMsg)
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "112-112", id1).Err(), errMsg)
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "113", id1).Err(), errMsg)
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "113-113", id1).Err(), errMsg)

		// Subset
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "112-113", id1).Err(), errMsg)
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "112-120", id1).Err(), errMsg)
		require.ErrorContains(t, rdb0.Do(ctx, "clusterx", "migrate", "110-112", id1).Err(), errMsg)
	})

This situation also seems to exist in the original single slot migration, and I am not sure if such operations are reasonable.

migrate A => migrate B => setslot A&B
migrate A => migrate B => migrate A (expected to fail, but allowed to pass) => setslot A&B

// slotrange A-C
migrate A-B => migrate C => setslot A-C
migrate A-B => migrate C => migrate B (expected error, but allowed to pass) => setslot A-C
  1. More precise migration failure slot range:
    The current implementation determines the entire slot range to fail and cleans it up, and the user later migrates the entire slot range again.
    Do we want to support a more precise failure range? For example, [start_slot-fail_slot), [fail_slot, end]. Users can check the status with commands such as cluster info and then re-migrate the failed slot range by themselves. (Personally, I think it's a bit cumbersome and error-prone)

Miscellaneous

Other suggestions are welcome, such as more testing, code optimization, better user interaction, etc.

@PokIsemaine PokIsemaine marked this pull request as draft July 1, 2024 05:28
Copy link
Member

@git-hulk git-hulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PokIsemaine Thanks for your impressive contributions. I will take another pass while it's ready to be reviewed.

src/cluster/cluster_defs.h Outdated Show resolved Hide resolved
src/cluster/cluster_defs.h Outdated Show resolved Hide resolved
src/cluster/cluster_defs.h Outdated Show resolved Hide resolved
src/cluster/cluster_defs.h Outdated Show resolved Hide resolved
src/cluster/cluster_defs.h Outdated Show resolved Hide resolved
src/storage/redis_db.cc Outdated Show resolved Hide resolved
src/cluster/cluster_defs.h Outdated Show resolved Hide resolved
src/cluster/slot_import.cc Outdated Show resolved Hide resolved
src/cluster/slot_import.cc Outdated Show resolved Hide resolved
src/cluster/slot_import.cc Outdated Show resolved Hide resolved
@PokIsemaine
Copy link
Contributor Author

@git-hulk @caipengbo

If it's convenient, I would like to discuss some matters related to the TODO list:

First, regarding item 2 on the TODO list, I find this situation a bit confusing because I am not sure if this is the intended design of kvrocks. If possible, please determine whether this is a normal situation.

Second, concerning items 1 and 3 on the TODO list, these are enhancements. Please evaluate whether these features are needed and if they should be included in this PR.

@caipengbo
Copy link
Contributor

@PokIsemaine

First, regarding item 2 on the TODO list, I find this situation a bit confusing because I am not sure if this is the intended design of kvrocks. If possible, please determine whether this is a normal situation.

This is indeed the case at the moment. The migrated slots is not checked when the migration is performed, and I think they should be checked.

Second, concerning items 1 and 3 on the TODO list, these are enhancements. Please evaluate whether these features are needed and if they should be included in this PR.

I think these are good improvements, but we can add them in the subsequent PR. This PR is enough.

@PokIsemaine
Copy link
Contributor Author

PokIsemaine commented Jul 9, 2024

@caipengbo Ok, thanks for the suggestion. So for the second TODO, do we try to fix it at this PR or create another separate issue to track it?

@caipengbo
Copy link
Contributor

caipengbo commented Jul 9, 2024

for the second TODO, do we try to fix it at this PR or create another separate issue to track it?

Let's fix it in this PR. @PokIsemaine

@git-hulk
Copy link
Member

@PokIsemaine I think this PR is ready to review now?

@PokIsemaine PokIsemaine changed the title feat: support migrate slot range[draft] feat: support migrate slot range Jul 11, 2024
@PokIsemaine PokIsemaine marked this pull request as ready for review July 11, 2024 15:31
@PokIsemaine
Copy link
Contributor Author

@PokIsemaine I think this PR is ready to review now?

You can start the review now :)

The clang-tidy check in the CI doesn't seem to be working, so please try to re-run it. This works in my local and fork repository CI.

@PragmaTwice
Copy link
Member

PragmaTwice commented Jul 12, 2024

Could you fix these clang-tidy issues? Please make sure the code can be successfully compiled.

/home/runner/work/kvrocks/kvrocks/build/_deps/fmt-src/include/fmt/base.h:1619:3: error: static_assert failed due to requirement 'formattable_pointer' "Formatting of non-void pointers is disallowed." [clang-diagnostic-error]
Suppressed 21102 warnings (21069 in non-user code, 32 NOLINT, 1 with check filters).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
Found compiler error(s).
  static_assert(formattable_pointer,
  ^             ~~~~~~~~~~~~~~~~~~~
/home/runner/work/kvrocks/kvrocks/build/_deps/fmt-src/include/fmt/base.h:2002:20: note: in instantiation of function template specialization 'fmt::detail::make_arg<true, fmt::context, double (double) noexcept, 0>' requested here
  return {{detail::make_arg<NUM_ARGS <= detail::max_packed_args, Context>(
                   ^
/home/runner/work/kvrocks/kvrocks/build/_deps/fmt-src/include/fmt/format.h:4357:28: note: in instantiation of function template specialization 'fmt::make_format_args<fmt::context, double (double) noexcept, std::basic_string<char>, 2UL, 0UL, 223ULL, 0>' requested here
  return vformat(fmt, fmt::make_format_args(args...));
                           ^
/home/runner/work/kvrocks/kvrocks/src/cluster/slot_migrate.cc:394:33: note: in instantiation of function template specialization 'fmt::format<double (&)(double) noexcept, std::basic_string<char> &>' requested here
    return {Status::NotOK, fmt::format("failed to iterate keys of slot {}: {}", slot, err_str)};
                                ^
/home/runner/work/kvrocks/kvrocks/src/cluster/slot_migrate.cc:393:66: error: use of undeclared identifier 'slot' [clang-diagnostic-error]
    LOG(ERROR) << "[migrate] Failed to iterate keys of slot " << slot << ": " << err_str;
                                                                 ^
/home/runner/work/kvrocks/kvrocks/src/cluster/slot_migrate.cc:394:81: error: use of undeclared identifier 'slot' [clang-diagnostic-error]
    return {Status::NotOK, fmt::format("failed to iterate keys of slot {}: {}", slot, err_str)};
                                                                                ^

@git-hulk
Copy link
Member

@PokIsemaine I think this PR is ready to review now?

You can start the review now :)

The clang-tidy check in the CI doesn't seem to be working, so please try to re-run it. This works in my local and fork repository CI.

Thank you! I will take another pass in one or two days.

@git-hulk git-hulk changed the title feat: support migrate slot range feat(cluster): support migrate slot range Jul 13, 2024
Copy link
Member

@git-hulk git-hulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR generally looks good to me, left two comments. To see if other maintainers have further comments.

src/cluster/slot_migrate.cc Outdated Show resolved Hide resolved
src/cluster/slot_migrate.cc Show resolved Hide resolved
git-hulk
git-hulk previously approved these changes Jul 13, 2024
mapleFU
mapleFU previously approved these changes Jul 13, 2024
Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for we with some nits

src/cluster/cluster_defs.h Show resolved Hide resolved
src/cluster/cluster.cc Outdated Show resolved Hide resolved
src/storage/batch_extractor.cc Show resolved Hide resolved
src/storage/batch_extractor.cc Show resolved Hide resolved
src/cluster/slot_migrate.h Outdated Show resolved Hide resolved
@PokIsemaine PokIsemaine dismissed stale reviews from mapleFU and git-hulk via af1b697 July 14, 2024 10:45
@git-hulk
Copy link
Member

Will merge this PR if no further comments and CI is passed.

Copy link

sonarcloud bot commented Jul 14, 2024

@git-hulk git-hulk requested a review from mapleFU July 14, 2024 13:18
@git-hulk git-hulk merged commit beb1979 into apache:unstable Jul 14, 2024
30 checks passed
@PokIsemaine PokIsemaine deleted the slotrange-migrate-v2 branch July 14, 2024 15:12
caipengbo

This comment was marked as duplicate.

@mapleFU
Copy link
Member

mapleFU commented Jul 15, 2024

Nice work, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants