Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(restore): adds --dc-mapping flag to restore command #4213

Merged
merged 22 commits into from
Feb 17, 2025

Conversation

VAveryanov8
Copy link
Collaborator

@VAveryanov8 VAveryanov8 commented Jan 16, 2025

This adds support for --dc-mapping flag to restore command. It specifies mapping between DCs from the backup and DCs in the restored(target) cluster. Only 1 use case is supported: 1-1 dc mapping. This means that squeezing (restore dc1 and dc2 into dc3) or extending (restore dc1 into dc1 and dc2) DCs is not supported when --dc-mapping is provided.
So the syntax is:

source_dc1=target_dc1,source_dc2=target_dc2
Where
     equal(=) is used to separate source   dc name and target dc name
     comma(,)  is used to separate multiple mappings

If --dc-mapping is not provided, then current behavior should be preserved - each node with access to DC can download it data. Also it's allowed to provide only subset of DCs, ignoring source dc or target (or both).
Only works with tables restoration (--restore-tables=true).

Fixes: #3829


Please make sure that:

  • Code is split to commits that address a single change
  • Commit messages are informative
  • Commit titles have module prefix
  • Commit titles have issue nr. suffix

@VAveryanov8 VAveryanov8 marked this pull request as ready for review January 17, 2025 08:23
@Michal-Leszczynski
Copy link
Collaborator

Michal-Leszczynski commented Jan 17, 2025

In terms of syntax:

"dc1,!dc2=>dc2" - data from dc1 should be restored to dc2 DC. Ignoring dc2 from source cluster.

I find it confusing that it's possible to specify both restored and skipped DCs in the same mapping key.
It reads like "restore dc1 and not dc2 into dc2" instead of "don't restore dc2 and restore dc1 into dc2".
IMHO it should look like:

 "!dc2;dc1=>dc2"     - data from dc1 should be restored to dc2 DC. Ignoring dc2 from source cluster.

I would suggest to validate that positive and negative DC occurrences are not mixed in the mapping key and that negative DCs can't be mapped to anything, so that we avoid confusion and typing mistakes.

EDIT: The same goes here:

"dc1,dc2=>dc3,!dc4" - data from dc1 and dc2 DCs should be restored to dc3 DC. Ignoring dc4 DC from target cluster.

The negative DC is placed in the mapping value alongside positive DC?
Now I see that we can't simply allow to write !dc1 because it's ambiguous to whether it's a source or destination DC.

But I'm still in favor of not mixing positive and negative DCs here.
I could look like that:

> "=>!dc4;dc1,dc2=>dc3" - data from dc1 and dc2 DCs should be restored to dc3 DC. Ignoring dc4 DC from target cluster.

On the other hand, if we allow no DCs on either side of => then the ! is not needed anymore, but it's still nice for visibility.
What do you think about removing the ! in front of DC from the syntax and incorporating it into the =>? Something like:

"dc1!=>dc2,dc3;dc2=>dc1"- we want to skip dc1 in src, dc2 and dc3 in dst, and restore from dc2 in src into dc1 in dst

Alternative approach would be to add some special characters for distinguishing between src and dst DCs, but I guess that the relation to the => already takes care of that.

@Michal-Leszczynski
Copy link
Collaborator

@VAveryanov8 I made a general review, but I didn't dive into other details, because it would be better to first discuss the current comments (other things might not matter after that).

@Michal-Leszczynski
Copy link
Collaborator

Also, we should deprecate the datacenter from the --location flag, as it was previously used to partially control the dc mapping. When dc mapping flag is present, it should be ignored. When it's not, it can still be respected, but in the implementation we would still just parse it into dc mapping on the SM side.

Copy link
Collaborator

@karol-kokoszka karol-kokoszka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VAveryanov8 Thanks for the extensive PR and for covering a lot of different scenarios, but we don't need such a complex logic.

It's enough to do the one to one mapping. Instead of allowing to merge multiple DCs in source or multiple DCs in target.

Let's discuss the goal of DC mapping on the call.

It's enough to have a possibility of:

  • restoring just a single DC (that's why I don't really see the reason for introducing another flag that skips validation)
  • explicitly defining that nodes from target DCX are expected to download data of source DCY and call to load & stream

@karol-kokoszka
Copy link
Collaborator

@VAveryanov8 I see some recent pushes to this PR, is it ready for re-review ?

@VAveryanov8
Copy link
Collaborator Author

@VAveryanov8 I see some recent pushes to this PR, is it ready for re-review ?

Yes, it's ready for re-review

Copy link
Collaborator

@karol-kokoszka karol-kokoszka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !

@mikliapko There are SCT tests that were trying to restore multiDC cluster, but they were failing due to the problems with encryption at rest enabled on the cluster.
Two datacenters were using different encryption keys.

This PR is expected to fix this problem. AFAIR, you changed these tests to singleDC later. Is it possible to validate this PR against the SCT tests with the multiDC restore with encryption at rest enabled ?

@karol-kokoszka
Copy link
Collaborator

@VAveryanov8 Please rebase on master. You will have conflicts related to the new submodule, but they are suppose to be very easy to fix.

@mikliapko
Copy link

@mikliapko There are SCT tests that were trying to restore multiDC cluster, but they were failing due to the problems with encryption at rest enabled on the cluster. Two datacenters were using different encryption keys.

Yep, we can do it, just give me the SM build that can be used

@karol-kokoszka
Copy link
Collaborator

@mikliapko Here is the manager-build triggered on the current branch https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/919/

You will have to edit the test to include --dc-mapping source_dc1=>target_dc1;source_dc2=>target_dc2 into the sctool restore call.

This adds support for --dc-mapping flag to restore command.
It specifies mapping between DCs from the backup and DCs in the restored(target) cluster.
All DCs from the source cluster should be explicitly mapped to all DCs in the target cluster. The only exception is when
source and target cluster has exact match: source dcs == target dcs.
Only works with tables restoration (--restore-tables=true).
Syntax:
    "source_dc1,source_dc2=>target_dc1,target_dc2"
Multiple mappings are separated by semicolons (;). Exclamation mark (!) before a DC indicates that it should be ignored during restore.
Examples:
    "dc1,dc2=>dc3"      - data from dc1 and dc2 DCs should be restored to dc3 DC.
    "dc1,dc2=>dc3,!dc4" - data from dc1 and dc2 DCs should be restored to dc3 DC. Ignoring dc4 DC from target cluster.
    "dc1,!dc2=>dc2"     - data from dc1 should be restored to dc2 DC. Ignoring dc2 from source cluster.

Fixes: #3829
This introduces use of dc mappings when restoring tables.
Now each dc is downloading only data from corresponding dc(s)
accordingly to user provided mapping.
Also some dcs can be explicitly ignored.

Fixes: #3829
This adds another cluster to docker setup, so we can have integration
tests for dc-mappings.

Fixes: #3829
This removes support for !dc syntax and introduces new
--skip-dc-mapping-validation flag which can be used when partial restore
is needed.
This removes third cluster, but adds another data center to the second
cluster.
This introduces LocationInfo which is a direct replacement of
locationHosts, but it contains more information about Location, like
what DC are actually stored in this Location, what hosts can access it
and the list of manifests from this location. Also LocationInfo is
created with the respect of dc mappings.
This simplifies `--dc-mapping` usage only for 1 use case: 1-1
dc mapping. This means that squeezing (restore dc1 and dc2 into dc3) or
extending (restore dc1 into dc1 and dc2) DCs is not supported when
`--dc-mapping` is provided.
So the syntax is also simplified:
```
source_dc1=>target_dc1;source_dc2=>target_dc2
Where
     => is used to separate source   dc name and target dc name
     ;  is used to separate multiple mappings
```
If `--dc-mapping` is not provided, then current behavior should be
preserved - each node with access to DC can download it data.
Also it's allowed to provide only subset of DCs, ignoring source dc or
target (or both).
This removed dc3 from second cluster as `--dc-mappings` was simplified
and there is no necessity in having cluster with another dc name.
As this pr does not introduce any breaking changes anymore, integration
tests can be simplified as well.
@mikliapko
Copy link

@mikliapko Here is the manager-build triggered on the current branch https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/919/

You will have to edit the test to include --dc-mapping source_dc1=>target_dc1;source_dc2=>target_dc2 into the sctool restore call.

Testing with the provided build.

To make it work, dc mapping should be taken in quotes (see examples below):

ubuntu@manager-regression-adapt-fo-monitor-node-55ef36da-1:~$ sctool restore -c test --restore-tables --location s3:manager-backup-tests-us-east-1 --snapshot-tag sm_20250210120404UTC --dc-mapping eu-west-2scylla_node_west=>eu-west-2scylla_node_west;us-eastscylla_node_east=>us-eastscylla_node_east
Error: invalid argument "eu-west-2scylla_node_west=" for "--dc-mapping" flag: invalid syntax, mapping should be in a format of sourceDcs=>targetDcs, but got: eu-west-2scylla_node_west=

-bash: us-eastscylla_node_east=: command not found
ubuntu@manager-regression-adapt-fo-monitor-node-55ef36da-1:~$ sctool restore -c test --restore-tables --location s3:manager-backup-tests-us-east-1 --snapshot-tag sm_20250210120404UTC --dc-mapping "eu-west-2scylla_node_west=>eu-west-2scylla_node_west;us-eastscylla_node_east=>us-eastscylla_node_east"
restore/e3cc3b69-0c48-4109-ad78-fbe6257f6eb9

@VAveryanov8 @karol-kokoszka Is it inevitable in this case?

@VAveryanov8
Copy link
Collaborator Author

VAveryanov8 commented Feb 10, 2025

To make it work, dc mapping should be taken in quotes (see examples below):

For this particular syntax yes, quotes are required.

But the good news is that I'm gonna get rid of it towards more simple syntax using just equal(=) instead of array(=>) and comma(,) instead of semi-column(;). Then I suppose it will be possible to use without quotes.

@mikliapko
Copy link

To make it work, dc mapping should be taken in quotes (see examples below):

For this particular syntax yes, quotes are required.

But the good news is that I'm gonna get rid of it towards more simple syntax using just equal(=) instead of array(=>) and comma(,) instead of semi-column(;). Then I suppose it will be possible to use without quotes.

Great, please, let me know when you will have the build with new implementation

mikliapko added a commit to mikliapko/scylla-cluster-tests that referenced this pull request Feb 10, 2025
To run a restore task for multiDC cluster with EaR enabled (otherwise
fails #1), the special flag has been introduced in Manager (#2) to map
backed up DC with DC under restore, for example,

sctool restore ... --dc-mapping dc1=dc1,dc2=dc2

The change introduces this new flag into `create_restore_task` method
and makes sure that if Scylla cluster has more than 1 datacenter - the
restore task will be triggered with this flag applied.

refs:
1. scylladb/scylla-manager#3871
2. scylladb/scylla-manager#4213
This simplifies syntax of dc-mapping by leveraging pflag.StringToString
functionality.
Also extends integration tests with validation that each node should
download only tables from corresponding DCs when dc-mapping is provided.
Copy link
Collaborator

@Michal-Leszczynski Michal-Leszczynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks nice!
Just some question about the comment.

@karol-kokoszka
Copy link
Collaborator

@mikliapko Here is the latest build from this branch https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/923/
It contains all recent changes.

mikliapko added a commit to mikliapko/scylla-cluster-tests that referenced this pull request Feb 12, 2025
To run a restore task for multiDC cluster with EaR enabled (otherwise
fails #1), the special flag has been introduced in Manager (#2) to map
backed up DC with DC under restore, for example,

sctool restore ... --dc-mapping dc1=dc1,dc2=dc2

The change introduces this new flag into `create_restore_task` method
and makes sure that if Scylla cluster has more than 1 datacenter - the
restore task will be triggered with this flag applied.

refs:
1. scylladb/scylla-manager#3871
2. scylladb/scylla-manager#4213
@mikliapko
Copy link

mikliapko commented Feb 12, 2025

@mikliapko
Copy link

Retesting here https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/mikita/job/manager-master/job/ubuntu22-sanity-test/70/

Okay, the test is green

@karol-kokoszka
Copy link
Collaborator

@mikliapko

Retesting here https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/mikita/job/manager-master/job/ubuntu22-sanity-test/70/

Okay, the test is green

But does it fail on the master branch ?

@mikliapko
Copy link

But does it fail on the master branch ?

I haven't run it there since it doesn't contain the fix for multiDC restore.
So, it's supposed to fail.

@karol-kokoszka
Copy link
Collaborator

@VAveryanov8 feel free to merge it.

@VAveryanov8 VAveryanov8 merged commit af23793 into master Feb 17, 2025
52 checks passed
@VAveryanov8 VAveryanov8 deleted the va/dc-mapping branch February 17, 2025 15:34
karol-kokoszka pushed a commit that referenced this pull request Feb 24, 2025
This adds support for `--dc-mapping` flag to restore command. It specifies mapping between DCs from the backup and DCs in the restored(target) cluster. Only 1 use case is supported: 1-1 dc mapping. This means that squeezing (restore dc1 and dc2 into dc3) or extending (restore dc1 into dc1 and dc2) DCs is not supported when --dc-mapping is provided.
So the syntax is:

source_dc1=target_dc1,source_dc2=target_dc2
Where
     equal(=) is used to separate source   dc name and target dc name
     comma(,)  is used to separate multiple mappings

If --dc-mapping is not provided, then current behavior should be preserved - each node with access to DC can download it data. Also it's allowed to provide only subset of DCs, ignoring source dc or target (or both).
Only works with tables restoration (--restore-tables=true).

Fixes: #3829
@karol-kokoszka karol-kokoszka mentioned this pull request Feb 24, 2025
karol-kokoszka pushed a commit that referenced this pull request Feb 24, 2025
This adds support for `--dc-mapping` flag to restore command. It specifies mapping between DCs from the backup and DCs in the restored(target) cluster. Only 1 use case is supported: 1-1 dc mapping. This means that squeezing (restore dc1 and dc2 into dc3) or extending (restore dc1 into dc1 and dc2) DCs is not supported when --dc-mapping is provided.
So the syntax is:

source_dc1=target_dc1,source_dc2=target_dc2
Where
     equal(=) is used to separate source   dc name and target dc name
     comma(,)  is used to separate multiple mappings

If --dc-mapping is not provided, then current behavior should be preserved - each node with access to DC can download it data. Also it's allowed to provide only subset of DCs, ignoring source dc or target (or both).
Only works with tables restoration (--restore-tables=true).

Fixes: #3829
soyacz pushed a commit to scylladb/scylla-cluster-tests that referenced this pull request Feb 28, 2025
To run a restore task for multiDC cluster with EaR enabled (otherwise
fails #1), the special flag has been introduced in Manager (#2) to map
backed up DC with DC under restore, for example,

sctool restore ... --dc-mapping dc1=dc1,dc2=dc2

The change introduces this new flag into `create_restore_task` method
and makes sure that if Scylla cluster has more than 1 datacenter - the
restore task will be triggered with this flag applied.

refs:
1. scylladb/scylla-manager#3871
2. scylladb/scylla-manager#4213
Michal-Leszczynski pushed a commit that referenced this pull request Mar 10, 2025
This adds support for `--dc-mapping` flag to restore command. It specifies mapping between DCs from the backup and DCs in the restored(target) cluster. Only 1 use case is supported: 1-1 dc mapping. This means that squeezing (restore dc1 and dc2 into dc3) or extending (restore dc1 into dc1 and dc2) DCs is not supported when --dc-mapping is provided.
So the syntax is:

source_dc1=target_dc1,source_dc2=target_dc2
Where
     equal(=) is used to separate source   dc name and target dc name
     comma(,)  is used to separate multiple mappings

If --dc-mapping is not provided, then current behavior should be preserved - each node with access to DC can download it data. Also it's allowed to provide only subset of DCs, ignoring source dc or target (or both).
Only works with tables restoration (--restore-tables=true).

Fixes: #3829
(cherry picked from commit af23793)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Give possibility for restoring DC using mapping sourceDC -> destinationDC
4 participants