
tablet repair API filtering #4301

Merged
Michal-Leszczynski merged 15 commits into master from ml/tablet-repair-api-filtering on Mar 25, 2025

@Michal-Leszczynski (Collaborator) commented Mar 11, 2025

This PR adjusts the SM repair procedure to use tablet repair API filtering with Scylla 2025.1.0.
As a result, SM no longer stops tablet load balancing while repairing tablet keyspaces on Scylla 2025.1.0.
We still follow the same workflow as with the old repair API optimizations: we create a 'dummy' job per replica set solely for progress tracking, since progress is still reported as a count of token ranges.
Until #4303 is fixed, we test tablet repair API behavior with Scylla scylladb-ci:2025.2.0-dev-0.20250310.8d676048a6d9, which contains the tablet repair API filtering feature but is not yet based on the UBI minimal image.
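The progress workflow above can be sketched roughly as follows. This is a minimal illustration, not SM's code: `TokenRange`, `Job`, `groupByReplicaSet`, and `progress` are hypothetical names, and each token range is assumed to carry its replica set. Ranges sharing a replica set are folded into one dummy job, and progress is the number of ranges covered by finished jobs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// TokenRange is a hypothetical model of a token range and the hosts replicating it.
type TokenRange struct {
	StartToken, EndToken int64
	Replicas             []string
}

// Job is a hypothetical per-replica-set "dummy" job. It exists only so that
// progress can be tracked and reported as a count of token ranges.
type Job struct {
	ReplicaSet string // canonical (sorted) replica set key
	Ranges     int    // number of token ranges owned by this replica set
	Done       bool
}

// replicaSetKey builds an order-independent key for a replica set.
func replicaSetKey(replicas []string) string {
	sorted := append([]string(nil), replicas...)
	sort.Strings(sorted)
	return strings.Join(sorted, ",")
}

// groupByReplicaSet creates one dummy job per distinct replica set,
// preserving the order in which replica sets are first seen.
func groupByReplicaSet(ranges []TokenRange) []*Job {
	byKey := make(map[string]*Job)
	var jobs []*Job
	for _, r := range ranges {
		key := replicaSetKey(r.Replicas)
		j, ok := byKey[key]
		if !ok {
			j = &Job{ReplicaSet: key}
			byKey[key] = j
			jobs = append(jobs, j)
		}
		j.Ranges++
	}
	return jobs
}

// progress returns the repaired and total range counts across all jobs.
func progress(jobs []*Job) (done, total int) {
	for _, j := range jobs {
		total += j.Ranges
		if j.Done {
			done += j.Ranges
		}
	}
	return done, total
}

func main() {
	ranges := []TokenRange{
		{0, 10, []string{"a", "b"}},
		{10, 20, []string{"b", "a"}},
		{20, 30, []string{"b", "c"}},
	}
	jobs := groupByReplicaSet(ranges)
	jobs[0].Done = true // e.g. repair of the {a,b} replica set finished
	done, total := progress(jobs)
	fmt.Printf("%d jobs, %d/%d ranges repaired\n", len(jobs), done, total)
}
```

Keying jobs by the sorted replica set puts `{a,b}` and `{b,a}` into the same job, so finishing a dummy job advances progress by the number of ranges it covers, independently of how the underlying repair call is issued.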

Fixes #4188
Fixes #4273
Fixes #4292

@Michal-Leszczynski force-pushed the ml/tablet-repair-api-filtering branch 4 times, most recently from 3df9b75 to 3c5e40e on March 13, 2025 14:44
Scylla 2025.1.0 introduces a new tablet repair API.
SM should use it to repair tablet tables with a single API call.

Fixes #4188
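To illustrate repairing a whole tablet table with a single call, here is a sketch that only builds the request URL. The endpoint path `/storage_service/tablets/repair` and the query parameters `ks`, `table`, and `tokens=all` are assumptions for illustration, as is the helper name `tabletRepairURL`; verify them against the Scylla REST API reference before relying on them.

```go
package main

import (
	"fmt"
	"net/url"
)

// tabletRepairPath is an ASSUMED endpoint path, used here for illustration only.
const tabletRepairPath = "/storage_service/tablets/repair"

// tabletRepairURL (hypothetical) builds one request URL that asks Scylla to
// repair every tablet of a table. Parameter names are assumptions, not a
// verified Scylla API contract.
func tabletRepairURL(host, keyspace, table string) string {
	q := url.Values{}
	q.Set("ks", keyspace)
	q.Set("table", table)
	q.Set("tokens", "all") // repair all tablets of the table in one call
	u := url.URL{
		Scheme:   "http",
		Host:     host,
		Path:     tabletRepairPath,
		RawQuery: q.Encode(), // Encode sorts parameters by key
	}
	return u.String()
}

func main() {
	fmt.Println(tabletRepairURL("192.168.100.11:10000", "ks1", "t1"))
}
```

Compared with the vnode flow, which issues many per-range calls, a single table-level request leaves scheduling of individual tablet repairs to Scylla.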
Scylla 2025.1.0 introduces a new tablet repair API, which does not
require stopping tablet load balancing during repair.

Fixes #4273
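Whether load balancing must be stopped could be gated on the Scylla version, as sketched below. `parseVersion` and `supportsTabletRepairAPI` are hypothetical helpers, not SM functions; they assume a plain `major.minor.patch` version with an optional pre-release suffix that is ignored, so a dev build such as `2025.2.0-dev-0.20250310.8d676048a6d9` counts as 2025.2.0.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion extracts major, minor and patch from versions such as
// "2025.1.0" or "2025.2.0-dev-0.20250310.8d676048a6d9".
// Assumption: everything after the first '-' is ignored.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	core, _, _ := strings.Cut(v, "-")
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version format: %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("parse %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// supportsTabletRepairAPI (hypothetical) reports whether v is at least 2025.1.0.
func supportsTabletRepairAPI(v string) (bool, error) {
	got, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	minVersion := [3]int{2025, 1, 0}
	for i := range got {
		if got[i] != minVersion[i] {
			return got[i] > minVersion[i], nil
		}
	}
	return true, nil // exactly the minimum version
}

func main() {
	for _, v := range []string{"6.2.3", "2025.1.0", "2025.2.0-dev-0.20250310.8d676048a6d9"} {
		ok, err := supportsTabletRepairAPI(v)
		if err != nil {
			fmt.Println(v, "error:", err)
			continue
		}
		fmt.Printf("%s: stop tablet load balancing = %t\n", v, !ok)
	}
}
```

Ignoring the pre-release suffix is a simplification; strict semantic versioning would order `2025.1.0-dev` before `2025.1.0`.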
SM repair task supports 3 types of host filtering:
* --dc - controlled by API dc filter
* --ignore-down-nodes - controlled by API host filter
* --host - does not make sense for tablet tables

This commit adds validation for --ignore-down-nodes
and --host configurations.

Fixes #4292
This might hide some issues, so it's better to use it
only when the test explicitly requests it.
It already contains the tablet repair API filtering features,
so we should start testing with it and consider moving to a more
stable build later on. Unfortunately, this requires fixing #4303.

Ref #4303
@Michal-Leszczynski force-pushed the ml/tablet-repair-api-filtering branch from 3c5e40e to d8091ca on March 13, 2025 16:09
@Michal-Leszczynski marked this pull request as ready for review on March 17, 2025 12:10

@VAveryanov8 (Collaborator) left a comment

👍

Comment thread: pkg/service/repair/service.go (outdated)
Comment thread: pkg/service/repair/service.go (outdated)
Comment thread: pkg/service/repair/service.go
@Michal-Leszczynski (Collaborator, Author) commented:

@karol-kokoszka This PR is ready for review!
I guess we can merge it with the hand-picked Scylla version and change it to scylla-nightly:latest in #4306.

Comment thread: pkg/service/repair/generator.go (outdated)
This commit fixes a bug discovered in:
#4301 (comment)

SM was handling tablet load balancing during repair completely
wrong for clusters with both vnode and tablet keyspaces.
In such cases, it enabled tablet load balancing
while repairing tablet keyspaces and disabled it while
repairing vnode keyspaces.
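The intended behavior can be expressed as a small decision function plus a plan of toggles. This is a sketch under stated assumptions, not the fix itself: `KeyspaceKind`, `balancingEnabled`, and `plan` are hypothetical, balancing is assumed to be enabled before repair starts, and only tablet keyspaces repaired without tablet repair API filtering are assumed to need it stopped. Vnode keyspaces never require disabling it, which is the opposite of the buggy behavior described above.

```go
package main

import "fmt"

// KeyspaceKind is a hypothetical classification of keyspace replication.
type KeyspaceKind int

const (
	VnodeKeyspace KeyspaceKind = iota
	TabletKeyspace
)

// balancingEnabled returns the desired tablet load balancing state while
// repairing a keyspace of the given kind.
func balancingEnabled(kind KeyspaceKind, tabletRepairAPI bool) bool {
	// Only tablet keyspaces repaired via the old (non-tablet) API need
	// tablets to stay in place, so only they require balancing to be off.
	return !(kind == TabletKeyspace && !tabletRepairAPI)
}

// plan returns the balancing state changes needed to repair keyspaces in order,
// skipping redundant calls and re-enabling balancing at the end if needed.
func plan(kinds []KeyspaceKind, tabletRepairAPI bool) []bool {
	current := true // assumption: balancing is enabled before repair
	var toggles []bool
	for _, k := range kinds {
		want := balancingEnabled(k, tabletRepairAPI)
		if want != current {
			toggles = append(toggles, want)
			current = want
		}
	}
	if !current {
		toggles = append(toggles, true) // restore balancing after repair
	}
	return toggles
}

func main() {
	kinds := []KeyspaceKind{VnodeKeyspace, TabletKeyspace, VnodeKeyspace}
	fmt.Println("old API:", plan(kinds, false))
	fmt.Println("tablet repair API:", plan(kinds, true))
}
```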
@Michal-Leszczynski (Collaborator, Author) commented:

@VAveryanov8 could you take one more look at this PR? I added two commits since your review.

Comment thread: pkg/service/repair/service_repair_integration_test.go (outdated)

@VAveryanov8 (Collaborator) commented:

Looks good to me 👍

@Michal-Leszczynski merged commit ec9e631 into master on Mar 25, 2025
51 checks passed
@Michal-Leszczynski deleted the ml/tablet-repair-api-filtering branch on March 25, 2025 11:43

Development

Successfully merging this pull request may close these issues:
* Tablet repair API filtering in 2025.1.0
* Decide on tablet load balancing during repair
* Integrate tablet repair scheduler

3 participants