
Retry different disk when bootstrap fails in Full auto mode #2659

Merged
4 commits merged into linkedin:master on Nov 30, 2023

Conversation

Arun-LinkedIn (Contributor) commented on Nov 28, 2023

Retry different disk when bootstrap fails

Arun-LinkedIn changed the title from "Don't use unavailable disks for bootstrapping new replicas" to "Retry different disk when bootstrap fails in Full auto mode" on Nov 29, 2023
codecov-commenter commented on Nov 29, 2023

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison: base (be37869) is at 69.18% coverage, while head (09fbe46) is at 33.36%.
Report is 8 commits behind head on master.

Files                                                    Patch %   Lines
...m/github/ambry/clustermap/HelixClusterManager.java    9.09%     10 Missing ⚠️
...in/java/com/github/ambry/store/StorageManager.java    94.11%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master    #2659       +/-   ##
=============================================
- Coverage     69.18%   33.36%   -35.83%     
+ Complexity    11010     5142     -5868     
=============================================
  Files           806      809        +3     
  Lines         65526    65560       +34     
  Branches       8006     8001        -5     
=============================================
- Hits          45337    21875    -23462     
- Misses        17620    41991    +24371     
+ Partials       2569     1694      -875     


      throw new StateTransitionException("Failed to add store " + partitionName + " into storage manager",
          ReplicaOperationFailure);
    } else {
      // TODO: Delete any files added in store and reserve directory
Collaborator commented:

We should do this; otherwise it would just take up some disk space and not release it until we restart. I suppose we can just add a new method in the disk manager to clean up the "unexpected dirs". I have a PR to do that; let's merge these two PRs and then use the method from my PR to remove the partition directories later.
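For illustration only, here is a minimal sketch of what such a disk-manager cleanup helper could look like. The class name DiskCleanupSketch, the method cleanupUnexpectedDirs, and its signature are assumptions made for this example; they are not the method from the referenced PR.

```java
// Hypothetical sketch: names and signatures are assumptions, not the actual Ambry disk manager API.
import java.io.File;
import java.util.Set;

public class DiskCleanupSketch {

  /**
   * Removes directories under the disk mount path that do not belong to any
   * expected partition, e.g. leftovers from a failed bootstrap attempt.
   */
  public static void cleanupUnexpectedDirs(File mountPath, Set<String> expectedPartitionDirs) {
    File[] children = mountPath.listFiles();
    if (children == null) {
      return;
    }
    for (File child : children) {
      if (child.isDirectory() && !expectedPartitionDirs.contains(child.getName())) {
        deleteRecursively(child);
      }
    }
  }

  // Deletes a directory and everything under it.
  private static void deleteRecursively(File dir) {
    File[] children = dir.listFiles();
    if (children != null) {
      for (File child : children) {
        deleteRecursively(child);
      }
    }
    dir.delete();
  }
}
```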

Arun-LinkedIn (Contributor, PR author) replied:

Sure. I will merge this PR and put up a new PR.

@@ -16,6 +16,7 @@

import com.github.ambry.account.AccountService;
import com.github.ambry.clustermap.DiskId;
import com.github.ambry.clustermap.HardwareState;
Collaborator commented:

This import is not used anywhere.

@@ -113,6 +113,7 @@ public class HelixClusterManager implements ClusterMap {
private final PartitionSelectionHelper partitionSelectionHelper;
private final Map<String, Map<String, String>> partitionOverrideInfoMap = new HashMap<>();
private final Map<String, ReplicaId> bootstrapReplicas = new ConcurrentHashMap<>();
private final Map<String, Set<DiskId>> disksAttemptedForBootstrap = new ConcurrentHashMap<>();
Collaborator commented:

One slightly different approach here is to avoid using a disk whenever it fails to bootstrap a replica. But this approach would also do the job.
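As a point of reference, here is a minimal sketch (with assumed class, method, and parameter names; this is not the actual HelixClusterManager code) of how a per-partition set of already-attempted disks can be used to exclude previously failed disks when choosing the next bootstrap target:

```java
// Hypothetical sketch: names are assumptions, not the actual Ambry code.
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BootstrapDiskSelectionSketch {

  // Tracks, per partition, the disks already tried (and failed) for bootstrap.
  private final Map<String, Set<String>> disksAttemptedForBootstrap = new ConcurrentHashMap<>();

  /**
   * Picks the disk with the most available space among disks that have not
   * already been tried for this partition, or returns null if none remain.
   */
  public String selectDiskForBootstrap(String partitionName, Map<String, Long> diskToAvailableSpace) {
    Set<String> attempted =
        disksAttemptedForBootstrap.computeIfAbsent(partitionName, k -> ConcurrentHashMap.newKeySet());
    String bestDisk = null;
    long bestSpace = -1;
    for (Map.Entry<String, Long> entry : diskToAvailableSpace.entrySet()) {
      if (!attempted.contains(entry.getKey()) && entry.getValue() > bestSpace) {
        bestDisk = entry.getKey();
        bestSpace = entry.getValue();
      }
    }
    if (bestDisk != null) {
      // Remember this choice so a failed bootstrap will not reuse the same disk.
      attempted.add(bestDisk);
    }
    return bestDisk;
  }
}
```

With this shape, each failed bootstrap leaves the chosen disk in the attempted set, so the next retry picks a different disk until no untried disks are left.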

Arun-LinkedIn merged commit c9537d5 into linkedin:master on Nov 30, 2023
5 checks passed