Retry different disk when bootstrap fails in Full auto mode #2659

Merged
merged 4 commits into from
Nov 30, 2023
@@ -16,6 +16,7 @@

import com.github.ambry.account.AccountService;
import com.github.ambry.clustermap.DiskId;
import com.github.ambry.clustermap.HardwareState;

Review comment (Collaborator):
This import is not used anywhere.

import com.github.ambry.clustermap.PartitionId;
import com.github.ambry.clustermap.ReplicaId;
import com.github.ambry.clustermap.ReplicaStatusDelegate;
@@ -215,6 +216,12 @@ void start() throws InterruptedException {
} catch (StoreException e) {
  logger.error("Error while starting the DiskManager for {} ; no stores will be accessible on this disk.",
      disk.getMountPath(), e);
  // Set the state of the disk to UNAVAILABLE. This will prevent new replicas to be bootstrapped on this disk.
  // TODO: We might need to use DiskFailureHandler to immediately reset existing replicas on this disk and reduce
  // instance capacity so that existing replicas are reassigned to new hosts and Helix doesn't assign more replicas
  // than the host can handle.
  logger.info("Setting disk {} as UNAVAILABLE locally", disk.getMountPath());
  disk.setState(HardwareState.UNAVAILABLE);
Arun-LinkedIn marked this conversation as resolved.
} finally {
  if (!running) {
    metrics.totalStoreStartFailures.inc(stores.size());
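
For readers following the hunk above: it only shows the disk being marked UNAVAILABLE when its DiskManager fails to start. The sketch below illustrates, under stated assumptions, how that local state could let a caller skip the failed disk and retry a bootstrap on another one, as the PR title describes. It is not the PR's implementation. `DiskRetrySketch`, `Disk`, `Bootstrapper`, and `bootstrapOnFirstHealthyDisk` are hypothetical names, and the `HardwareState` enum here is a local stand-in for `com.github.ambry.clustermap.HardwareState`, not the real class.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; these are stand-ins, not Ambry's actual classes. */
class DiskRetrySketch {
  /** Local stand-in for the clustermap HardwareState enum (assumption). */
  enum HardwareState { AVAILABLE, UNAVAILABLE }

  /** Minimal stand-in for a disk with a mount path and a mutable local state. */
  static final class Disk {
    private final String mountPath;
    private volatile HardwareState state = HardwareState.AVAILABLE;

    Disk(String mountPath) {
      this.mountPath = mountPath;
    }

    String getMountPath() {
      return mountPath;
    }

    HardwareState getState() {
      return state;
    }

    void setState(HardwareState state) {
      this.state = state;
    }
  }

  /** Hypothetical callback that attempts to bootstrap a replica on a disk. */
  interface Bootstrapper {
    void bootstrap(Disk disk) throws Exception;
  }

  /**
   * Tries each AVAILABLE disk in order. A disk whose bootstrap attempt throws is
   * marked UNAVAILABLE locally so later attempts skip it, and the next disk is tried.
   * Returns the disk that succeeded, or empty if every candidate failed.
   */
  static Optional<Disk> bootstrapOnFirstHealthyDisk(List<Disk> disks, Bootstrapper bootstrapper) {
    for (Disk disk : disks) {
      if (disk.getState() != HardwareState.AVAILABLE) {
        continue; // already marked bad, e.g. by a failed DiskManager start
      }
      try {
        bootstrapper.bootstrap(disk);
        return Optional.of(disk);
      } catch (Exception e) {
        // Mirror the hunk above: mark the disk so no new replica lands on it.
        disk.setState(HardwareState.UNAVAILABLE);
      }
    }
    return Optional.empty();
  }
}
```

Calling `bootstrapOnFirstHealthyDisk` with two disks where the first attempt throws would leave the first disk UNAVAILABLE and return the second. The TODO in the hunk (resetting existing replicas and reducing instance capacity) is deliberately out of scope for this sketch.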