long reboot time due to nvme reconnect #1764
Comments
Did you reboot without draining the node? The nvme kernel initiator will keep trying to connect for some time.
But I suggest you drain the node first before rebooting; otherwise the filesystems may not unmount gracefully, which can potentially result in data loss.
Do you mean the kubectl-mayastor drain command, or the usual kubectl drain to evict pods?
I mean the usual kubectl drain, to evict the pods using the Mayastor volumes onto other nodes.
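For completeness, a typical drain-then-reboot sequence looks roughly like this (the node name and flags are just an example, not from this thread):
# Evict pods, including those using Mayastor volumes, before rebooting
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
# ...reboot the node, then allow scheduling again...
kubectl uncordon worker-1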
Ah I see... are you using the
Yes, that's true. IIRC we need to remove the attachments manually.
Not using it.
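Regarding removing the attachments manually: assuming these are the Kubernetes VolumeAttachment objects left behind for the node (an assumption, the thread doesn't say), the manual cleanup would look something like this (node name is an example):
# List VolumeAttachment objects still referencing the node
kubectl get volumeattachments -o wide | grep worker-1
# Delete a stale attachment by name
kubectl delete volumeattachment <attachment-name>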
My Kubernetes node lost its network connection for a while; now I see a lot of these errors in dmesg:
Can it be related to NVMe / Mayastor?
Hi @todeb , I'm not sure about those errors above, I've never seen them. I can reproduce this quite easily though. I think the problem is that the network is being brought down before the nvme devices. I was able to somewhat address this with a systemd service, like so:
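The unit file itself isn't shown above; a minimal sketch of what such a service might look like (the unit name and script path are hypothetical, adjust to your setup) is a oneshot unit whose ExecStop runs the cleanup script, ordered after network-online.target so that at shutdown the script runs while the network is still up:
# Run as root; /usr/local/bin/nvmf-cleanup.sh is assumed to be the script below.
cat >/etc/systemd/system/nvmf-cleanup.service <<'EOF'
[Unit]
Description=Clean up Mayastor NVMe-oF devices before network shutdown
# Stop order is the reverse of start order, so ordering After= the network
# means ExecStop= runs before networking is torn down at shutdown.
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/usr/local/bin/nvmf-cleanup.sh

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable nvmf-cleanup.service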
And the script:
#!/usr/bin/env bash
shopt -s nullglob
CTRL_LOSS_TMO=${CTRL_LOSS_TMO:-10}
set_all_ctrl_loss_tmo() {
for ctrl in /sys/class/nvme/*; do
if ! grep -q "Mayastor NVMe controller" "$ctrl/model"; then
# not our device
continue
fi
# set controller loss timeout so a broken device can be removed after the timeout
set_ctrl_loss_tmo $ctrl $CTRL_LOSS_TMO
done
}
set_ctrl_loss_tmo() {
ctrl=$1
ctrl_tmo=$ctrl/ctrl_loss_tmo
ctrl_delay=$ctrl/reconnect_delay
tmo=$2
dl=5
if [ -f $ctrl_tmo ]; then
echo "Setting ReconnectDelay of $dl for $ctrl"
echo $dl | tee -a $ctrl_delay >/dev/null
echo "Setting CtrlLossTmo of $tmo for $ctrl"
echo $tmo | tee -a $ctrl_tmo >/dev/null
fi
}
device_from_ctrl() {
local input=$1
if [[ $input =~ ^nvme([0-9]{1,5})c[0-9]{1,5}n([0-9]{1,5})$ ]]; then
echo "nvme${BASH_REMATCH[1]}n${BASH_REMATCH[2]}"
return 0
fi
return 1
}
wait_fs_device_off() {
dev_name=$1
count=0
while true; do
files=(/sys/fs/*/$dev_name)
if [ ${#files[@]} -eq 0 ]; then
break
fi
echo "File still exists, waiting..."
count=$((count+1))
if [ $count -ge 10 ]; then
echo "Maximum iterations reached, exiting."
break
fi
sleep 0.1
done
}
cleanup_device() {
local ctrl=$1
local dev_name=$2
local dev=$3
findmnt $dev
local mounts="no"
for mnt in $(findmnt $dev -no target); do
echo "Found $mnt for device $dev"
mounts="yes"
done
if [ "$mounts" = "yes" ]; then
echo -n "Unmounting $dev... "
# We've set the ctrl loss timeout, so shouldn't lock up?
umount $dev -A &
echo "done"
fi
findmnt $dev
wait_fs_device_off $dev_name
echo -n "Deleting controller $ctrl... "
echo 1 > $ctrl/delete_controller
echo "done"
}
cleanup_nvmf() {
# todo: may want to do multiple devices at once
for ctrl in /sys/class/nvme/*; do
if ! grep -q "Mayastor NVMe controller" "$ctrl/model"; then
# not our device
continue
fi
for match in $ctrl/nvme*c*n*; do
echo -n "Finding block device for $ctrl... "
local dev_name=$(device_from_ctrl "${match##*/}")
local dev="/dev/$dev_name"
# device_from_ctrl prints nothing when the name doesn't match, so test dev_name
if [ -n "$dev_name" ]; then
echo "$dev"
cleanup_device $ctrl $dev_name $dev
else
echo "failed"
fi
done
done
}
echo "Starting NVMf Cleanup"
# set the ctrl loss timeout right away
set_all_ctrl_loss_tmo
# cleanup all mounts and controllers
cleanup_nvmf
echo "Done" If you can give this a try, let me know how it goes! |
Yeah, the path will become "connecting", and IO may freeze until it comes back up or another path takes its place.
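A quick way to see that state from the initiator side (a sketch, assuming nvme-cli is installed; not specific to this thread):
# Show each subsystem and the state of its paths (live / connecting / ...)
nvme list-subsys
# The same state is exposed per controller in sysfs
for c in /sys/class/nvme/nvme*; do
  printf '%s: %s\n' "$c" "$(cat "$c/state")"
done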
I've got an improvement on the scripts (I think :)). The service:
The script:
#!/usr/bin/env bash
shopt -s nullglob
CTRL_LOSS_TMO=${CTRL_LOSS_TMO:-1}
REC_DELAY=${REC_DELAY:-1}
set_ctrl_loss_tmo() {
ctrl=$1
ctrl_tmo=$ctrl/ctrl_loss_tmo
ctrl_delay=$ctrl/reconnect_delay
tmo=$2
dl=$3
if [ -f $ctrl_tmo ]; then
echo "Setting ReconnectDelay of $dl for $ctrl"
echo $dl | tee -a $ctrl_delay >/dev/null
echo "Setting CtrlLossTmo of $tmo for $ctrl"
echo $tmo | tee -a $ctrl_tmo >/dev/null
fi
}
device_from_ctrl() {
local input=$1
if [[ $input =~ ^nvme([0-9]{1,5})c[0-9]{1,5}n([0-9]{1,5})$ ]]; then
echo "nvme${BASH_REMATCH[1]}n${BASH_REMATCH[2]}"
return 0
fi
return 1
}
remount_dev_ro() {
local ctrl=$1
local dev_name=$2
local dev=$3
for mnt in $(findmnt $dev -no target); do
echo "Found $mnt for device $dev"
nohup sh -c "mount -o remount,ro $mnt" &
done
systemd-umount $dev --no-block
# set controller loss timeout so a broken device can be removed after the timeout
set_ctrl_loss_tmo $ctrl $CTRL_LOSS_TMO $REC_DELAY
nohup sh -c "umount $dev" &
}
delete_controller() {
local ctrl=$1
local dev_name=$2
local dev=$3
echo -n "Deleting controller $ctrl... "
nohup sh -c "echo 1 > $ctrl/delete_controller" &
echo "done"
}
cleanup_nvmf() {
# todo: may want to do multiple devices at once
devices=()
for ctrl in /sys/class/nvme/*; do
if ! grep -q "Mayastor NVMe controller" "$ctrl/model"; then
# not our device
continue
fi
for match in $ctrl/nvme*c*n*; do
echo -n "Finding block device for $ctrl... "
local dev_name=$(device_from_ctrl "${match##*/}")
local dev="/dev/$dev_name"
# device_from_ctrl prints nothing when the name doesn't match, so test dev_name
if [ -n "$dev_name" ]; then
echo "$dev"
remount_dev_ro $ctrl $dev_name $dev
devices+=("$dev_name")
else
echo "failed"
fi
done
done
sleep $REC_DELAY
sleep $CTRL_LOSS_TMO
sleep $CTRL_LOSS_TMO
for ctrl in /sys/class/nvme/*; do
if ! grep -q "Mayastor NVMe controller" "$ctrl/model"; then
# not our device
continue
fi
for match in $ctrl/nvme*c*n*; do
echo -n "Finding block device for $ctrl... "
local dev_name=$(device_from_ctrl "${match##*/}")
local dev="/dev/$dev_name"
# device_from_ctrl prints nothing when the name doesn't match, so test dev_name
if [ -n "$dev_name" ]; then
echo "$dev"
delete_controller $ctrl $dev_name $dev
else
echo "failed"
fi
done
done
echo "Waiting for devices: ${devices[@]}"
local leftovers="no"
for n in $(seq 1 10); do
sleep 1
leftovers="no"
for dev_name in "${devices[@]}"; do
files=(/sys/fs/*/$dev_name)
echo "For $dev_name, found: ${files[@]}"
if [ ${#files[@]} -eq 0 ]; then
files=(/proc/fs/jbd2/$dev_name-*)
if [ ${#files[@]} -eq 0 ]; then
continue
fi
fi
leftovers="yes"
break
done
if [ "$leftovers" = "no" ]; then
break
fi
done
if [ "$leftovers" = "yes" ]; then
echo "Timed out waiting for all devices..."
fi
}
echo "Starting NVMf Cleanup"
cleanup_nvmf
echo "Finished NVMf Cleanup" |
Describe the bug
After initiating a reboot of a k8s node running Mayastor, the OS spent about 8 minutes trying to reconnect the NVMe devices.
To Reproduce
Initiate a reboot of the OS.
Expected behavior
I don't know how these NVMe devices are handled, but IMO NVMe should not block a system reboot, especially when the connections are failing.
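For reference, how long the initiator keeps retrying is governed by the per-controller ctrl_loss_tmo and reconnect_delay attributes in sysfs, which the scripts above shorten. A quick way to inspect the current values (a sketch, assuming fabric-attached Mayastor controllers):
for c in /sys/class/nvme/nvme*; do
  grep -q "Mayastor NVMe controller" "$c/model" 2>/dev/null || continue
  echo "$c: ctrl_loss_tmo=$(cat "$c/ctrl_loss_tmo") reconnect_delay=$(cat "$c/reconnect_delay")"
done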