  // if stateless CNI fail to get the endpoint from CNS for any reason other than Endpoint Not found or CNS connection failure
- // return a retriable error so the container runtime will retry this DEL later
- // the implementation of this function returns nil if the endpoint doesn't exist, so
- // we don't have to check that here
- if err != nil {
- 	switch {
- 	case errors.Is(err, network.ErrConnectionFailure):
- 		logger.Error("Failed to connect to CNS", zap.Error(err))
- 		logger.Info("Endpoint will be deleted from state file asynchronously", zap.String("containerID", args.ContainerID))
- 		// In SwiftV2 Linux stateless CNI mode, if the plugin cannot connect to CNS,
- 		// we asynchronously remove the secondary (delegated) interface from the pod’s network namespace in the absence of the endpoint state.
- 		// This is necessary because leaving the delegated NIC in the pod netns can cause the kernel to block rtnetlink operations.
- 		// When that happens, kubelet and containerd hang during sandbox creation or teardown.
- 		// The delegated NIC (SR-IOV VF) used by SwiftV2 for multitenant pods remains tied to the pod namespace,
- 		// triggering hot-unplug/re-register events and leaving the node in an unhealthy state.
- 		// This workaround mitigates the issue by removing the secondary NIC from the pod netns when CNS is unreachable during DEL to provide the endpoint state.
...
  	return plugin.RetriableError(fmt.Errorf("failed to retrieve endpoint: %w", err))
  }
-
- // for Stateful CNI when the endpoint is not created, but the ips are already allocated (only works if single network, single infra)
- // this block is applied to stateless CNI only if there was a connection failure in previous block and asynchronous delete by CNS will remover the endpoint from state file
+ // when the endpoint is not created, but the ips are already allocated (only works if single network, single infra)
...
  		logger.Error("Failed to connect to CNS", zap.Error(err))
+ 		logger.Info("Endpoint will be deleted from state file asynchronously", zap.String("containerID", args.ContainerID))
+ 		// In SwiftV2 Linux stateless CNI mode, if the plugin cannot connect to CNS,
+ 		// we still have to remove the secondary (delegated) interface from the pod’s network namespace in the absence of the endpoint state.
+ 		// This is necessary because leaving the delegated NIC in the pod netns can cause the kernel to block rtnetlink operations.
+ 		// When that happens, kubelet and containerd hang during sandbox creation or teardown.
+ 		// The delegated NIC (SR-IOV VF) used by SwiftV2 for multitenant pods remains tied to the pod namespace,
+ 		// triggering hot-unplug/re-register events and leaving the node in an unhealthy state.
+ 		// This workaround mitigates the issue by generating a minimal endpointInfo via containerd args and netlink APIs that can be then passed to DeleteEndpoint API.
+ 		epInfos, err = nm.generateEndpointLocally(args)
+ 		if err != nil {
+ 			logger.Error("Failed to fetch secondary endpoint from pod netns", zap.String("netns", args.Netns), zap.Error(err))
+ 			return nil, fmt.Errorf("failed to fetch secondary interfaces: %w", err)
+ 		}
+ 	case errors.Is(err, ErrEndpointStateNotFound):
+ 		logger.Info("Endpoint Not found", zap.String("containerID", args.ContainerID), zap.Error(err))
+ 		return nil, nil
+ 	default:
+ 		logger.Error("Get Endpoint State API returned error", zap.String("containerID", args.ContainerID), zap.Error(err))
+ 		return nil, ErrEndpointRetrievalFailure
+ 	}
+ }
+ for _, epInfo := range epInfos {
+ 	logger.Info("Found endpoint to delete", zap.String("IfName", epInfo.IfName), zap.String("EndpointID", epInfo.EndpointID), zap.Any("NICType", epInfo.NICType))