Unless your AWS account has already onboarded to EC2 Spot, you will need to create the service-linked role to avoid a ServiceLinkedRoleCreationNotPermitted error:
AuthFailure.ServiceLinkedRoleCreationNotPermitted: The provided credentials do not have permission to create the service-linked role for EC2 Spot Instances
To resolve this, create the service-linked role:
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com
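Because create-service-linked-role fails if the role already exists, you may want to create it only when it is missing. The following is a minimal sketch, not an official script. It assumes bash and an AWS CLI configured with credentials that allow iam:GetRole and iam:CreateServiceLinkedRole. AWSServiceRoleForEC2Spot is the name AWS gives the Spot service-linked role; the function name ensure_spot_service_linked_role is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create the EC2 Spot service-linked role only if it does not exist yet.
# Assumes the AWS CLI is installed and configured for the target account.
# ensure_spot_service_linked_role is a hypothetical helper name.

ensure_spot_service_linked_role() {
  # AWSServiceRoleForEC2Spot is the role that spot.amazonaws.com uses.
  if aws iam get-role --role-name AWSServiceRoleForEC2Spot >/dev/null 2>&1; then
    echo "exists"
  else
    # Same command as above; only runs when the role lookup failed.
    aws iam create-service-linked-role --aws-service-name spot.amazonaws.com >/dev/null && echo "created"
  fi
}
```

Running ensure_spot_service_linked_role prints "exists" or "created", so it is safe to rerun as part of cluster setup.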
Karpenter adds a finalizer to the nodes it provisions to support graceful node termination. If Karpenter is uninstalled, these finalizers cause the API server to block deletion of those nodes until the finalizers are removed.
You can fix this by patching the node objects in one of two ways:
- Run kubectl edit node <node_name> and remove the line that says karpenter.sh/termination in the finalizers field.
- Run the following script, which finds every node with the Karpenter finalizer and removes the finalizers from those nodes.
  - NOTE: this removes ALL finalizers, not only Karpenter's, from every node that has the Karpenter finalizer.
kubectl get nodes -ojsonpath='{range .items[*].metadata}{@.name}:{@.finalizers}{"\n"}' | grep "karpenter.sh/termination" | cut -d ':' -f 1 | xargs kubectl patch node --type='json' -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
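If you would rather not strip unrelated finalizers, the following minimal sketch (not an official Karpenter tool) removes only karpenter.sh/termination and keeps any others. It assumes bash, a kubectl context pointing at the affected cluster, and finalizer names without double quotes; the function names build_finalizer_patch and remove_karpenter_finalizers are hypothetical. It rewrites the whole finalizer list with a JSON merge patch, because a JSON patch remove operation would need the index of the element to delete.

```shell
#!/usr/bin/env bash
# Sketch: remove only Karpenter's finalizer, keeping any other finalizers.
# Assumes kubectl targets the right cluster and finalizer names contain no
# double quotes. Function names are hypothetical helpers, not Karpenter APIs.

KARPENTER_FINALIZER="karpenter.sh/termination"

# Reads finalizers (one per line) on stdin and prints a JSON merge patch that
# sets metadata.finalizers to every finalizer except Karpenter's.
build_finalizer_patch() {
  local items="" f
  while IFS= read -r f; do
    [ -z "$f" ] && continue
    [ "$f" = "$KARPENTER_FINALIZER" ] && continue
    items="${items:+$items,}\"$f\""
  done
  printf '{"metadata":{"finalizers":[%s]}}\n' "$items"
}

# Patches every node that still carries Karpenter's finalizer.
remove_karpenter_finalizers() {
  local node finalizers
  for node in $(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'); do
    # One finalizer per line for this node.
    finalizers=$(kubectl get node "$node" -o jsonpath='{range .metadata.finalizers[*]}{@}{"\n"}{end}')
    # Skip nodes that do not carry Karpenter's finalizer.
    printf '%s\n' "$finalizers" | grep -qxF "$KARPENTER_FINALIZER" || continue
    # A JSON merge patch replaces the whole finalizers array (RFC 7386).
    kubectl patch node "$node" --type=merge \
      -p "$(printf '%s\n' "$finalizers" | build_finalizer_patch)"
  done
}
```

After reviewing the functions, call remove_karpenter_finalizers from the same shell session to patch the affected nodes.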
If you create a Karpenter Provisioner while its defaulting webhook is unavailable, the Provisioner may end up with unintentionally nil fields (see the related issue).
You may see log output like the following:
	github.com/aws/karpenter/pkg/controllers/provisioning/v1alpha1/reallocation/utilization.go:84 +0x688
github.com/aws/karpenter/pkg/controllers/provisioning/v1alpha1/reallocation.(*Controller).Reconcile(0xc000b004c0, 0x23354c0, 0xc000e209f0, 0x235e640, 0xc002566c40, 0x200c786, 0x5, 0xc00259c1b0, 0x1)
	github.com/aws/karpenter/pkg/controllers/provisioning/v1alpha1/reallocation/controller.go:72 +0x65
github.com/aws/karpenter/pkg/controllers.(*GenericController).Reconcile(0xc000b00720, 0x23354c0, 0xc000e209f0, 0xc001db9be0, 0x7, 0xc001db9bd0, 0x7, 0xc000e209f0, 0x7fc864172d20, 0xc0000be2a0, ...)
This is fixed in Karpenter v0.2.7 and later. Reinstall Karpenter using the latest version.
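To confirm whether a given Karpenter version already includes this fix, you can compare it against v0.2.7. The following is a minimal sketch that assumes bash and a sort implementation supporting -V (GNU coreutils does); version_at_least is a hypothetical helper, and you supply the installed version yourself, for example from the tag of the Karpenter controller image running in your cluster.

```shell
#!/usr/bin/env bash
# Sketch: succeed if version $1 is greater than or equal to version $2.
# A leading "v" is ignored. Relies on sort -V for version-aware ordering.
# version_at_least is a hypothetical helper name.

version_at_least() {
  local have="${1#v}" want="${2#v}"
  # If the required version sorts first (or ties), the installed one is new enough.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}
```

For example, version_at_least v0.2.10 v0.2.7 succeeds, while version_at_least v0.2.6 v0.2.7 fails, which means the install predates the fix.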
If an EC2 instance is launched but stays stuck in pending and never runs the kubelet, you may see a message like this in /var/log/user-data.log:
No entry for c6i.xlarge in /etc/eks/eni-max-pods.txt
This means that your CNI plugin is out of date. See the Amazon EKS documentation for instructions on updating the Amazon VPC CNI plugin.
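Before or after updating, you can check on the node whether an instance type appears in the max-pods table. The following minimal sketch assumes bash, awk, and that non-comment lines in /etc/eks/eni-max-pods.txt have the form "<instance-type> <max-pods>"; has_max_pods_entry is a hypothetical helper name, and the file path can be overridden for testing.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the instance type has an entry in the max-pods table.
# Assumes lines look like "<instance-type> <max-pods>"; comment lines start
# with "#" and never match a real instance type in the first field.
# has_max_pods_entry is a hypothetical helper name.

has_max_pods_entry() {
  local instance_type="$1"
  local table="${2:-/etc/eks/eni-max-pods.txt}"
  # Exit 0 if any line's first field equals the instance type, else exit 1.
  awk -v t="$instance_type" '$1 == t { found = 1 } END { exit !found }' "$table"
}
```

For example, has_max_pods_entry c6i.xlarge || echo "c6i.xlarge missing from eni-max-pods.txt" reports the same gap that the user-data log describes.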