# Karpenter v1 API

_This RFC is an extension of the [v1 API RFC](https://github.com/kubernetes-sigs/karpenter/blob/main/designs/v1-api.md) that is merged in the [`kubernetes-sigs/karpenter` repo](https://github.com/kubernetes-sigs/karpenter)._

## Overview

Karpenter released the beta version of its APIs and features in October 2023. The intention behind this beta was to determine the final set of changes and feature additions needed before Karpenter could be considered feature-complete. The list below details the features on Karpenter's roadmap before Karpenter becomes feature-complete and stable at v1.

### Categorization

This list represents the minimal set of changes that are needed to ensure proper operational excellence, feature completeness, and stability by v1. For a change to make it onto this list, it must meet one of the following criteria:

1. Breaking: The feature requires changes or removals from the API that would be considered breaking after a bump to v1
2. Stability: The feature ensures proper operational excellence for behavior that is leaky or has race conditions in the beta state
3. Planned Deprecations: The feature cleans up deprecations that were previously planned for the project

## EC2NodeClass API

```
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  kubelet:
    podsPerCore: 2
    maxPods: 20
    systemReserved:
      cpu: 100m
      memory: 100Mi
      ephemeral-storage: 1Gi
    kubeReserved:
      cpu: 200m
      memory: 100Mi
      ephemeral-storage: 3Gi
    evictionHard:
      memory.available: 5%
      nodefs.available: 10%
      nodefs.inodesFree: 10%
    evictionSoft:
      memory.available: 500Mi
      nodefs.available: 15%
      nodefs.inodesFree: 15%
    evictionSoftGracePeriod:
      memory.available: 1m
      nodefs.available: 1m30s
      nodefs.inodesFree: 2m
    evictionMaxPodGracePeriod: 60
    imageGCHighThresholdPercent: 85
    imageGCLowThresholdPercent: 80
    cpuCFSQuota: true
    clusterDNS: ["10.0.1.100"]
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
    - id: subnet-09fa4a0a8f233a921
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
    - name: my-security-group
    - id: sg-063d7acfb4b06c82c
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@v20240625
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
    - name: my-ami
    - id: ami-123
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  instanceProfile: "KarpenterNodeInstanceProfile-${CLUSTER_NAME}"
  userData: |
    echo "Hello world"
  tags:
    team: team-a
    app: team-a-app
  instanceStorePolicy: RAID0
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1 # This is changed to disable IMDS access from containers not on the host network
    httpTokens: required
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 10000
        encrypted: true
        kmsKeyID: "1234abcd-12ab-34cd-56ef-1234567890ab"
        deleteOnTermination: true
        throughput: 125
        snapshotID: snap-0123456789
  detailedMonitoring: true
status:
  subnets:
    - id: subnet-0a462d98193ff9fac
      zone: us-east-2b
    - id: subnet-0322dfafd76a609b6
      zone: us-east-2c
    - id: subnet-0727ef01daf4ac9fe
      zone: us-east-2b
    - id: subnet-00c99aeafe2a70304
      zone: us-east-2a
    - id: subnet-023b232fd5eb0028e
      zone: us-east-2c
    - id: subnet-03941e7ad6afeaa72
      zone: us-east-2a
  securityGroups:
    - id: sg-041513b454818610b
      name: ClusterSharedNodeSecurityGroup
    - id: sg-0286715698b894bca
      name: ControlPlaneSecurityGroup-1AQ073TSAAPW
  amis:
    - id: ami-01234567890123456
      name: custom-ami-amd64
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
    - id: ami-01234567890123456
      name: custom-ami-arm64
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
  instanceProfile: "${CLUSTER_NAME}-0123456778901234567789"
  conditions:
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "True"
      type: InstanceProfileReady
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "True"
      type: SubnetsReady
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "True"
      type: SecurityGroupsReady
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "True"
      type: AMIsReady
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "True"
      type: Ready
```

### Printer Columns

**Category:** Stability, Breaking

#### Current

```
➜ karpenter git:(main) ✗ k get ec2nodeclasses -o wide
NAME      AGE
default   2d8h
```

#### Proposed

```
➜ karpenter git:(main) ✗ k get ec2nodeclasses -o wide
NAME      READY   AGE    ROLE
default   True    2d8h   KarpenterNodeRole-test-cluster
```

**Standard Columns**

1. Name
2. Ready - EC2NodeClasses now have status conditions that inform the user whether the EC2NodeClass has resolved all of its data and is “ready” to be used by a NodePool. This readiness should be easily viewable by users.
3. Age

**Wide Columns (-o wide)**

1. Role - As a best practice, we recommend that users specify a node role and let Karpenter create a managed instance profile on the customer's behalf. We should expose this role prominently.
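
The columns above map onto the standard CRD printer-column mechanism from `apiextensions.k8s.io/v1`. A minimal sketch of how they could be declared on the EC2NodeClass CRD follows; the JSONPaths are assumptions about the final v1 schema:

```
additionalPrinterColumns:
  - name: Ready
    type: string
    jsonPath: .status.conditions[?(@.type=="Ready")].status
  - name: Age
    type: date
    jsonPath: .metadata.creationTimestamp
  - name: Role
    type: string
    jsonPath: .spec.role
    # priority > 0 means the column only appears with `-o wide`
    priority: 1
```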

### Status Conditions

**Category:** Stability

Defining the complete set of status condition types that we will include at the v1 launch is **out of scope** for this document and will be defined more granularly in Karpenter’s Observability design. Minimally for v1, we will add a `Ready` condition so that we can determine whether an EC2NodeClass can be used by a NodePool during scheduling. More robustly, we will define status conditions that ensure that each required “concept” needed for an instance launch is resolved, e.g. InstanceProfile resolved, Subnets resolved, Security Groups resolved, etc.

### Require AMISelectorTerms

**Category:** Stability, Breaking

When specifying an AMIFamily with no AMISelectorTerms, nodes are currently configured to automatically update to new AMIs whenever a new version of the EKS-optimized image in that family is released. Existing nodes on older versions of the AMI will drift to the newer version to meet the desired state of the EC2NodeClass.

This works well in pre-prod environments, where it’s convenient to be auto-upgraded to the latest version for testing, but it is extremely risky in production environments. [Karpenter now recommends that users pin AMIs in their production environments](https://karpenter.sh/docs/tasks/managing-amis/#option-1-manage-how-amis-are-tested-and-rolled-out:~:text=The%20safest%20way%2C%20and%20the%20one%20we%20recommend%2C%20for%20ensuring%20that%20a%20new%20AMI%20doesn%E2%80%99t%20break%20your%20workloads%20is%20to%20test%20it%20before%20putting%20it%20into%20production); however, it’s still possible today to be caught by surprise by this behavior when you deploy an EC2NodeClass and NodePool with an AMIFamily. Most notably, this differs from eksctl and MNG, which use the latest AMI when you first deploy the node group but pin it from that point onward.

We no longer want to deal with potential confusion around whether nodes will get rolled when using an AMIFamily with no `amiSelectorTerms`. Instead, `amiSelectorTerms` will now be required, and a new term type, `alias`, will be introduced which allows users to select an EKS-optimized AMI. Each alias consists of an AMI family and a version. Users can set the version to `latest` to continue to get automatic upgrades, or pin to a specific version.
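
For example, the two modes would look like this in `amiSelectorTerms` (using the same illustrative version string as the spec above):

```
# Pinned (recommended for production): nodes only drift to a new AMI
# when this version is explicitly bumped
amiSelectorTerms:
  - alias: al2023@v20240625

# Latest: preserves today's auto-upgrade behavior, now as an explicit opt-in
amiSelectorTerms:
  - alias: al2023@latest
```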

### Disable IMDS Access from Containers by Default

**Category:** Stability, Breaking

The HTTPPutResponseHopLimit is [part of the instance metadata settings that are configured on the node on startup](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html). This setting dictates how many hops a PUT request can take before it is rejected by IMDS. For Kubernetes pods that live in another network namespace, this means that any pod that isn’t using `hostNetwork: true` [needs a hop limit of 2 in order to access IMDS](https://aws.amazon.com/about-aws/whats-new/2020/08/amazon-eks-supports-ec2-instance-metadata-service-v2/#:~:text=However%2C%20this%20limit%20is%20incompatible%20with%20containerized%20applications%20on%20Kubernetes%20that%20run%20in%20a%20separate%20network%20namespace%20from%20the%20instance). Opening up the node so that pods can reach IMDS is an inherent security risk: a pod that can grab an IMDS token can craft requests that give it the same level of access as the instance profile that orchestrates the kubelet's calls to the cluster.

We should constrain pods so that they do not have access to IMDS by default, closing off this security risk. This new default wouldn’t affect users who have already deployed EC2NodeClasses on their cluster; it would only affect new EC2NodeClasses.
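
Concretely, the proposed default (already shown in the spec above) blocks IMDS from pods off the host network; users who need pod access to IMDS can opt back in by raising the hop limit. A sketch:

```
metadataOptions:
  httpEndpoint: enabled
  httpProtocolIPv6: disabled
  httpTokens: required
  # New default: PUT responses expire after one hop, so pods in a separate
  # network namespace (hostNetwork: false) cannot reach IMDS
  httpPutResponseHopLimit: 1
  # To restore pod access to IMDS, explicitly set:
  # httpPutResponseHopLimit: 2
```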

## Labels/Annotations/Tags

### karpenter.sh/managed-by (EC2 Instance Tag)

**Category:** Planned Deprecations, Breaking

Karpenter introduced the `karpenter.sh/managed-by` tag in v0.28.0 when migrating Karpenter over to NodeClaims (called Machines at the time). This migration was marked as “completed” when the instance was tagged in EC2 with the `karpenter.sh/managed-by` tag, storing the cluster name as the value. Since we have completed the NodeClaim migration, we no longer need this tag and can drop it.

This tag was only useful for scoping pod identity policies with ABAC, since it stored the cluster name in the tag value, unlike `kubernetes.io/cluster/<cluster-name>`, which stores the cluster name in the tag key. Session tags don’t work with tag keys, so we need some tag that we can recommend users use to create pod identity policies with ABAC using OSS Karpenter.

Starting in v1, Karpenter would use `eks:eks-cluster-name: <cluster-name>` for tagging and scoping instances, volumes, primary ENIs, etc. and would use `eks:eks-cluster-arn: <cluster-arn>` for tagging and scoping the instance profiles that it creates.
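
As a sketch, an ABAC-scoped pod identity policy could then key off these tags. The action list and resource ARN below are illustrative only, and the `eks-cluster-name` session tag is assumed to be supplied by EKS Pod Identity:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:TerminateInstances"],
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/eks:eks-cluster-name": "${aws:PrincipalTag/eks-cluster-name}"
        }
      }
    }
  ]
}
```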