Commit a5d5473

docs: Add v1 RFC to the AWS CloudProvider repo (aws#6604)
1 parent 81e2628 commit a5d5473

2 files changed

+292
-0
lines changed

designs/v1-api.md

+211
@@ -0,0 +1,211 @@
1+
# Karpenter v1 API
2+
3+
_This RFC is an extension of the [v1 API RFC](https://github.com/kubernetes-sigs/karpenter/blob/main/designs/v1-api.md) that is merged in the [`kubernetes-sigs/karpenter` repo](https://github.com/kubernetes-sigs/karpenter)._
4+
5+
## Overview
6+
7+
Karpenter released the beta version of its APIs and features in October 2023. The intention behind this beta was to determine the final set of changes and features that we wanted to add before considering Karpenter feature-complete. The list below details the features on Karpenter’s roadmap before it becomes feature-complete and stable at v1.
8+
9+
### Categorization
10+
11+
This list represents the minimal set of changes that are needed to ensure proper operational excellence, feature completeness, and stability by v1. For a change to make it on this list, it must meet one of the following criteria:
12+
13+
1. Breaking: The feature requires changes or removals from the API that would be considered breaking after a bump to v1
14+
2. Stability: The feature ensures proper operational excellence for behavior that is leaky or has race conditions in the beta state
15+
3. Planned Deprecations: The feature cleans up deprecations that were previously planned by the project
16+
17+
## EC2NodeClass API
18+
19+
```
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  kubelet:
    podsPerCore: 2
    maxPods: 20
    systemReserved:
      cpu: 100m
      memory: 100Mi
      ephemeral-storage: 1Gi
    kubeReserved:
      cpu: 200m
      memory: 100Mi
      ephemeral-storage: 3Gi
    evictionHard:
      memory.available: 5%
      nodefs.available: 10%
      nodefs.inodesFree: 10%
    evictionSoft:
      memory.available: 500Mi
      nodefs.available: 15%
      nodefs.inodesFree: 15%
    evictionSoftGracePeriod:
      memory.available: 1m
      nodefs.available: 1m30s
      nodefs.inodesFree: 2m
    evictionMaxPodGracePeriod: 60
    imageGCHighThresholdPercent: 85
    imageGCLowThresholdPercent: 80
    cpuCFSQuota: true
    clusterDNS: ["10.0.1.100"]
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
    - id: subnet-09fa4a0a8f233a921
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
    - name: my-security-group
    - id: sg-063d7acfb4b06c82c
  amiFamily: AL2023
  amiSelectorTerms:
    - alias: al2023@v20240625
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
    - name: my-ami
    - id: ami-123
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  instanceProfile: "KarpenterNodeInstanceProfile-${CLUSTER_NAME}"
  userData: |
    echo "Hello world"
  tags:
    team: team-a
    app: team-a-app
  instanceStorePolicy: RAID0
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1 # This is changed to disable IMDS access from containers not on the host network
    httpTokens: required
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 10000
        encrypted: true
        kmsKeyID: "1234abcd-12ab-34cd-56ef-1234567890ab"
        deleteOnTermination: true
        throughput: 125
        snapshotID: snap-0123456789
  detailedMonitoring: true
status:
  subnets:
    - id: subnet-0a462d98193ff9fac
      zone: us-east-2b
    - id: subnet-0322dfafd76a609b6
      zone: us-east-2c
    - id: subnet-0727ef01daf4ac9fe
      zone: us-east-2b
    - id: subnet-00c99aeafe2a70304
      zone: us-east-2a
    - id: subnet-023b232fd5eb0028e
      zone: us-east-2c
    - id: subnet-03941e7ad6afeaa72
      zone: us-east-2a
  securityGroups:
    - id: sg-041513b454818610b
      name: ClusterSharedNodeSecurityGroup
    - id: sg-0286715698b894bca
      name: ControlPlaneSecurityGroup-1AQ073TSAAPW
  amis:
    - id: ami-01234567890123456
      name: custom-ami-amd64
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
    - id: ami-01234567890123456
      name: custom-ami-arm64
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
  instanceProfile: "${CLUSTER_NAME}-0123456778901234567789"
  conditions:
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "True"
      type: InstanceProfileReady
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "True"
      type: SubnetsReady
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "True"
      type: SecurityGroupsReady
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "True"
      type: AMIsReady
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "True"
      type: Ready
```
146+
147+
### Printer Columns
148+
149+
**Category:** Stability, Breaking
150+
151+
#### Current
152+
153+
```
➜ karpenter git:(main) ✗ k get ec2nodeclasses -o wide
NAME      AGE
default   2d8h
```
158+
159+
#### Proposed
160+
161+
```
➜ karpenter git:(main) ✗ k get ec2nodeclasses -o wide
NAME      READY   AGE    ROLE
default   True    2d8h   KarpenterNodeRole-test-cluster
```
166+
167+
**Standard Columns**
168+
169+
1. Name
170+
2. Ready - EC2NodeClasses now have status conditions that inform the user whether the EC2NodeClass has resolved all of its data and is “ready” to be used by a NodePool. This readiness should be easily viewable by users.
171+
3. Age
172+
173+
**Wide Columns (-o wide)**
174+
175+
1. Role - As a best practice, we are recommending that users use a Node role and let Karpenter create a managed instance profile on behalf of the customer. We should easily expose this role.
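
One way these columns could be declared is through the CRD's `additionalPrinterColumns`. The fragment below is a sketch rather than the shipped CRD: the JSONPath expressions are assumptions based on the `spec.role` field and the `Ready` status condition shown in the example EC2NodeClass, and `priority: 1` relies on the standard Kubernetes behavior that columns with a priority greater than 0 are shown only with `-o wide`.

```yaml
# Sketch of printer columns for the EC2NodeClass CRD version (JSONPaths are assumed).
additionalPrinterColumns:
  - name: Ready
    type: string
    jsonPath: .status.conditions[?(@.type=="Ready")].status
  - name: Age
    type: date
    jsonPath: .metadata.creationTimestamp
  - name: Role
    type: string
    jsonPath: .spec.role
    priority: 1  # priority > 0: shown only with -o wide
```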
176+
177+
#### Status Conditions
178+
179+
**Category:** Stability
180+
181+
Defining the complete set of status condition types that we will include on v1 launch is **out of scope** of this document and will be defined more granularly in Karpenter’s Observability design. Minimally for v1, we will add a `Ready` condition so that we can determine whether an EC2NodeClass can be used by a NodePool during scheduling. More robustly, we will define status conditions that ensure that each required “concept” that’s needed for an instance launch is resolved, e.g. InstanceProfile resolved, Subnet resolved, Security Groups resolved, etc.
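
As an illustration of how these conditions might read when one prerequisite is unresolved, the hypothetical status below reuses condition types from the example EC2NodeClass above. The `False` values, and the idea that `Ready` stays `False` until every dependency resolves, are assumptions made for this sketch, since the final condition set is out of scope here.

```yaml
status:
  conditions:
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "False"   # hypothetical: no subnets matched subnetSelectorTerms
      type: SubnetsReady
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "True"
      type: SecurityGroupsReady
    - lastTransitionTime: "2024-02-02T19:54:34Z"
      status: "False"   # assumed: Ready aggregates the per-dependency conditions
      type: Ready
```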
182+
183+
#### Require AMISelectorTerms
184+
185+
**Category:** Stability, Breaking
186+
187+
When a user specifies an AMIFamily with no AMISelectorTerms, Karpenter currently updates AMIs automatically whenever a new version of the EKS-optimized image in that family is released. Existing nodes on older versions of the AMI will drift to the newer version to meet the desired state of the EC2NodeClass.
188+
189+
This works well in pre-prod environments, where it’s nice to get auto-upgraded to the latest version for testing, but it is extremely risky in production environments. [Karpenter now recommends that users pin AMIs in their production environments](https://karpenter.sh/docs/tasks/managing-amis/#option-1-manage-how-amis-are-tested-and-rolled-out:~:text=The%20safest%20way%2C%20and%20the%20one%20we%20recommend%2C%20for%20ensuring%20that%20a%20new%20AMI%20doesn%E2%80%99t%20break%20your%20workloads%20is%20to%20test%20it%20before%20putting%20it%20into%20production); however, users can still be caught by surprise by this behavior today when they deploy an EC2NodeClass and NodePool with an AMIFamily. Most notably, this differs from eksctl and MNG, which use the latest AMI when you first deploy the node group but pin it at that version from then on.
190+
191+
We no longer want to deal with potential confusion around whether nodes will get rolled or not when using an AMIFamily with no `amiSelectorTerms`. Instead, `amiSelectorTerms` will now be required, and a new term type, `alias`, will be introduced that allows users to select an EKS-optimized AMI. Each alias consists of an AMI family and a version. Users can set the version to `latest` to continue to get automatic upgrades, or pin to a specific version.
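
The two choices described above might look like the following fragments. The pinned form is taken from the example EC2NodeClass; the `al2023@latest` spelling is an assumption inferred from the `family@version` shape of that example.

```yaml
# Option 1 (assumed spelling): keep tracking the newest EKS-optimized AL2023 AMI
amiSelectorTerms:
  - alias: al2023@latest
---
# Option 2: pin to a specific release, as in the example EC2NodeClass
amiSelectorTerms:
  - alias: al2023@v20240625
```

With the pinned form, the selected AMI is expected to change only when the alias value itself is edited.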
192+
193+
#### Disable IMDS Access from Containers by Default
194+
195+
**Category:** Stability, Breaking
196+
197+
The HTTPPutResponseHopLimit is [part of the instance metadata settings that are configured on the node on startup](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html). This setting dictates how many hops a PUT request can take before it will be rejected by IMDS. For Kubernetes pods that live in another network namespace, this means that any pod that isn’t using `hostNetwork: true` [would need to have a HopLimit of 2 set in order to access IMDS](https://aws.amazon.com/about-aws/whats-new/2020/08/amazon-eks-supports-ec2-instance-metadata-service-v2/#:~:text=However%2C%20this%20limit%20is%20incompatible%20with%20containerized%20applications%20on%20Kubernetes%20that%20run%20in%20a%20separate%20network%20namespace%20from%20the%20instance). Opening up the node for pods to reach out to IMDS is an inherent security risk. If you are able to grab a token for IMDS, you can craft a request that gives the pod the same level of access as the instance profile which orchestrates the kubelet calls on the cluster.
198+
199+
To avoid exposing users to this security risk, we should default `httpPutResponseHopLimit` to 1 so that pods outside the host network cannot reach IMDS by default. This new default wouldn’t affect users who have already deployed EC2NodeClasses on their cluster; it would only affect new EC2NodeClasses.
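
As a sketch of what this means for an EC2NodeClass, the first fragment below mirrors the `metadataOptions` from the example spec with a hop limit of 1. The second fragment assumes that users who still need IMDS from pods off the host network could opt back in by setting the hop limit to 2 explicitly, which follows from the hop-limit discussion above but is not spelled out in this RFC.

```yaml
# Proposed default for new EC2NodeClasses: pods off the host network cannot reach IMDS
metadataOptions:
  httpEndpoint: enabled
  httpProtocolIPv6: disabled
  httpPutResponseHopLimit: 1
  httpTokens: required
---
# Assumed explicit opt-in for workloads that still need IMDS from the pod network
metadataOptions:
  httpPutResponseHopLimit: 2
```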
200+
201+
## Labels/Annotations/Tags
202+
203+
#### karpenter.sh/managed-by (EC2 Instance Tag)
204+
205+
**Category:** Planned Deprecations, Breaking
206+
207+
Karpenter introduced the `karpenter.sh/managed-by` tag in v0.28.0 when migrating Karpenter over to NodeClaims (called Machines at the time). This migration was marked as “completed” when it tagged the instance in EC2 with the `karpenter.sh/managed-by` tag and stored the cluster name as the value. Since we have completed the NodeClaim migration, we no longer have a need for this tag; so, we can drop it.
208+
209+
This tag was only useful for scoping pod identity policies with ABAC, since it stored the cluster name in the value rather than `kubernetes.io/cluster/<cluster-name>`, which stores the cluster name in the tag key. Session tags don’t work with tag keys, so we need some tag that we can recommend users to use to create pod identity policies with ABAC using OSS Karpenter.
210+
211+
Starting in v1, Karpenter would use `eks:eks-cluster-name: <cluster-name>` for tagging and scoping instances, volumes, primary ENIs, etc. and would use `eks:eks-cluster-arn: <cluster-arn>` for tagging and scoping instance profiles that it creates.

designs/v1-roadmap.md

+81
@@ -0,0 +1,81 @@
1+
# Karpenter v1 Roadmap
2+
3+
_This RFC is an extension of the [v1 Roadmap RFC](https://github.com/kubernetes-sigs/karpenter/blob/main/designs/v1-roadmap.md) that is merged in the [`kubernetes-sigs/karpenter` repo](https://github.com/kubernetes-sigs/karpenter)._
4+
5+
## Overview
6+
7+
Karpenter released the beta version of its APIs and features in October 2023. The intention behind this beta was to determine the final set of changes and features that we wanted to add before considering Karpenter feature-complete. The list below details the features on Karpenter’s roadmap before it becomes feature-complete and stable at v1.
8+
9+
### Categorization
10+
11+
This list represents the minimal set of changes that are needed to ensure proper operational excellence, feature completeness, and stability by v1. For a change to make it on this list, it must meet one of the following criteria:
12+
13+
1. Breaking: The feature requires changes or removals from the API that would be considered breaking after a bump to v1
14+
2. Stability: The feature ensures proper operational excellence for behavior that is leaky or has race conditions in the beta state
15+
3. Planned Deprecations: The feature cleans up deprecations that were previously planned by the project
16+
17+
## Roadmap
18+
19+
1. [v1 APIs](./v1-api)
20+
2. [Removing Ubuntu AMIFamily](#removing-ubuntu-amifamily)
21+
3. [Change default TopologySpreadConstraint policy for Deployment from `ScheduleAnyways` to `DoNotSchedule`](#change-default-topologyspreadconstraint-policy-for-deployment-from-scheduleanyways-to-donotschedule)
22+
4. [Removing Implicit ENI Public IP Configuration](#removing-implicit-eni-public-ip-configuration)
23+
24+
### v1 APIs
25+
26+
**Issue Ref(s):** https://github.com/kubernetes-sigs/karpenter/issues/758, https://github.com/aws/karpenter-provider-aws/issues/5006
27+
28+
**Category:** Breaking, Stability
29+
30+
For Karpenter to be considered v1, the CustomResources that are shipped with an installation of the project also need to be stable at v1. Changes to Karpenter’s API (including labels, annotations, and tags) in v1 are detailed in [Karpenter v1 API](./v1-api.md). The migration path for these changes will ensure that customers will not have to roll their nodes or manually convert their resources as they did at v1beta1. Instead, we will leverage Kubernetes [conversion webhooks](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/#webhook-conversion) to automatically convert their resources to the new schema format in code. The API groups and Kind naming will remain unchanged.
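
For readers unfamiliar with conversion webhooks, the fragment below sketches the `conversion` stanza of a CustomResourceDefinition. The field names come from the Kubernetes CRD API; the service name, namespace, port, and path are hypothetical placeholders and not Karpenter's actual values, and other required CRD fields are omitted.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: ec2nodeclasses.karpenter.k8s.aws
spec:
  # group, names, and versions omitted for brevity
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1beta1", "v1"]
      clientConfig:
        service:
          name: karpenter          # hypothetical
          namespace: kube-system   # hypothetical
          port: 8443               # hypothetical
          path: /conversion        # hypothetical
```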
31+
32+
### Removing Ubuntu AMIFamily
33+
34+
**Issue Ref(s):** https://github.com/aws/karpenter-provider-aws/issues/5572
35+
36+
**Category:** Breaking
37+
38+
Karpenter has supported the Ubuntu AMIFamily [since the v0.6.2 version of Karpenter](https://github.com/aws/karpenter-provider-aws/pull/1323). EKS does not have formal support for the Ubuntu AMIFamily for MNG or SMNG nodes (it's currently a third-party vendor AMI). As a result, there is no direct line-of-sight between changes in things like supported Kubernetes versions or kernel updates on the image.
39+
40+
Users who want to keep using Ubuntu can use a Custom AMIFamily with amiSelectorTerms pinned to the latest Ubuntu AMI ID. They can reference `bootstrapMode: AL2` to get the same userData configuration they received before.
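
A rough sketch of such an EC2NodeClass is shown below. The `bootstrapMode` field name comes from the paragraph above, but its placement directly under `spec` is an assumption, and the resource name and AMI ID are placeholders.

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: ubuntu                      # placeholder name
spec:
  amiFamily: Custom
  bootstrapMode: AL2                # placement under spec is assumed
  amiSelectorTerms:
    - id: ami-0123456789abcdef0     # placeholder for a pinned Ubuntu AMI ID
```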
41+
42+
#### Tasks
43+
44+
- [ ] Drop the Ubuntu AMIFamily from the set of enum values in the v1 CRD
45+
- [ ] Remove the Ubuntu bootstrapping logic from the Karpenter AMIFamily providers
46+
- [ ] Remove the Ubuntu-specific AMIFamily documentation in the karpenter.sh documentation
47+
48+
### Change default TopologySpreadConstraint policy for Deployment from `ScheduleAnyways` to `DoNotSchedule`
49+
50+
**Category:** Stability, Breaking
51+
52+
Karpenter ships by default with multiple replicas and leader election enabled to ensure that it can run in HA (High Availability) mode. This ensures that if a pod goes down due to an outage, the other pod is able to recover quickly by shifting the leader election over.
53+
54+
Karpenter currently uses a zonal topologySpreadConstraint with `whenUnsatisfiable: ScheduleAnyway` to spread its deployment across zones. Because this is only a preference, it doesn't guarantee that pods will end up in different zones, meaning that, if there is a zonal outage, multiple replicas won't necessarily increase resiliency.
55+
56+
```yaml
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/instance: karpenter
        app.kubernetes.io/name: karpenter
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
```
66+
67+
As part of v1, we are changing our default from `ScheduleAnyway` to `DoNotSchedule` to enforce stronger best practices by default and ensure that Karpenter can recover quickly in the event of a zonal outage. Users who still want the old behavior can opt back into `ScheduleAnyway` by overriding the default TopologySpreadConstraint.
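
Under this proposal, the default constraint would presumably keep the same selector and topology key and change only the `whenUnsatisfiable` value, as in this sketch:

```yaml
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/instance: karpenter
        app.kubernetes.io/name: karpenter
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    # Hard requirement: a replica stays Pending rather than violating maxSkew across zones
    whenUnsatisfiable: DoNotSchedule
```

One consequence of the hard requirement is that a replica can remain unscheduled if no node is available in a zone that would satisfy the skew.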
68+
69+
#### Tasks
70+
71+
- [ ] Update Karpenter's zonal topologySpreadConstraint from `whenUnsatisfiable: ScheduleAnyway` to `whenUnsatisfiable: DoNotSchedule`
72+
73+
### Removing Implicit ENI Public IP Configuration
74+
75+
**Category:** Planned Deprecations, Breaking
76+
77+
Karpenter currently checks the subnets that an instance launch request targets and explicitly sets `AssociatePublicIPAddress: false` when launching only into private subnets. This feature was added because users specifically requested it in https://github.com/aws/karpenter-provider-aws/issues/3815, where users were writing deny policies on their EC2 instance launches through IRSA policies or SCP for instances that attempted to create network interfaces that associated an IP address. Now that https://github.com/aws/karpenter-provider-aws/pull/5437 is merged, the `associatePublicIPAddress` value can be set explicitly on the EC2NodeClass. Users can directly set this value to `false`, and we will no longer need to introspect the subnets when making instance launch requests.
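
A minimal sketch of the explicit setting follows. The resource name is a placeholder, and placing `associatePublicIPAddress` directly under `spec` is an assumption based on the description above.

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: private-only                # placeholder name
spec:
  associatePublicIPAddress: false   # set explicitly instead of relying on subnet introspection
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
```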
78+
79+
#### Tasks
80+
81+
- [ ] Remove the [`CheckAnyPublicIPAssociations`](https://github.com/aws/karpenter-provider-aws/blob/ea8ea0ecb042f4143e2948d4e299e169671841fe/pkg/providers/subnet/subnet.go#L97) call in our launch template creation at v1
