Kubernetes Implementation
Maintainer: @lwander
Note - this is meant to serve as a technical guide to the Kubernetes implementation. A more general walkthrough can (soon) be found on spinnaker.io.
The provider specification is as follows:
kubernetes:
  enabled:                 # boolean indicating whether or not to use kubernetes as a provider
  accounts:                # list of kubernetes accounts
    - name:                # required unique name for this account
      kubeconfigFile:      # optional location of the kube config file
      namespaces:          # optional list of namespaces to manage
      user:                # optional user to authenticate as that must exist in the provided kube config file
      cluster:             # optional cluster to connect to that must exist in the provided kube config file
      dockerRegistries:    # required (at least 1) docker registry accounts used as a source of images
        - accountName:     # required name of the docker registry account
          namespaces:      # optional list of namespaces this docker registry can deploy to
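For concreteness, a minimal filled-in account block might look like the following sketch (the account name, kubeconfig path, registry account name, and namespaces are hypothetical):

kubernetes:
  enabled: true
  accounts:
    - name: my-k8s-account                 # hypothetical account name
      kubeconfigFile: /home/spinnaker/.kube/config
      namespaces:
        - default
        - staging
      dockerRegistries:
        - accountName: my-docker-registry  # must match a configured Docker Registry account
          namespaces:
            - default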
Authentication is handled by the Clouddriver microservice; it was introduced in clouddriver/pull#214 and refined in clouddriver/pull#335.
The Kubernetes provider authenticates with any valid Kubernetes cluster using details found in a provided kubeconfig file. By default, the kubeconfig file at ~/.kube/config is used, unless the field kubeconfigFile is specified. The user, cluster, and singleton namespace are derived from the current-context field in the kubeconfig file, unless their respective fields are provided. If no namespace is found in either namespaces or in the current-context field of the kubeconfig file, then the value ["default"] is used. Any namespaces that do not exist will be created.
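As an illustration, a minimal kubeconfig file from which these values would be derived might look like this (cluster, user, and namespace names are hypothetical):

apiVersion: v1
kind: Config
current-context: my-context
contexts:
  - name: my-context
    context:
      cluster: my-cluster      # used unless the account's cluster field is set
      user: my-user            # used unless the account's user field is set
      namespace: my-namespace  # used unless the account's namespaces field is set
clusters:
  - name: my-cluster
    cluster:
      server: https://kubernetes.example.com
users:
  - name: my-user
    user:
      token: <redacted>        # placeholder credential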
The Docker Registry accounts referred to by the above configuration are also configured inside Clouddriver. The details of that implementation can be found here. The Docker authentication details (username, password, email, endpoint address) are read from each listed Docker Registry account and configured as an image pull secret, implemented in clouddriver/pull#285. The namespaces field of the dockerRegistries subblock defaults to the full list of namespaces, and is used by the Kubernetes provider to determine which namespaces to register the image pull secrets with. Every created pod is given the full list of image pull secrets available to its containing namespace.
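Conceptually, the result is equivalent to creating a registry secret in each registered namespace and attaching it to every pod created there; a rough sketch (names and image are hypothetical):

apiVersion: v1
kind: Secret
metadata:
  name: my-docker-registry                  # hypothetical; named after the Docker Registry account
  namespace: default
type: kubernetes.io/dockercfg
data:
  .dockercfg: <base64-encoded credentials>  # username, password, email, endpoint address
---
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  namespace: default
spec:
  imagePullSecrets:
    - name: my-docker-registry              # every pull secret available to the namespace is attached
  containers:
    - name: app
      image: registry.example.com/app:1.0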
The Kubernetes provider will periodically (every 30 seconds) attempt to fetch every provided namespace to see if the cluster is still reachable.
Spinnaker Server Groups are Kubernetes Replication Controllers. This is a straightforward mapping since both represent sets of managed, identical, immutable computing resources. However, there are a few caveats:
- Replication Controllers manage Pods, which, unlike VMs, can house multiple container images with the promise that all images in a Pod will be collocated. Note that the intent here is not to place all of your application's containers into a single pod, but to instead collocate containers that form a logical unit and benefit from sharing resources. Design patterns and a more thorough explanation can be found here.
- Each Pod is in charge of managing its own health checks, as opposed to the typical Spinnaker pattern of having health checks performed by Load Balancers. The ability to add these to Replication Controllers was added in clouddriver/pull#359 (see the sketch below).
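As an illustration of both points, a Replication Controller backing a server group might look roughly like the following sketch (names, image, and probe settings are hypothetical):

apiVersion: v1
kind: ReplicationController
metadata:
  name: app-main-v000               # hypothetical Spinnaker-style server group name
  namespace: default
spec:
  replicas: 2
  selector:
    app: app-main-v000
  template:
    metadata:
      labels:
        app: app-main-v000
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0
          livenessProbe:            # the pod, not a load balancer, owns its health check
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
        # further containers that form a logical unit with "app" could be listed here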
Below are the server group operations and their implementations.
- Clouddriver component: clouddriver/pull#227.
- Deck components:
  - Ad-hoc creation: deck/pull#1881.
  - Pipeline deploy stage: deck/pull#2015.
  - Pipeline find image stage: deck/pull#2025.
This operation creates a Replication Controller with the specified containers and their respective configurations.
- Clouddriver component: clouddriver/pull#245.
- Deck component: deck/pull#1950.
This operation takes a source Replication Controller as an argument and creates a copy of it, overriding any attributes with the values provided in the request.
- Clouddriver component: clouddriver/pull#361.
- Deck components:
  - Ad-hoc & pipeline stage: deck/pull#2058.
This stage takes a source Replication Controller and a target size (which can be 0), and attempts to resize the given Replication Controller to that size.
- Clouddriver component: clouddriver/pull#383.
- Deck components:
  - Ad-hoc & pipeline stage: deck/pull#2079.
These stages take a source Replication Controller and either enable or disable traffic to it through its associated Services. The way the association with Services is maintained is explained in more detail in the Load Balancers section below.
Since rollback (disabling a newer server group in favor of an older one) is built on top of the enable/disable primitives, rolling back was enabled simply by providing Orca (the orchestration engine) with the names of the two server groups to operate on.
- Clouddriver component: clouddriver/pull#423.
- Deck components:
  - Ad-hoc deletion: deck/pull#2101.
  - Pipeline stage: deck/pull#2120.
Coming Q2 2016 - Will be implemented as Horizontal Pod Autoscalers.
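For reference, the underlying Kubernetes resource looks roughly like the following sketch (target, bounds, and threshold are hypothetical):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: app-main-v000
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: app-main-v000             # the server group to autoscale
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70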
Orca (the orchestration engine) decides whether or not a step has succeeded by checking the health states of the instances it is operating on. For example, a "disable" stage will not complete until all instances being disabled report their health as "OutOfService". Each instance reports its health as a set of objects describing both the state (i.e. Healthy, Down, OutOfService, etc.) and the source of that health state. The Spinnaker Kubernetes provider provides three sources of health for a single Pod: KubernetesService, KubernetesPod, and KubernetesContainer. The way each of these sources is determined to be healthy is described in clouddriver/pull#446. Not all sources are used during each stage to determine success; instead, only the ones specified by Deck in interestingHealthProviderNames are considered. For example, during a "disable" stage the only relevant health metric is whether or not a Pod is registered with a given Service, so KubernetesService is provided as the interesting health provider. This is implemented in deck/pull#2016.
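As a sketch (the exact stage fields shown here are illustrative, not authoritative), a disable stage sent through the pipeline might carry something like:

type: disableServerGroup
cloudProvider: kubernetes
credentials: my-k8s-account          # hypothetical account name
serverGroupName: app-main-v000       # hypothetical server group
interestingHealthProviderNames:
  - KubernetesService                # only Service registration decides success here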
In Spinnaker, Load Balancers are durable units of infrastructure used as the entry point to a set of instances. The Service resource serves a similar function in a Kubernetes cluster, in addition to providing extra features such as service discovery. For this reason, Kubernetes Services are Spinnaker Load Balancers.
Services forward traffic to any pods that have labels matching their label selector. More information on labels can be found here. Since Spinnaker allows an M:N relationship between instances and load balancers, we roughly assign labels and selectors like so:
service:
  name: service-a
  selectors:
    - load-balancer-service-a: true # bound to pod-x, pod-y

service:
  name: service-b
  selectors:
    - load-balancer-service-b: true # bound to pod-x

pod:
  name: pod-x
  labels:
    - load-balancer-service-a: true # bound to service-a
    - load-balancer-service-b: true # bound to service-b

pod:
  name: pod-y
  labels:
    - load-balancer-service-a: true # bound to service-a

pod:
  name: pod-z
  labels:
    - load-balancer-service-b: false # bound to no services
In the above example, it is clear how an M:N relationship between Services and Pods exists. Furthermore, pod-z may not be serving traffic, but it can be re-enabled by changing the value of its label to true.
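In actual Kubernetes terms, service-a above would be a Service whose selector carries the load-balancer label (port numbers are hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: service-a
spec:
  selector:
    load-balancer-service-a: "true"   # matches pod-x and pod-y above
  ports:
    - port: 80
      targetPort: 8080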
Below are the load balancer operations and their implementations.
- Clouddriver component: clouddriver/pull#307.
- Deck component: deck/pull#1986.
Upsert either creates a new load balancer or updates an existing one.
- Clouddriver component: clouddriver/pull#424.
- Deck component: deck/pull#2101.
Security groups are represented as Ingress resources. It is important to note that if you do not have an underlying ingress controller for your Kubernetes installation, you will have to write one.
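A minimal Ingress sketch, using the extensions/v1beta1 API version current at the time of writing (host and backend are hypothetical; newer clusters use networking.k8s.io/v1):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: app-ingress
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: service-a   # a Spinnaker load balancer, i.e. a Service
              servicePort: 80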
- Clouddriver component: clouddriver/pull#449.
- Deck component: deck/pull#2114.
- Clouddriver component: clouddriver/pull#466.
- Deck component: deck/pull#2138.
A generic caching overview can be found here.
The initial caching work for Kubernetes was implemented here:
- Instances, server groups, applications, clusters: clouddriver/pull#276.
- Load balancers: clouddriver/pull#312.
Deck is in charge of presenting the data retrieved from the cache, and the relevant work can be found here:
- Instance details: deck/pull#1956.
- Server group details: deck/pull#1942.
- Load balancer details: deck/pull#1986.