Cilium GatewayAPI Argo Rollout implementation #18

Merged
3 commits merged on Aug 10, 2023


xtineskim (Contributor)

Adding instructions for using the Cilium Gateway API implementation with Argo Rollouts.

xtineskim changed the title from "Cilium gatewayapi" to "Cilium GatewayAPI Argo Rollout implementation" on Aug 8, 2023
jay-jain commented Aug 8, 2023

Would it be possible for Isovalent or someone upstream to put together a full end-to-end (local) example incorporating the following tools/concepts:

I tried and couldn't get it to work. Thanks!

kostis-codefresh (Collaborator) commented Aug 9, 2023

@xtineskim Many thanks for this contribution. Could you please fix the DCO check as well? See some instructions here: https://github.com/src-d/guide/blob/master/developer-community/fix-DCO.md

@jay-jain, did you try the example from this PR? Was there a specific error message? It would be great if you could provide more info to @xtineskim for debugging purposes...

jay-jain commented Aug 9, 2023

@kostis-codefresh
Sorry, for some reason I did not see @xtineskim's newest commits with the instructions, so I hadn't given them a go when I first wrote my comment. My comment is really a request for documentation on how to do the setup for this point:

Similar to Ingress, Gateway API controller creates a service of LoadBalancer type, so your environment will need to support this (for example MetalLB if running locally).

I have tried using MetalLB (described here) and the new Cilium L2 Announcement functionality released in Cilium 1.14, without success (I'll try to post my notes later today).

Regardless, I went ahead and tried what is proposed in the PR (using kind) and got stuck pretty early on.

@xtineskim's PR method

Step 1: Setup Kind Cluster

Kind Configuration File

---
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: sm
nodes:
    - role: control-plane
    - role: worker
    - role: worker
    - role: worker

Create the cluster:

kind create cluster --config kind.yaml

Step 2: Install Gateway API

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.7.0/config/crd/standard/gateway.networking.k8s.io_gatewayclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.7.0/config/crd/standard/gateway.networking.k8s.io_gateways.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.7.0/config/crd/standard/gateway.networking.k8s.io_httproutes.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.7.0/config/crd/standard/gateway.networking.k8s.io_referencegrants.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.7.0/config/crd/experimental/gateway.networking.k8s.io_tlsroutes.yaml

Step 3: Install Cilium

This is where it fails. Installing Cilium (via the Helm chart) with kubeProxyReplacement=true produces the following error:

helm upgrade \
        --install cilium cilium/cilium \
        --version 1.14.0 \
        --namespace kube-system \
        --reuse-values \
        --set kubeProxyReplacement="true" \
        --set gatewayAPI.enabled=true
Release "cilium" does not exist. Installing it now.
Error: template: cilium/templates/cilium-agent/daemonset.yaml:637:43: executing "cilium/templates/cilium-agent/daemonset.yaml" at <ne $kubeProxyReplacement "strict">: error calling ne: incompatible types for comparison

I think something is wrong with the Helm chart here: https://github.com/cilium/cilium/blob/v1.14.0/install/kubernetes/cilium/templates/cilium-agent/daemonset.yaml#L637

I could be wrong, but I think the $kubeProxyReplacement template variable is being typed as a boolean, so the ne operator cannot compare the string "strict" with the boolean true. What's odd is that this happens even when I explicitly pass "true" as a string in the helm command.

Either way, I'd rather not derail this PR, since this seems to be an external Cilium Helm chart issue, so feel free to proceed.

Thank you both for your contributions to the community; this is a very helpful project and a great starting point!

Signed-off-by: Christine Kim <[email protected]>
jay-jain commented Aug 9, 2023

Okay, I was able to resolve the Helm chart error by using --set-string instead of --set. 🤦‍♂️
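Why `--set kubeProxyReplacement="true"` still failed: the double quotes never reach Helm. The shell strips them before the program runs, so `--set` sees the bare word `true` and infers a boolean, while `--set-string` forces the value to be a string. A small sketch to see exactly which arguments a program receives (`show_args` is a hypothetical helper, not part of Helm):

```shell
# show_args: hypothetical helper that prints each argument it receives,
# one per line, wrapped in brackets so the exact value is visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# The double quotes are consumed by the shell, so both calls print
# [--set] followed by [kubeProxyReplacement=true].
show_args --set kubeProxyReplacement="true"
show_args --set kubeProxyReplacement=true
```

This is why the working install relies on `--set-string kubeProxyReplacement=true`, which tells Helm to keep the value as a string regardless of how it looks.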

I was able to get through the instructions documented in this PR, but I have had no luck accessing the Gateway.
Here are the steps to reproduce:

Step 1: Install kind

Create the kind configuration (without the default CNI):

---
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: sm
nodes:
    - role: control-plane
    - role: worker
    - role: worker
    - role: worker
networking:
    disableDefaultCNI: true

Create the cluster:

kind create cluster --config kind.yaml

Step 2: Install Calico CNI to get the nodes into the Ready state

If we don't have a CNI, the nodes stay NotReady with the error below and we can't install Cilium, so we use Calico as a temporary one:

Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

curl -sL https://docs.projectcalico.org/manifests/calico.yaml -O
kubectl apply -f calico.yaml

Step 3: Install Gateway API

GW_API_VERSION="v0.7.1"
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/${GW_API_VERSION}/standard-install.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/${GW_API_VERSION}/config/crd/experimental/gateway.networking.k8s.io_tlsroutes.yaml

Step 4: Install Cilium Helm Chart with these flags

if ! helm repo list | grep -q cilium; then
	helm repo add cilium https://helm.cilium.io
fi

helm upgrade \
	--install cilium cilium/cilium \
	--version 1.14.0 \
	--namespace kube-system \
	--reuse-values \
	--set-string kubeProxyReplacement=true \
	--set gatewayAPI.enabled=true \
	--set ipam.mode=kubernetes \
	--set externalIPs.enabled=true

Wait until everything is up and ready. You can check the status and the relevant config values with:

cilium status
cilium config view | grep "enable-gateway-api"
cilium config view | grep "enable-l7-proxy"
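The two `grep` checks can be folded into a single pass/fail test, which is handy in a setup script. A sketch that assumes `cilium config view` prints one `key value` pair per line; `check_cilium_flags` is a hypothetical name:

```shell
# check_cilium_flags: hypothetical helper. Reads `cilium config view`
# output on stdin and succeeds only if both Gateway API support and the
# L7 proxy are reported as "true".
check_cilium_flags() {
  awk '
    $1 == "enable-gateway-api" && $2 == "true" { gw = 1 }
    $1 == "enable-l7-proxy"    && $2 == "true" { l7 = 1 }
    END { exit !(gw && l7) }
  '
}
```

Usage: `cilium config view | check_cilium_flags && echo "Gateway API and L7 proxy enabled"`.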

Step 5: Uninstall Calico

kubectl delete -f calico.yaml

Step 6: Install and Configure MetalLB

Instructions taken from here

# Create the address pool
KIND_NET_CIDR=$(docker network inspect kind -f '{{(index .IPAM.Config 0).Subnet}}')
METALLB_IP_START=$(echo ${KIND_NET_CIDR} | sed "s@0.0/16@255.200@")
METALLB_IP_END=$(echo ${KIND_NET_CIDR} | sed "s@0.0/16@255.250@")
METALLB_IP_RANGE="${METALLB_IP_START}-${METALLB_IP_END}"

cat << EOF > metallb_values.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
    name: default
    namespace: metallb-system
spec:
    addresses:
        - ${METALLB_IP_RANGE}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
    name: default
    namespace: metallb-system
spec:
    ipAddressPools:
        - default
    nodeSelectors:
        - matchLabels:
              kubernetes.io/os: linux
EOF
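The address pool is derived from the kind Docker network by replacing the trailing `0.0/16` of its subnet with a high host range, so MetalLB hands out addresses that Docker is unlikely to have given to the node containers. A sketch of that computation as a reusable function, assuming a /16 kind network and the commonly used `.255.200` to `.255.250` range (consistent with the `172.18.255.202` address the Gateway receives later); `metallb_range` is a hypothetical name:

```shell
# metallb_range: hypothetical helper. Given a kind network CIDR such as
# 172.18.0.0/16, print a MetalLB range like 172.18.255.200-172.18.255.250.
metallb_range() {
  local cidr=$1 start end
  # The sed substitution below only makes sense for an x.y.0.0/16 network.
  case "$cidr" in
    *.0.0/16) ;;
    *) echo "unexpected kind network CIDR: $cidr" >&2; return 1 ;;
  esac
  start=$(echo "$cidr" | sed "s@0.0/16@255.200@")
  end=$(echo "$cidr" | sed "s@0.0/16@255.250@")
  echo "${start}-${end}"
}
```

Usage: `METALLB_IP_RANGE=$(metallb_range "$KIND_NET_CIDR")`. The CIDR guard fails loudly instead of silently producing a bogus range when the kind network is not a /16.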

# Install metallb 
helm install \
	--namespace metallb-system \
	--create-namespace \
	--repo https://metallb.github.io/metallb metallb \
	metallb \
	--version 0.13.10

Wait until all pods are up in the metallb-system namespace and then add the IPAddressPool and L2Advertisement that we created in the previous step:

kubectl -n metallb-system create -f metallb_values.yaml
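The "wait until all pods are up" step can be scripted so the pool isn't created too early; while the MetalLB controller is still starting, its webhook may reject the IPAddressPool. A minimal retry sketch (`wait_for` is a hypothetical helper, not part of kubectl):

```shell
# wait_for: hypothetical helper. Runs the given command up to ATTEMPTS
# times, sleeping DELAY seconds between failed attempts, and succeeds as
# soon as the command does.
wait_for() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Don't sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for 12 5 kubectl -n metallb-system apply -f metallb_values.yaml` keeps retrying for about a minute; `apply` is used instead of `create` so that a retry after a partial success doesn't fail with AlreadyExists. For plain pod readiness, kubectl's built-in `kubectl wait --namespace metallb-system --for=condition=Ready pod --all --timeout=120s` is usually enough.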

Step 7: Validate the Gateway Class exists

$ kubectl get gatewayclasses.gateway.networking.k8s.io
NAME     CONTROLLER                     ACCEPTED   AGE
cilium   io.cilium/gateway-controller   True       18m

Step 8: Deploy Argo Rollouts

# Install Argo Rollouts (server)
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

Step 9: Deploy the Gateway API Plugin

cat << EOF > rollouts-plugin-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-rollouts-config # must be this exact name
  namespace: argo-rollouts # must be in this namespace
data:
  trafficRouterPlugins: |-
    - name: "argoproj-labs/gatewayAPI"
      location: "https://github.com/argoproj-labs/rollouts-plugin-trafficrouter-gatewayapi/releases/download/v0.0.0-rc1/gateway-api-plugin-linux-amd64"
EOF
kubectl apply -f rollouts-plugin-cm.yaml
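The plugin location above hard-codes the linux-amd64 build, and the binary has to match the CPU architecture of the node where the argo-rollouts controller runs (an arm64 VM, for example, won't run an amd64 binary). A sketch that builds the URL from a version and an architecture, following the naming pattern of the URL above. Whether a release asset actually exists for a given version/architecture pair is an assumption to check on the release page, and `plugin_url` is a hypothetical helper:

```shell
# plugin_url: hypothetical helper. Builds a plugin download URL following
# the naming pattern of the ConfigMap above. It does NOT check that the
# release asset exists.
plugin_url() {
  local version=$1 arch=$2
  case "$arch" in
    amd64|arm64) ;;
    *) echo "unexpected architecture: $arch" >&2; return 1 ;;
  esac
  echo "https://github.com/argoproj-labs/rollouts-plugin-trafficrouter-gatewayapi/releases/download/${version}/gateway-api-plugin-linux-${arch}"
}
```

Usage: `ARCH=$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.architecture}')` reports Go-style names such as `amd64` or `arm64`, which can then be passed as `plugin_url v0.0.0-rc1 "$ARCH"`.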

Step 10: Deploy the RBAC components

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gateway-controller-role
rules:
  - apiGroups:
      - "*"
    resources:
      - "*"
    verbs:
      - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gateway-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gateway-controller-role
subjects:
  - namespace: argo-rollouts
    kind: ServiceAccount
    name: argo-rollouts

Step 11: Restart Argo Rollouts Deployment

To force the plugin defined in the ConfigMap to be loaded:

kubectl -n argo-rollouts rollout restart deployment/argo-rollouts

Step 12: Deploy the Gateway, HTTPRoute, Services, and Rollout

Deploy the Gateway

kind: Gateway
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
  name: cilium
spec:
  gatewayClassName: cilium
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All

Validate the gateway

$ kubectl get gateway
NAME     CLASS    ADDRESS          PROGRAMMED   AGE
cilium   cilium   172.18.255.202   True         18m

$ kubectl get gateway cilium -o=jsonpath="{.status.addresses[0].value}"
172.18.255.202

Deploy HTTPRoute, Services, and Rollout

---
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1beta1
metadata:
    name: argo-rollouts-http-route
spec:
    parentRefs:
        - kind: Gateway
          name: cilium
    hostnames:
        - "demo.example.com"
    rules:
        - matches:
              - path:
                    type: PathPrefix
                    value: /
          backendRefs:
              - name: argo-rollouts-stable-service
                kind: Service
                port: 80
              - name: argo-rollouts-canary-service
                kind: Service
                port: 80
---
apiVersion: v1
kind: Service
metadata:
    name: argo-rollouts-canary-service
spec:
    ports:
        - port: 80
          targetPort: http
          protocol: TCP
          name: http
    selector:
        app: rollouts-demo
---
apiVersion: v1
kind: Service
metadata:
    name: argo-rollouts-stable-service
spec:
    ports:
        - port: 80
          targetPort: http
          protocol: TCP
          name: http
    selector:
        app: rollouts-demo
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
  namespace: default
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: argo-rollouts-canary-service # our created canary service
      stableService: argo-rollouts-stable-service # our created stable service
      trafficRouting:
        plugins:
          argoproj-labs/gatewayAPI:
            httpRoute: argo-rollouts-http-route # our created httproute
            namespace: default
      steps:
      - setWeight: 30
      - pause: {}
      - setWeight: 60
      - pause: {}
      - setWeight: 100
      - pause: {}
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
        - name: rollouts-demo
          image: kostiscodefresh/summer-of-k8s-app:v1
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          resources:
            requests:
              memory: 32Mi
              cpu: 5m
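With the traffic-router plugin configured, each setWeight step is expressed as weights on the two backendRefs of argo-rollouts-http-route. The arithmetic is simple; a sketch assuming the split is canary = W and stable = 100 - W (worth verifying with `kubectl get httproute argo-rollouts-http-route -o yaml` during a rollout). `route_weights` is a hypothetical helper:

```shell
# route_weights: hypothetical helper illustrating the traffic split for a
# given setWeight value: the canary backend gets W percent and the stable
# backend gets the remaining 100 - W.
route_weights() {
  local canary=$1
  if [ "$canary" -lt 0 ] || [ "$canary" -gt 100 ]; then
    echo "weight out of range: $canary" >&2
    return 1
  fi
  echo "stable=$((100 - canary)) canary=${canary}"
}

# Walk the setWeight steps used in the Rollout above.
for w in 30 60 100; do
  route_weights "$w"
done
```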

Step 13: Test

Get Gateway External IP

$ GATEWAY="$(kubectl get gateways.gateway.networking.k8s.io cilium -o=jsonpath="{.status.addresses[0].value}")"

Access the Gateway

$ curl -vvv -H "host: demo.example.com" ${GATEWAY}/call-me
*   Trying 172.18.255.202:80...
* connect to 172.18.255.202 port 80 failed: Connection timed out
* Failed to connect to 172.18.255.202 port 80: Connection timed out
* Closing connection 0
curl: (28) Failed to connect to 172.18.255.202 port 80: Connection timed out
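A timeout (rather than "connection refused") usually means the packets never got an answer at all, so one thing to rule out first is whether the Gateway address is even on the kind network. A minimal sketch, assuming IPv4 and the `KIND_NET_CIDR` value from the MetalLB step; `ip_to_int` and `ip_in_cidr` are hypothetical helpers:

```shell
# ip_to_int: hypothetical helper converting a dotted IPv4 address to an integer.
ip_to_int() {
  local old_ifs=$IFS
  IFS=.
  # Unquoted expansion splits the address on dots into $1..$4.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_cidr: hypothetical helper; succeeds if IPv4 address $1 lies
# inside the CIDR block $2.
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/} mask
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  fi
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

Usage: `ip_in_cidr "$GATEWAY" "$KIND_NET_CIDR" && echo "Gateway address is on the kind network"`. Being inside the subnet is necessary but not sufficient: with Docker Desktop or a VM such as Lima, the kind network lives inside the VM, and the host cannot reach it without extra forwarding.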

xtineskim (Contributor, Author)

@jay-jain you may need to port forward in order to access the IP address of the gateway!

jay-jain commented Aug 9, 2023

@xtineskim Hmm, I wasn't able to get it to work with a port-forward. Since it's an external IP, I don't think a port-forward should be necessary. I found a Cilium issue here that sounds very similar to what I'm experiencing with bare-metal setups. Once that upstream issue gets resolved, I should be able to get this up and running.
Thanks for all your help and appreciate you getting the ball rolling on this example for those of us that are trying to do Cilium + Argo Rollouts! 💯

xtineskim (Contributor, Author)

@jay-jain it depends on what your environment looks like (personally, I use a Lima VM to run my kind cluster and have to forward my port, not with kubectl port-forward), but that issue could be it... 🤔
Thanks for the help and for writing out the steps so diligently in your comment! Is the next step for getting this merged just approval, or did you want to wait for that upstream issue to be resolved?

jay-jain commented Aug 9, 2023

@xtineskim Ahh, gotcha, that makes sense. I have no objection to this getting merged now, before that upstream issue is resolved, since that issue is pretty specific to bare-metal environments, but I'll leave it up to you and @kostis-codefresh. What you have here seems like a really good starting point. If/when the upstream issue gets resolved, I can open a PR later down the road with more specific instructions (if needed).

kostis-codefresh merged commit 7da06c2 into argoproj-labs:main on Aug 10, 2023
4 checks passed
xtineskim deleted the cilium-gatewayapi branch on August 10, 2023 13:16