hc-github-team-consul-core

Backport

This PR is auto-generated from #4757 to be assessed for backporting due to the inclusion of the label backport/1.8.x.

🚨 Warning: automatic cherry-pick of commits failed. If the first commit failed,
you will see a blank no-op commit below. If at least one commit succeeded, you
will see the cherry-picked commits up to, not including, the commit where
the merge conflict occurred.

The person who merged in the original PR is:
@anandmukul93
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.

merge conflict error: POST https://api.github.com/repos/hashicorp/consul-k8s/merges: 409 Merge conflict []

The below text is copied from the body of the original PR.


Changes proposed in this PR

  • Race conditions handled for upgrade between new version of CNI pods
  • Race conditions handled for upgrade between old version of CNI pod and new version of CNI pod
  • Doesn't support backward compatibility
  • binWatcher for binary race conditions
  • Test cases for OS SIGTERM and other signals
  • Made watchers async
  • Add versioning and an installation ID for pod deployments to the kubeconfig and cni-host-token file names, so they stay distinct between restarts.
  • Add podUID to better track the owner of artifacts; if it is not available, a Unix timestamp is used.

How I've tested this PR

  • deployment upgrade from 1.7 to current
  • deployment upgrade from current to current (via other changes)
  • with and without autorotateToken
  • removes CNI token file on daemonset delete.

How I expect reviewers to test this PR

Old version to new version of consul-k8s control-plane update:
Older versions don't remove the binary file, so the newer control plane times out waiting for the removal and then proceeds with the copy.

2025-09-08T20:17:22.888Z [INFO]  Running CNI install with configuration: name=consul-cni type=consul-cni cni_bin_dir=/opt/cni/bin cni_net_dir=/etc/cni/net.d multus=false kubeconfig=ZZZ-consul-cni-kubeconfig-1757362642888465801 log_level=info cni_token_path:=/var/run/secrets/kubernetes.io/serviceaccount/token cni_host_token_path=/etc/cni/net.d/cni-host-token-1757362642888465801 autorotate_token:=true
2025-09-08T20:17:22.888Z [INFO]  Copying consul-cni binary: destination=/opt/cni/bin
2025-09-08T20:17:22.888Z [INFO]  Creating destBinWatcher for: file=/opt/cni/bin/consul-cni
**2025-09-08T20:19:02.888Z [INFO]  Grace period timeout reached, older pod may not have cleaned up binary, proceeding with copy**
2025-09-08T20:19:08.929Z [INFO]  Successfully copied binary after grace period timeout
2025-09-08T20:19:08.944Z [INFO]  Getting default config file from: destination=/etc/cni/net.d
2025-09-08T20:19:08.945Z [INFO]  Using config file: file=/etc/cni/net.d/15-azure.conflist
2025-09-08T20:19:08.945Z [INFO]  Installing plugin: reason="consul-cni config has changed"
2025-09-08T20:19:08.945Z [INFO]  Creating directory watcher for: directory=/etc/cni/net.d
2025-09-08T20:19:08.945Z [INFO]  Creating kubeconfig: file=ZZZ-consul-cni-kubeconfig-1757362642888465801
2025-09-08T20:19:08.945Z [INFO]  Creating sourceTokenWatcher for: file=/var/run/secrets/kubernetes.io/serviceaccount/token
2025-09-08T20:19:08.946Z [INFO]  Token file updated: file=/etc/cni/net.d/cni-host-token-1757362642888465801
2025-09-08T20:19:08.946Z [INFO]  Token file updated: file=/etc/cni/net.d/cni-host-token-1757362642888465801

New version to new version of consul-k8s control-plane:
The outgoing pod of the newer version deletes the binary, which can race with the incoming pod's copy; this race is handled between upgrades.

2025-09-08T20:27:32.185Z [INFO]  Running CNI install with configuration: name=consul-cni type=consul-cni cni_bin_dir=/opt/cni/bin cni_net_dir=/etc/cni/net.d multus=false kubeconfig=ZZZ-consul-cni-kubeconfig-1757363252185513038 log_level=info cni_token_path:=/var/run/secrets/kubernetes.io/serviceaccount/token cni_host_token_path=/etc/cni/net.d/cni-host-token-1757363252185513038 autorotate_token:=true
2025-09-08T20:27:32.185Z [INFO]  Copying consul-cni binary: destination=/opt/cni/bin
2025-09-08T20:27:32.185Z [INFO]  Creating destBinWatcher for: file=/opt/cni/bin/consul-cni
2025-09-08T20:27:32.611Z [INFO]  Received binary file event: event_type=CHMOD file=/opt/cni/bin/consul-cni
**2025-09-08T20:27:32.611Z [INFO]  Received binary file event: event_type=REMOVE file=/opt/cni/bin/consul-cni**
2025-09-08T20:27:33.297Z [INFO]  Successfully copied updated binary from source
2025-09-08T20:27:33.297Z [INFO]  Getting default config file from: destination=/etc/cni/net.d
2025-09-08T20:27:33.298Z [INFO]  Using config file: file=/etc/cni/net.d/15-azure.conflist
2025-09-08T20:27:33.298Z [INFO]  Installing plugin: reason="consul-cni config missing from config file"
2025-09-08T20:27:33.298Z [INFO]  Creating directory watcher for: directory=/etc/cni/net.d
2025-09-08T20:27:33.298Z [INFO]  Creating sourceTokenWatcher for: file=/var/run/secrets/kubernetes.io/serviceaccount/token
2025-09-08T20:27:33.298Z [INFO]  Creating kubeconfig: file=ZZZ-consul-cni-kubeconfig-1757363252185513038
2025-09-08T20:27:33.299Z [INFO]  Token file updated: file=/etc/cni/net.d/cni-host-token-1757363252185513038
2025-09-08T20:27:33.299Z [INFO]  Token file updated: file=/etc/cni/net.d/cni-host-token-1757363252185513038

After disabling CNI:
No CNI binary, kubeconfig, plugin config, or host token (if applicable) remains on the node.

root@aks-nodepool1-14707519-vmss000003:/opt/cni/bin# ls
LICENSE    azure-vnet       azure-vnet-telemetry  bridge  dummy     host-device  ipvlan    macvlan  ptp  static  tuning  vrf
README.md  azure-vnet-ipam  bandwidth             dhcp    firewall  host-local   loopback  portmap  sbr  tap     vlan

root@aks-nodepool1-14707519-vmss000003:/etc/cni/net.d# cat 15-azure.conflist
{
  "cniVersion": "0.3.0",
  "name": "azure",
  "plugins": [
    {
      "ipam": {
        "type": "azure-vnet-ipam"
      },
      "ipsToRouteViaHost": [
        "169.254.20.10"
      ],
      "mode": "transparent",
      "type": "azure-vnet"
    },
    {
      "capabilities": {
        "portMappings": true
      },
      "snat": true,
      "type": "portmap"
    }
  ]
}

root@aks-nodepool1-14707519-vmss000003:/etc/cni/net.d# ls
15-azure.conflist

Checklist

PCI review checklist

  • I have documented a clear reason for, and description of, the change I am making.

  • If applicable, I've documented a plan to revert these changes if they require more than reverting the pull request.

  • If applicable, I've documented the impact of any changes to security controls.

    Examples of changes to security controls include using new access control methods, adding or removing logging pipelines, etc.


Overview of commits
