
Add cni-repair-controller to linkerd-cni DaemonSet #11699

Merged
6 commits merged into main from alpeb/linkerd-reinitialize-pods on Jan 5, 2024

Conversation

alpeb (Member) commented Dec 5, 2023

Followup to linkerd/linkerd2-proxy-init#306
Fixes #11073

This adds the `cni-repair-controller` container to the `linkerd-cni` DaemonSet, along with its config in `values.yaml`. Note that it is disabled by default; to enable it, set `repairController.enabled=true`.

The `linkerd-cni` image version is also bumped so that it contains the new binary for this controller.

Finally, `priorityClassName: system-cluster-critical` was added to the DaemonSet, which should signal the scheduler to give it priority over application pods. That has proven not to be reliable, however, hence the need for the new controller.
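
For reference, a minimal Helm values sketch for enabling the new container might look like the following; only the `repairController.enabled` flag is taken from this description, and the surrounding comments are illustrative:

```yaml
# linkerd2-cni chart values override (illustrative sketch)
repairController:
  # Disabled by default; setting this to true adds the cni-repair-controller
  # container to the linkerd-cni DaemonSet.
  enabled: true
```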

Followup to linkerd/linkerd2-proxy-init#306
Fixes #11073

This adds the `reinitialize-pods` container to the `linkerd-cni`
DaemonSet, along with its config in `values.yaml`.

The `linkerd-cni` version is also bumped to include the new binary
for this controller.

## TO-DOs

- Integration test
alpeb force-pushed the alpeb/linkerd-reinitialize-pods branch from 178c9f8 to 1ca137a on December 5, 2023 15:34
alpeb force-pushed the alpeb/linkerd-reinitialize-pods branch from a38b17e to 0e943f4 on December 6, 2023 16:25
@jdinsel-xealth

Could this also be marked with the Fixes #11735 tag?

@alpeb alpeb marked this pull request as draft December 11, 2023 19:53
alpeb added a commit to linkerd/linkerd2-proxy-init that referenced this pull request Dec 11, 2023
(Note this will fail until linkerd/linkerd2#11699 lands)

The `integration-cni-plugin.yml` workflow (formerly known as `cni-plugin-integration.yml`) has been expanded to run the new recipe `reinitialize-pods-integration`, which performs the following steps:

- Rebuilds the `linkerd-reinitialize-pods` crate and `cni-plugin`. The `Dockerfile-cni-plugin` file has been refactored to have two main targets, `runtime` and `runtime-test`, the latter picking up the `linkerd-reinitialize-pods` binary that has just been built locally.
- Creates a new cluster at version `v1.27.6-k3s1` (version required for Calico to work)
- Triggers a new `./reinitialize-pods/integration/run.sh` script which:
  - Installs Calico
  - Installs the latest linkerd-edge CLI
  - Installs `linkerd-cni` and waits for it to become ready
  - Installs the linkerd control plane in CNI mode
  - Installs a `pause` DaemonSet

The `linkerd-cni` instance has been configured to include an extra initContainer that delays its start by 15s. Since we waited for it to become ready, this doesn't affect the initial install. But when a new node is then added to the cluster, the delay lets the new `pause` DaemonSet replica start before the full CNI config is ready, so we can observe its failure to come up. Once the new `linkerd-cni` replica becomes ready, we observe the failed `pause` replica being replaced by a new healthy one.
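
A minimal sketch of such a delaying initContainer, assuming a generic busybox image and a hypothetical container name (the actual manifest used by the test may differ):

```yaml
# Illustrative only: an extra initContainer on the linkerd-cni DaemonSet pod
# template that holds back startup for 15 seconds.
initContainers:
  - name: delay-startup        # hypothetical name
    image: busybox:1.36        # assumed image
    command: ["sleep", "15"]
```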
alpeb (Member, Author) commented Dec 12, 2023

> Could this also be marked with the Fixes #11735 tag?

I've replied to that issue. Given that it's a Talos system, it appears to be hitting #7945.

@ramsateesh

This is perfect. When can I test this?

alpeb (Member, Author) commented Dec 14, 2023

> This is perfect. When can I test this?

This should be shipped with an edge release in the coming weeks 🙂

alpeb added a commit to linkerd/linkerd2-proxy-init that referenced this pull request Jan 2, 2024
Fixes linkerd/linkerd2#11073

This fixes the issue of injected pods that cannot acquire a proper network config because `linkerd-cni` and/or the cluster's network CNI haven't fully started. They are left in a permanent crash loop, and once the CNI is ready they need to be restarted externally, which is what this controller does.

This controller, `linkerd-cni-repair-controller`, watches pod events on its node. When an injected pod is in a terminated state and its `linkerd-network-validator` container exited with code 95, the controller deletes the pod so it can restart with a proper network config.

The controller is to be deployed as an additional container in the `linkerd-cni` DaemonSet (addressed in linkerd/linkerd2#11699).

This exposes two custom counter metrics: `linkerd_cni_repair_controller_queue_overflow` (in the spirit of the destination controller's `endpoint_updates_queue_overflow`) and `linkerd_cni_repair_controller_deleted`.
alpeb changed the title from "Add reinitialize-pods controller to linkerd-cni DaemonSet" to "Add cni-repair-controller to linkerd-cni DaemonSet" on Jan 2, 2024
alpeb added a commit to linkerd/linkerd2-proxy-init that referenced this pull request Jan 2, 2024
(Note this will fail until linkerd/linkerd2#11699 lands)

The `integration-cni-plugin.yml` workflow (formerly known as `cni-plugin-integration.yml`) has been expanded to run the new recipe `cni-repair-controller-integration`, which performs the following steps:

- Rebuilds the `linkerd-cni-repair-controller` crate and `cni-plugin`
- Creates a new cluster at version `v1.27.6-k3s1` (version required for Calico to work)
- Triggers a new `./cni-repair-controller/integration/run.sh` script which:
  - Installs Calico
  - Installs the latest linkerd-edge CLI
  - Installs `linkerd-cni` and waits for it to become ready
  - Installs the linkerd control plane in CNI mode
  - Installs a `pause` DaemonSet (a sketch of this test workload follows below)

The `linkerd-cni` instance has been configured to include an extra initContainer that delays its start by 15s. Since we waited for it to become ready, this doesn't affect the initial install. But when a new node is then added to the cluster, the delay lets the new `pause` DaemonSet replica start before the full CNI config is ready, so we can observe its failure to come up. Once the new `linkerd-cni` replica becomes ready, we observe the failed `pause` replica being replaced by a new healthy one.
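
As a rough illustration, the `pause` test workload could be a DaemonSet along these lines; the exact manifest used by the integration script isn't reproduced here, and the injection annotation is an assumption based on the replicas needing to run `linkerd-network-validator`:

```yaml
# Illustrative sketch of an injected "pause" DaemonSet test workload
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: pause
spec:
  selector:
    matchLabels:
      app: pause
  template:
    metadata:
      labels:
        app: pause
      annotations:
        linkerd.io/inject: enabled   # assumed: pods must be meshed to hit the race
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # assumed image/tag
```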
@alpeb alpeb marked this pull request as ready for review January 2, 2024 22:00
Review thread on charts/linkerd2-cni/templates/cni-plugin.yaml (resolved):
# Defaults to system-cluster-critical so it signals the scheduler to start
# before application pods, but after CNI plugins (whose priorityClassName is
# system-node-critical). This isn't strictly enforced.
priorityClassName: "system-cluster-critical"
Reviewer (Member):

Is this needed for the cni-repair controller specifically? If not, can you pull it out into a separate change?

If CNI plugins should run at system-node-critical, why wouldn't the Linkerd CNI run at system-node-critical? If we don't have that as a default now, is there a reason for that? I.e. are there any downsides to setting this as a default?

If we omit the priorityClassName change from this PR, the PR feels less risky to me.

alpeb (Member, Author) replied:

It's not required for this PR. The reasoning for using system-cluster-critical was to allow for the main CNI plugin in the cluster to run first, lessening the chance to run into the race condition the repair controller attempts to fix. But it appears these class names are either best-effort or the prioritization mechanism is simply not implemented as advertised, according to my testing. I'll remove this for now.

Further review threads on charts/linkerd2-cni/values.yaml (resolved; one marked outdated).
@olix0r olix0r merged commit 55d1049 into main Jan 5, 2024
35 checks passed
@olix0r olix0r deleted the alpeb/linkerd-reinitialize-pods branch January 5, 2024 17:28
mateiidavid added a commit that referenced this pull request Jan 12, 2024
This edge release introduces a number of fixes and improvements. Most notably,
it introduces a new `cni-repair-controller` binary to the CNI plugin image. The
controller will automatically restart pods that have not received their
iptables configuration.

* Removed shortnames from Tap API resources to avoid colliding with existing
  Kubernetes resources ([#11816]; fixes [#11784])
* Introduced a new ExternalWorkload CRD to support upcoming mesh expansion
  feature ([#11805])
* Changed `MeshTLSAuthentication` resource validation to allow SPIFFE URI
  identities ([#11882])
* Introduced a new `cni-repair-controller` to the `linkerd-cni` DaemonSet to
  automatically restart misconfigured pods that are missing iptables rules
  ([#11699]; fixes [#11073])
* Fixed a `"duplicate metrics"` warning in the multicluster service-mirror
  component ([#11875]; fixes [#11839])
* Added metric labels and weights to `linkerd diagnostics endpoints` json
  output ([#11889])
* Changed how `Server` updates are handled in the destination service. The
  change will ensure that during a cluster resync, consumers won't be
  overloaded by redundant updates ([#11907])
* Changed `linkerd install` error output to add a newline when a Kubernetes
  client cannot be successfully initialised ([#11917])

[#11816]: #11816
[#11784]: #11784
[#11805]: #11805
[#11882]: #11882
[#11699]: #11699
[#11073]: #11073
[#11875]: #11875
[#11839]: #11839
[#11889]: #11889
[#11907]: #11907
[#11917]: #11917

Signed-off-by: Matei David <[email protected]>
@mateiidavid mateiidavid mentioned this pull request Jan 12, 2024
adleong pushed a commit that referenced this pull request Jan 18, 2024
@adleong adleong mentioned this pull request Jan 18, 2024
adleong added a commit that referenced this pull request Jan 19, 2024
This stable release adds a cni-repair-controller which fixes the issue of
injected pods that cannot acquire proper network config because linkerd-cni
and/or the cluster's network CNI haven't fully started ([#11699]). It also
fixes a bug in the destination controller where having a large number of
Server resources could cause the destination controller to use an excessive
amount of CPU ([#11907]). Finally, it fixes a conflict with tap resource
shortnames which was causing warnings from kubectl v1.29.0+ ([#11816]).

[#11699]: #11699
[#11907]: #11907
[#11816]: #11816
alpeb added a commit to linkerd/linkerd2-proxy-init that referenced this pull request Jan 22, 2024
@nathanmcgarvey-modopayments

Does this further the fix for #8120 as well? This seems like it was #2 in that issue's proposed fixes.

alpeb (Member, Author) commented Jan 24, 2024

> Does this further the fix for #8120 as well? This seems like it was #2 in that issue's proposed fixes.

That is correct 👍

Successfully merging this pull request may close these issues.

When linkerd-network-validator catches missing iptables config, pod is left in a failure state
5 participants