
Configure network-validator and repair-controller to work with IPv6 #12874

Merged
alpeb merged 3 commits into main from alpeb/cni-ipv6-wildcard on Jul 24, 2024

Conversation

Member

@alpeb alpeb commented Jul 22, 2024

Fixes #12864

The linkerd-cni network-validator container was binding to the IPv4 wildcard and connecting to an IPv4 address. This wasn't breaking things in IPv6 clusters, but it was only validating the iptables rules and not the ip6tables ones. This change introduces logic to pick addresses according to the value of disableIPv6: if IPv6 is enabled, the ip6tables rules get exercised instead. Note that a more complete change would exercise both iptables and ip6tables, but for now we're defaulting to ip6tables.

The situation was similar for repair-controller, but because the IPv4 wildcard was used for the admin server, in IPv6 clusters the kubelet wasn't able to reach the probe endpoints and the container was failing. In this case the fix is simply to have the admin server bind to [::], which works for both IPv4 and IPv6 clusters.

@@ -24,9 +24,13 @@ args:
   - --log-level
   - {{ .Values.networkValidator.logLevel }}
   - --connect-addr
-  - {{ .Values.networkValidator.connectAddr }}
+  {{- if .Values.disableIPv6 }}
+  - "1.1.1.1:{{ .Values.networkValidator.connectPort }}"
Member
So the only problem with this is that users can no longer modify the target now, right? This would probably change the behaviour of the validator. In an air-gapped environment, for example, if 1.1.1.1 is not accessible, the connection will fail.

Can users still determine whether the failure was caused by missing iptables rules as opposed to properties of their network? Once connected, the validator expects a random string to be returned from the server. Obviously that wouldn't happen with any other host, so if the connection succeeded but the actual payload was wrong, we deduce that iptables is incorrectly set up.

Do you think this is a problem? Chances are the error will be different anyway, but I worry this might detract from the strictness of the check.

@lwj5 lwj5 Jul 23, 2024

As for the first point, maybe it is possible to have connectHost: "" in the values.yaml, which would default to 1.1.1.1 or [fd00::1] if empty, depending on disableIPv6. Otherwise, the address specified would be used. That should provide the flexibility required.
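
To make the proposal concrete, here is a sketch of what that values surface might look like. The field names follow the suggestion above and the connectPort value already shown in the diff; this is illustrative only, and the PR ultimately kept connectAddr instead (see below):

```yaml
networkValidator:
  # Empty means: default to 1.1.1.1 (IPv4) or [fd00::1] (IPv6), per disableIPv6.
  connectHost: ""
  connectPort: 20001
```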

Member Author

Those are great points, @mateiidavid and @lwj5. I'll add the additional config 👍

Fixes #12864

The linkerd-cni network-validator container was binding to the IPv4
wildcard and connecting to an IPv4 address. This wasn't breaking things
in IPv6 clusters but it was only validating the iptables rules and not
the ip6tables ones. This change introduces logic to use addresses
according to the value of `disableIPv6`. If IPv6 is enabled, then the
ip6tables rules would get exercised. Note that a more complete change
would also exercise both iptables and ip6tables, but for now we're
defaulting to ip6tables.

This implied changing the helm value `networkValidator.connectAddr` to
`connectPort`. @mateiidavid could you please validate if this entry with
its simplified doc still makes sense, in light of #12797 ?

The situation was similar for repair-controller, but because the IPv4
wildcard was used for the admin server, in IPv6 clusters the kubelet
wasn't able to reach the probe endpoints and the container was failing.
In this case the fix is simply to have the admin server bind to `[::]`,
which works for both IPv4 and IPv6 clusters.
@alpeb
Member Author

alpeb commented Jul 23, 2024

Ok I ended up putting back the connectAddr config (and not adding connectPort), but defaulting it to empty, to allow the IP family conditionals to operate. If set, it'll just be used. LMKWYT!
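
For illustration, a minimal sketch of how that fallback could look in the network-validator args of the Helm template. The defaults come from the values.yaml doc comment shown below; the exact template structure and the quote helper here are assumptions, not necessarily the PR's actual code:

```yaml
- --connect-addr
{{- if .Values.networkValidator.connectAddr }}
# Explicitly configured target takes precedence.
- {{ .Values.networkValidator.connectAddr | quote }}
{{- else if .Values.disableIPv6 }}
# IPv4-only clusters: exercise the iptables rules.
- "1.1.1.1:20001"
{{- else }}
# IPv6-enabled clusters: exercise the ip6tables rules.
- "[fd00::1]:20001"
{{- end }}
```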

Member

@mateiidavid mateiidavid left a comment


Thanks for the change @alpeb! LGTM

-# for public clusters and a private IP for air-gapped clusters with a port like 20001.
-connectAddr: "1.1.1.1:20001"
+# If empty, defaults to 1.1.1.1:20001 and [fd00::1]:20001 for IPv4 and IPv6 respectively.
+connectAddr:
Member

nit: should we just use "" (in Go it'll be initialised to an empty string anyway)? We use it in some of our config fields, e.g. injectCaFromSecret. You have a better feel for Helm & config surface area in charts than I do though, so I'm ok with whatever decision you make :)

Member Author

Fair enough, I'll change it to an explicit empty string for consistency, and to not give the impression this could hold something else besides a string.
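
For completeness, here is what the resulting values.yaml entry and a user override could look like. The default reflects the doc comment above; the override address is hypothetical, chosen per the air-gapped guidance in the old comment:

```yaml
# Default: empty string, so the validator falls back to 1.1.1.1:20001 or
# [fd00::1]:20001 depending on disableIPv6.
networkValidator:
  connectAddr: ""

# Example override for an air-gapped cluster (10.0.0.53 is a made-up private IP):
# networkValidator:
#   connectAddr: "10.0.0.53:20001"
```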

@alpeb alpeb merged commit 6603409 into main Jul 24, 2024
41 checks passed
@alpeb alpeb deleted the alpeb/cni-ipv6-wildcard branch July 24, 2024 14:56
Successfully merging this pull request may close these issues.

Linkerd CNI repair controller does not listen on IPv6