
Docs update of the proposed HA solution of node local dns in IPVS mode of kube-proxy #323

Open
carterzhao opened this issue Sep 26, 2019 · 9 comments
Labels
kind/documentation Categorizes issue or PR as related to documentation. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

carterzhao commented Sep 26, 2019

The proposed HA solution for node-local-dns has been given here, but it will not work in IPVS mode of kube-proxy. Are there any plans to support the HA solution in IPVS mode of kube-proxy?
Looking forward to your reply!

@carterzhao carterzhao changed the title node cache the proposed HA solution of node local in PVS mode of kube-proxy Sep 26, 2019
@carterzhao carterzhao changed the title the proposed HA solution of node local in PVS mode of kube-proxy the proposed HA solution of node local dns in PVS mode of kube-proxy Sep 26, 2019
bowei (Member) commented Sep 26, 2019

Any ideas would be welcome :-)

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2019
bowei (Member) commented Dec 26, 2019

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 26, 2019
prameshj (Contributor) commented Apr 1, 2021

For IPVS mode, the HA solution is to run 2 replicas as described in the KEP:

Running 2 daemonsets of node-local-dns using the same listenIP - 169.254.20.10 via SO_REUSEPORT option. Upgrades will be done one daemonset at a time.

I see this got removed from the KEP when the 2 KEPs were merged in kubernetes/enhancements#2487
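To make the SO_REUSEPORT mechanism concrete, here is a minimal Go sketch of how a second listener can bind the same 169.254.20.10:53 address, assuming golang.org/x/sys/unix. This is only an illustration of the option described in the KEP, not the actual node-cache source.

```go
package main

import (
	"context"
	"log"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

// listenReusable opens a UDP socket on addr with SO_REUSEPORT set, so two
// node-local-dns replicas can bind the same 169.254.20.10:53 concurrently.
func listenReusable(addr string) (net.PacketConn, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				// SO_REUSEPORT lets multiple sockets bind the same ip:port;
				// the kernel load-balances incoming packets between them.
				sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.ListenPacket(context.Background(), "udp", addr)
}

func main() {
	conn, err := listenReusable("169.254.20.10:53")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Printf("listening on %s", conn.LocalAddr())
}
```

With both replicas opening their listeners this way, the kernel distributes queries between them, which is what allows one daemonset to be upgraded while the other keeps serving.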

bowei (Member) commented Apr 2, 2021

Pavithra -- can you send a PR to add it back to the KEP?

bowei (Member) commented Apr 2, 2021

Also -- it probably needs to have documentation if this is a recommended setup...

prameshj (Contributor) commented Apr 2, 2021

Pavithra -- can you send a PR to add it back to the KEP?

Yes, already created - kubernetes/enhancements#2592

Also -- it probably needs to have documentation if this is a recommended setup...

I agree we need better documentation of this. It isn't necessarily the recommended setup, since it uses twice the resources and requires managing conflicts: only one replica should handle interface/iptables management.
I will use this issue to track the documentation.

@prameshj prameshj changed the title the proposed HA solution of node local dns in PVS mode of kube-proxy Docs update of the proposed HA solution of node local dns in PVS mode of kube-proxy Apr 15, 2021
@prameshj prameshj added the kind/documentation Categorizes issue or PR as related to documentation. label Apr 15, 2021
mlowery commented May 5, 2021

@prameshj: Regarding your comment:

requires managing conflicts - only one replica should handle interface/iptables management

Are there known issues where the two replicas will step on each other? If so, can you point me to them? When I look at the code, it seems idempotent and thread-safe. Is it not sufficient to simply pass --skipteardown to allow the replicas to safely coexist?
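For context on the idempotence point, the pattern in question is check-before-add rule management. A hypothetical sketch using github.com/coreos/go-iptables and a made-up NOTRACK rule for the listen IP; this is not necessarily the library or the exact rules node-cache uses.

```go
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}
	// Hypothetical NOTRACK rule for the node-local-dns listen IP; adding it
	// only when absent makes repeated syncs from two replicas harmless.
	rule := []string{"-d", "169.254.20.10/32", "-p", "udp", "--dport", "53", "-j", "NOTRACK"}
	exists, err := ipt.Exists("raw", "PREROUTING", rule...)
	if err != nil {
		log.Fatal(err)
	}
	if !exists {
		if err := ipt.Append("raw", "PREROUTING", rule...); err != nil {
			log.Fatal(err)
		}
	}
}
```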

prameshj (Contributor) commented Nov 5, 2021

@prameshj: Regarding your comment:

requires managing conflicts - only one replica should handle interface/iptables management

Are there known issues where the two replicas will step on each other? If so, can you point me to them? When I look at the code, it seems idempotent and thread-safe. Is it not sufficient to simply pass --skipteardown to allow the replicas to safely coexist?

I missed this comment, apologies for the very late reply. You could pass --skipteardown to both replicas, but then the iptables rules and the nodelocaldns interface would need to be torn down some other way. This mostly matters when node-local-dns is being disabled. In IPVS mode (where only a link-local IP is used), skipping cleanup is probably not a big issue, since no other service uses that link-local IP; even if the leftover iptables rules lead nowhere, nothing will break once pods have switched back to the kube-dns service IP.

For upgrades, since only one replica upgrades at a time, --skipteardown on both replicas should be ok.

However, if the kube-dns service VIP is reused for node-local-dns (in order to fall back cleanly to kube-dns when node-local-dns is down or disabled), then skipping cleanup will blackhole DNS traffic.
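To make the teardown trade-off concrete: the cleanup that --skipteardown bypasses amounts roughly to "delete the nodelocaldns dummy interface and the DNS iptables rules". A hedged sketch, assuming the vishvananda/netlink and coreos/go-iptables libraries and the same hypothetical NOTRACK rule as above; this is an illustration, not the actual node-cache teardown code.

```go
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
	"github.com/vishvananda/netlink"
)

func main() {
	// Remove the dummy interface that carries 169.254.20.10 on the node.
	if link, err := netlink.LinkByName("nodelocaldns"); err == nil {
		if err := netlink.LinkDel(link); err != nil {
			log.Printf("failed to delete interface: %v", err)
		}
	}

	// Remove the hypothetical NOTRACK rule installed for the listen IP.
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}
	rule := []string{"-d", "169.254.20.10/32", "-p", "udp", "--dport", "53", "-j", "NOTRACK"}
	if err := ipt.Delete("raw", "PREROUTING", rule...); err != nil {
		log.Printf("failed to delete rule (may already be gone): %v", err)
	}
}
```

As the comment above explains, leaving this state behind is harmless when only the link-local IP is used, but if the rules steer the kube-dns service VIP at a replica that no longer exists, queries to that VIP are blackholed until the rules are removed.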
