
Handle same hostname for multiple pods in a headless service #123

Closed
guillaumerose wants to merge 1 commit

Conversation

guillaumerose

This code improves the handling of headless services pointing at pods sharing
the same hostname.

  • For each hostname, the A records should point at all of the matching pods' IPs. Currently, only the last pod gets an A record.
  • For the service name, an A record should be created for each pod.

Fixes #116

cc @dgageot
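
[Editor's note: as an illustration of the behavior the description asks for, here is a minimal sketch in Go. It is not the PR's actual code; the type and function names are hypothetical. It simply groups headless-service endpoints by hostname so that a shared hostname resolves to every matching pod IP, and the service name resolves to all pod IPs.]

    package main

    import "fmt"

    // endpoint is a simplified stand-in for a headless-service endpoint;
    // the real kube-dns types differ.
    type endpoint struct {
        hostname string
        ip       string
    }

    // buildARecords groups endpoint IPs by hostname: each hostname maps to
    // all matching pod IPs, and the bare service name maps to every pod IP.
    func buildARecords(service string, endpoints []endpoint) map[string][]string {
        records := map[string][]string{}
        for _, ep := range endpoints {
            // Append instead of overwriting, so two pods that share the
            // hostname "app-2" both show up under app-2.<service>.
            name := ep.hostname + "." + service
            records[name] = append(records[name], ep.ip)
            records[service] = append(records[service], ep.ip)
        }
        return records
    }

    func main() {
        eps := []endpoint{
            {"app-1", "10.0.0.1"},
            {"app-2", "10.0.0.2"},
            {"app-2", "10.0.0.3"}, // same hostname, different pod
        }
        fmt.Println(buildARecords("svc.default.svc.cluster.local", eps))
    }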

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please email the CNCF helpdesk: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot added the cncf-cla: no label (indicates the PR's author has not signed the CNCF CLA) on Jul 7, 2017
@thockin
Member

thockin commented Jul 7, 2017

I know the spec says it should return multiple A records in this case, but I just want to think it through. Is this the correct behavior? It's less surprising than dropping all-but-one, though I could argue that using the same hostname twice is a bug on the user's end.

So is the spec wrong, or is the code wrong? If we implement this, what else might break? Are there apps that actually expect to use DNS for this kind of "phantom elimination"? I feel the current behavior is too under-specified to rely on for that, but the implementation is reasonably deterministic.

@bowei @johnbelamaric @smarterclayton @erictune @kubernetes/sig-apps-misc

@erictune
Member

erictune commented Jul 7, 2017

@kow3ns

@erictune
Member

erictune commented Jul 7, 2017

Using the same hostname is not a bug.

Spreading may be soft rather than hard.

Someone may want to bring up a size-3 etcd in their 2-node cluster as part of a test. They don't need HA, they just want to reuse the config. That should work as similarly as possible to the "normal" case where the pods are properly spread.

@thockin
Member

thockin commented Jul 7, 2017

@erictune I am not clear on your reply. Let me provide an example and you tell me which behavior is considered correct.

T0: User somehow creates a set of pods that have hostnames "app-1", "app-2", "app-3". DNS lookups of "service" return 3 distinct A records. DNS lookups of "app-[123].service" each return one distinct A record.

T1: Some management event happens and, through cut-paste or similar, the app-3 pod is replaced with a new pod that has hostname "app-2". DNS lookups of "service" return 3 distinct A records. Question: should a DNS lookup of "app-2" return a single A record (one of the 2 pods, chosen somehow) or should it return 2 A records?

Today it returns 1 record.

@thockin
Member

thockin commented Jul 7, 2017

Another note: if we decide the spec is wrong, and that we should only serve one A record, we have a bug. When you hit this overlap, the IP of the last pod created "wins". If that pod goes away, we do not update the PTR to the other pod with the same hostname.
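
[Editor's note: to make that failure mode concrete, here is a hedged Go sketch (hypothetical names, not the actual kube-dns code) of the "last pod wins" behavior: with a single record per hostname, removing the winning pod's record leaves nothing pointing at the surviving pod.]

    package main

    import "fmt"

    func main() {
        // With single-record semantics, the later pod silently overwrites
        // the earlier one under the shared hostname.
        byHostname := map[string]string{}
        byHostname["app-2"] = "10.0.0.2" // first pod claiming "app-2"
        byHostname["app-2"] = "10.0.0.3" // second pod "wins"

        // If the winning pod goes away and its record is simply removed,
        // nothing re-points "app-2" at the surviving 10.0.0.2 pod.
        delete(byHostname, "app-2")
        fmt.Printf("app-2 -> %q\n", byHostname["app-2"]) // empty: surviving pod is unreachable by name
    }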

@erictune
Member

erictune commented Jul 7, 2017 via email

@dgageot

dgageot commented Jul 10, 2017

Thanks @thockin for the feedback.

When we prepared this PR, we tried hard not to break existing behavior, and we took the view that this "phantom elimination" couldn't be considered a feature given what the spec says. But I understand that it might nonetheless be a breaking change.

What do you think we can do to mitigate that risk?

@johnbelamaric
Member

@chrisohaver I believe in CoreDNS we include all IPs. This is consistent with ordinary DNS behavior: it is legal to have multiple A records with the same name and different IPs.
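
[Editor's note: for reference, a single name resolving to several addresses is everyday DNS; Go's standard resolver, for example, returns a slice of addresses for one lookup. A trivial sketch follows; the hostname queried is only an example.]

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        // A single name may legitimately have several A records;
        // LookupHost returns all of the resolved addresses.
        addrs, err := net.LookupHost("kubernetes.io")
        if err != nil {
            fmt.Println("lookup failed:", err)
            return
        }
        for _, a := range addrs {
            fmt.Println(a)
        }
    }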

@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) and removed the cncf-cla: no label on Aug 3, 2017
@guillaumerose
Author

We fixed the CLA check. Any chance of getting this merged? Should we do anything else?
Thanks!

@dgageot

dgageot commented Aug 3, 2017

ping @thockin

@k8s-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) on Sep 20, 2017
@guillaumerose
Author

Any update on this?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) on Jan 6, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label on Mar 15, 2018