Latency/timeout from Kube DNS #640

Open

lbernick opened this issue Jul 15, 2024 · 1 comment
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@lbernick

lbernick commented Jul 15, 2024

Likely a duplicate of #96, but I'm opening a new issue as requested here since that one is old/possibly no longer relevant.

On 6/6/24 I observed DNS resolution for a storage service in our cluster taking multiple seconds to complete, and I'm now observing it again; here are my notes from that earlier incident.

First, I made a request from within our app container to the public hostname of the service:

root@app-55f755fc8c-88gvj:/app# time curl --location 'https://<hostname>/api/v1/ping'
pong

real	0m5.196s
user	0m0.020s
sys	0m0.006s

I then tried the cluster internal hostname:

/app # time curl --location '<service>.<namespace>.svc.cluster.local:443/api/v1/ping'
pong
real	0m 7.52s
user	0m 0.00s
sys	0m 0.00s

Making a request to the IP address was very fast:

/app # time curl --location '<IP>:443/api/v1/ping'
pong
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
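
For reference, curl's --write-out timers can confirm whether the delay is in name resolution rather than in the service itself; if time_namelookup accounts for nearly all of time_total, the slowness is DNS. (The hostname below is a placeholder.)

/app # curl -s -o /dev/null --location \
    -w 'namelookup: %{time_namelookup}s\nconnect: %{time_connect}s\ntotal: %{time_total}s\n' \
    'https://<hostname>/api/v1/ping'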

I believe this was an issue with kube DNS timing out for the following reasons:

  • I updated the app deployment to use dnsPolicy: Default instead of dnsPolicy: ClusterFirst as an experiment (a patch sketch for this follows the list below). After doing this, requests to the public hostname of the service from within the app container were fast again; my understanding from these docs is that dnsPolicy: Default makes the pod inherit the node's resolver configuration, so the lookups skip kube DNS entirely.
  • Following the DNS debugging guide, I observed the following unexpected logs in the kube-dns pods:
➜  ~ k logs kube-dns-f65b59b6b-v72bv -n kube-system
Defaulted container "kubedns" out of: kubedns, dnsmasq, sidecar, prometheus-to-sd
I0604 05:28:55.606443       1 flags.go:57] FLAG: --add-dir-header="false"
I0604 05:28:55.606517       1 flags.go:57] FLAG: --alsologtostderr="false"
I0604 05:28:55.606520       1 flags.go:57] FLAG: --config-dir="/kube-dns-config"
I0604 05:28:55.606523       1 flags.go:57] FLAG: --config-map=""
I0604 05:28:55.606526       1 flags.go:57] FLAG: --config-map-namespace="kube-system"
I0604 05:28:55.606528       1 flags.go:57] FLAG: --config-period="10s"
I0604 05:28:55.606531       1 flags.go:57] FLAG: --dns-bind-address="0.0.0.0"
I0604 05:28:55.606533       1 flags.go:57] FLAG: --dns-port="10053"
I0604 05:28:55.606536       1 flags.go:57] FLAG: --domain="cluster.local."
I0604 05:28:55.606539       1 flags.go:57] FLAG: --federations=""
I0604 05:28:55.606542       1 flags.go:57] FLAG: --healthz-port="8081"
I0604 05:28:55.606544       1 flags.go:57] FLAG: --initial-sync-timeout="1m0s"
I0604 05:28:55.606546       1 flags.go:57] FLAG: --kube-master-url=""
I0604 05:28:55.606548       1 flags.go:57] FLAG: --kubecfg-file=""
I0604 05:28:55.606550       1 flags.go:57] FLAG: --log-backtrace-at=":0"
I0604 05:28:55.606556       1 flags.go:57] FLAG: --log-dir=""
I0604 05:28:55.606559       1 flags.go:57] FLAG: --log-file=""
I0604 05:28:55.606564       1 flags.go:57] FLAG: --log-file-max-size="1800"
I0604 05:28:55.606568       1 flags.go:57] FLAG: --log-flush-frequency="5s"
I0604 05:28:55.606571       1 flags.go:57] FLAG: --logtostderr="true"
I0604 05:28:55.606574       1 flags.go:57] FLAG: --nameservers=""
I0604 05:28:55.606577       1 flags.go:57] FLAG: --one-output="false"
I0604 05:28:55.606580       1 flags.go:57] FLAG: --profiling="false"
I0604 05:28:55.606585       1 flags.go:57] FLAG: --skip-headers="false"
I0604 05:28:55.606588       1 flags.go:57] FLAG: --skip-log-headers="false"
I0604 05:28:55.606591       1 flags.go:57] FLAG: --stderrthreshold="2"
I0604 05:28:55.606595       1 flags.go:57] FLAG: --v="2"
I0604 05:28:55.606598       1 flags.go:57] FLAG: --version="false"
I0604 05:28:55.606602       1 flags.go:57] FLAG: --vmodule=""
I0604 05:28:55.606622       1 dns.go:49] version: 1.22.28-gke.3
I0604 05:28:55.633328       1 server.go:73] Using configuration read from directory: /kube-dns-config with period 10s
I0604 05:28:55.633360       1 server.go:126] FLAG: --add-dir-header="false"
I0604 05:28:55.633365       1 server.go:126] FLAG: --alsologtostderr="false"
I0604 05:28:55.633372       1 server.go:126] FLAG: --config-dir="/kube-dns-config"
I0604 05:28:55.633376       1 server.go:126] FLAG: --config-map=""
I0604 05:28:55.633380       1 server.go:126] FLAG: --config-map-namespace="kube-system"
I0604 05:28:55.633383       1 server.go:126] FLAG: --config-period="10s"
I0604 05:28:55.633386       1 server.go:126] FLAG: --dns-bind-address="0.0.0.0"
I0604 05:28:55.633389       1 server.go:126] FLAG: --dns-port="10053"
I0604 05:28:55.633393       1 server.go:126] FLAG: --domain="cluster.local."
I0604 05:28:55.633396       1 server.go:126] FLAG: --federations=""
I0604 05:28:55.633400       1 server.go:126] FLAG: --healthz-port="8081"
I0604 05:28:55.633403       1 server.go:126] FLAG: --initial-sync-timeout="1m0s"
I0604 05:28:55.633406       1 server.go:126] FLAG: --kube-master-url=""
I0604 05:28:55.633409       1 server.go:126] FLAG: --kubecfg-file=""
I0604 05:28:55.633412       1 server.go:126] FLAG: --log-backtrace-at=":0"
I0604 05:28:55.633418       1 server.go:126] FLAG: --log-dir=""
I0604 05:28:55.633423       1 server.go:126] FLAG: --log-file=""
I0604 05:28:55.633426       1 server.go:126] FLAG: --log-file-max-size="1800"
I0604 05:28:55.633429       1 server.go:126] FLAG: --log-flush-frequency="5s"
I0604 05:28:55.633433       1 server.go:126] FLAG: --logtostderr="true"
I0604 05:28:55.633436       1 server.go:126] FLAG: --nameservers=""
I0604 05:28:55.633439       1 server.go:126] FLAG: --one-output="false"
I0604 05:28:55.633442       1 server.go:126] FLAG: --profiling="false"
I0604 05:28:55.633446       1 server.go:126] FLAG: --skip-headers="false"
I0604 05:28:55.633449       1 server.go:126] FLAG: --skip-log-headers="false"
I0604 05:28:55.633452       1 server.go:126] FLAG: --stderrthreshold="2"
I0604 05:28:55.633456       1 server.go:126] FLAG: --v="2"
I0604 05:28:55.633459       1 server.go:126] FLAG: --version="false"
I0604 05:28:55.633462       1 server.go:126] FLAG: --vmodule=""
I0604 05:28:55.633597       1 server.go:182] Starting SkyDNS server (0.0.0.0:10053)
I0604 05:28:55.633713       1 server.go:194] Skydns metrics enabled (/metrics:10055)
I0604 05:28:55.633725       1 dns.go:190] Starting endpointsController
I0604 05:28:55.633728       1 dns.go:193] Starting serviceController
I0604 05:28:55.633880       1 log.go:245] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0604 05:28:55.633892       1 log.go:245] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0604 05:28:55.633965       1 dns.go:186] Configuration updated: {TypeMeta:{Kind: APIVersion:} Federations:map[] StubDomains:map[] UpstreamNameservers:[]}
I0604 05:28:56.134228       1 dns.go:224] Initialized services and endpoints from apiserver
I0604 05:28:56.134245       1 server.go:150] Setting up Healthz Handler (/readiness)
I0604 05:28:56.134250       1 server.go:155] Setting up cache handler (/cache)
I0604 05:28:56.134257       1 server.go:136] Status HTTP port 8081
I0604 07:15:25.930259       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I0604 07:15:25.930268       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I0604 07:23:27.725583       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I0604 07:23:27.725592       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
  • When I deleted the kube-dns pods and allowed the deployment to recreate them, the problem seemed to resolve, at least for the time being.
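
For reference, the dnsPolicy experiment mentioned above can be applied (and reverted) with a patch; the deployment and namespace names here are placeholders:

# Switch the app pods to the node's resolver, bypassing kube-dns:
kubectl -n <namespace> patch deployment <app> \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"Default"}}}}'

# Revert to the normal cluster-first behavior after testing:
kubectl -n <namespace> patch deployment <app> \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"ClusterFirst"}}}}'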

Unfortunately, I'm not sure how to reproduce this issue. A similar report suggests the error may occur when a deployment's connection to the kube API server is dropped: cert-manager/cert-manager#4685 (comment)
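
If it does recur, the basic lookup check from the DNS debugging guide should show whether kube-dns itself is slow to answer (service and namespace below are placeholders):

# Start the dnsutils test pod from the debugging guide:
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml

# Resolve the service's cluster-internal name and check which resolver the pod uses:
kubectl exec -i -t dnsutils -- nslookup <service>.<namespace>.svc.cluster.local
kubectl exec -i -t dnsutils -- cat /etc/resolv.conf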

I didn't observe any restarts for the kube-dns pods:

➜  ~ k get po -n kube-system
NAME                                                       READY   STATUS    RESTARTS       AGE
...
kube-dns-f65b59b6b-bkqw9                                   4/4     Running   0              31h
kube-dns-f65b59b6b-v72bv                                   4/4     Running   0              2d10h

k8s version:

➜  ~ k version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.10-gke.1075001
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Oct 13, 2024