Latency/timeout from Kube DNS #640

Open

lbernick opened this issue Jul 15, 2024 · 1 comment
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@lbernick

lbernick commented Jul 15, 2024

Likely a duplicate of #96, but I'm opening a new issue as requested here since that one is old/possibly no longer relevant.

On 6/6/24 I observed DNS resolution for a storage service in our cluster taking multiple seconds to complete, and I'm now observing it again; here are my notes from that earlier incident.

First, I made a request from within our app container to the public hostname of the service:

root@app-55f755fc8c-88gvj:/app# time curl --location 'https://<hostname>/api/v1/ping'
pong

real	0m5.196s
user	0m0.020s
sys	0m0.006s

I then tried the cluster internal hostname:

/app # time curl --location '<service>.<namespace>.svc.cluster.local:443/api/v1/ping'
pong
real	0m 7.52s
user	0m 0.00s
sys	0m 0.00s

Making a request to the IP address was very fast:

/app # time curl --location '<IP>:443/api/v1/ping'
pong
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
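
For reference, curl's --write-out timers can confirm whether the delay is in name resolution rather than in the service itself; if time_namelookup accounts for nearly all of time_total, the slowness is DNS. (The hostname below is a placeholder.)

/app # curl -s -o /dev/null --location \
    -w 'namelookup: %{time_namelookup}s\nconnect: %{time_connect}s\ntotal: %{time_total}s\n' \
    'https://<hostname>/api/v1/ping'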

I believe this was an issue with kube DNS timing out for the following reasons:

  • I updated the app deployment to use dnsPolicy: Default instead of dnsPolicy: ClusterFirst as an experiment (a patch sketch for this follows the list below). After doing this, requests to the public hostname of the service from within the app container were fast again; my understanding from these docs is that dnsPolicy: Default makes the pod inherit the node's resolver configuration, so the lookups skip kube DNS entirely.
  • Following the DNS debugging guide, I observed the following unexpected logs in the kube-dns pods:
➜  ~ k logs kube-dns-f65b59b6b-v72bv -n kube-system
Defaulted container "kubedns" out of: kubedns, dnsmasq, sidecar, prometheus-to-sd
I0604 05:28:55.606443       1 flags.go:57] FLAG: --add-dir-header="false"
I0604 05:28:55.606517       1 flags.go:57] FLAG: --alsologtostderr="false"
I0604 05:28:55.606520       1 flags.go:57] FLAG: --config-dir="/kube-dns-config"
I0604 05:28:55.606523       1 flags.go:57] FLAG: --config-map=""
I0604 05:28:55.606526       1 flags.go:57] FLAG: --config-map-namespace="kube-system"
I0604 05:28:55.606528       1 flags.go:57] FLAG: --config-period="10s"
I0604 05:28:55.606531       1 flags.go:57] FLAG: --dns-bind-address="0.0.0.0"
I0604 05:28:55.606533       1 flags.go:57] FLAG: --dns-port="10053"
I0604 05:28:55.606536       1 flags.go:57] FLAG: --domain="cluster.local."
I0604 05:28:55.606539       1 flags.go:57] FLAG: --federations=""
I0604 05:28:55.606542       1 flags.go:57] FLAG: --healthz-port="8081"
I0604 05:28:55.606544       1 flags.go:57] FLAG: --initial-sync-timeout="1m0s"
I0604 05:28:55.606546       1 flags.go:57] FLAG: --kube-master-url=""
I0604 05:28:55.606548       1 flags.go:57] FLAG: --kubecfg-file=""
I0604 05:28:55.606550       1 flags.go:57] FLAG: --log-backtrace-at=":0"
I0604 05:28:55.606556       1 flags.go:57] FLAG: --log-dir=""
I0604 05:28:55.606559       1 flags.go:57] FLAG: --log-file=""
I0604 05:28:55.606564       1 flags.go:57] FLAG: --log-file-max-size="1800"
I0604 05:28:55.606568       1 flags.go:57] FLAG: --log-flush-frequency="5s"
I0604 05:28:55.606571       1 flags.go:57] FLAG: --logtostderr="true"
I0604 05:28:55.606574       1 flags.go:57] FLAG: --nameservers=""
I0604 05:28:55.606577       1 flags.go:57] FLAG: --one-output="false"
I0604 05:28:55.606580       1 flags.go:57] FLAG: --profiling="false"
I0604 05:28:55.606585       1 flags.go:57] FLAG: --skip-headers="false"
I0604 05:28:55.606588       1 flags.go:57] FLAG: --skip-log-headers="false"
I0604 05:28:55.606591       1 flags.go:57] FLAG: --stderrthreshold="2"
I0604 05:28:55.606595       1 flags.go:57] FLAG: --v="2"
I0604 05:28:55.606598       1 flags.go:57] FLAG: --version="false"
I0604 05:28:55.606602       1 flags.go:57] FLAG: --vmodule=""
I0604 05:28:55.606622       1 dns.go:49] version: 1.22.28-gke.3
I0604 05:28:55.633328       1 server.go:73] Using configuration read from directory: /kube-dns-config with period 10s
I0604 05:28:55.633360       1 server.go:126] FLAG: --add-dir-header="false"
I0604 05:28:55.633365       1 server.go:126] FLAG: --alsologtostderr="false"
I0604 05:28:55.633372       1 server.go:126] FLAG: --config-dir="/kube-dns-config"
I0604 05:28:55.633376       1 server.go:126] FLAG: --config-map=""
I0604 05:28:55.633380       1 server.go:126] FLAG: --config-map-namespace="kube-system"
I0604 05:28:55.633383       1 server.go:126] FLAG: --config-period="10s"
I0604 05:28:55.633386       1 server.go:126] FLAG: --dns-bind-address="0.0.0.0"
I0604 05:28:55.633389       1 server.go:126] FLAG: --dns-port="10053"
I0604 05:28:55.633393       1 server.go:126] FLAG: --domain="cluster.local."
I0604 05:28:55.633396       1 server.go:126] FLAG: --federations=""
I0604 05:28:55.633400       1 server.go:126] FLAG: --healthz-port="8081"
I0604 05:28:55.633403       1 server.go:126] FLAG: --initial-sync-timeout="1m0s"
I0604 05:28:55.633406       1 server.go:126] FLAG: --kube-master-url=""
I0604 05:28:55.633409       1 server.go:126] FLAG: --kubecfg-file=""
I0604 05:28:55.633412       1 server.go:126] FLAG: --log-backtrace-at=":0"
I0604 05:28:55.633418       1 server.go:126] FLAG: --log-dir=""
I0604 05:28:55.633423       1 server.go:126] FLAG: --log-file=""
I0604 05:28:55.633426       1 server.go:126] FLAG: --log-file-max-size="1800"
I0604 05:28:55.633429       1 server.go:126] FLAG: --log-flush-frequency="5s"
I0604 05:28:55.633433       1 server.go:126] FLAG: --logtostderr="true"
I0604 05:28:55.633436       1 server.go:126] FLAG: --nameservers=""
I0604 05:28:55.633439       1 server.go:126] FLAG: --one-output="false"
I0604 05:28:55.633442       1 server.go:126] FLAG: --profiling="false"
I0604 05:28:55.633446       1 server.go:126] FLAG: --skip-headers="false"
I0604 05:28:55.633449       1 server.go:126] FLAG: --skip-log-headers="false"
I0604 05:28:55.633452       1 server.go:126] FLAG: --stderrthreshold="2"
I0604 05:28:55.633456       1 server.go:126] FLAG: --v="2"
I0604 05:28:55.633459       1 server.go:126] FLAG: --version="false"
I0604 05:28:55.633462       1 server.go:126] FLAG: --vmodule=""
I0604 05:28:55.633597       1 server.go:182] Starting SkyDNS server (0.0.0.0:10053)
I0604 05:28:55.633713       1 server.go:194] Skydns metrics enabled (/metrics:10055)
I0604 05:28:55.633725       1 dns.go:190] Starting endpointsController
I0604 05:28:55.633728       1 dns.go:193] Starting serviceController
I0604 05:28:55.633880       1 log.go:245] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0604 05:28:55.633892       1 log.go:245] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0604 05:28:55.633965       1 dns.go:186] Configuration updated: {TypeMeta:{Kind: APIVersion:} Federations:map[] StubDomains:map[] UpstreamNameservers:[]}
I0604 05:28:56.134228       1 dns.go:224] Initialized services and endpoints from apiserver
I0604 05:28:56.134245       1 server.go:150] Setting up Healthz Handler (/readiness)
I0604 05:28:56.134250       1 server.go:155] Setting up cache handler (/cache)
I0604 05:28:56.134257       1 server.go:136] Status HTTP port 8081
I0604 07:15:25.930259       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I0604 07:15:25.930268       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I0604 07:23:27.725583       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I0604 07:23:27.725592       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
  • When I deleted the kube-dns pods and allowed the deployment to recreate them, the problem seemed to resolve, at least for the time being.
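
For reference, the dnsPolicy experiment mentioned above can be applied (and reverted) with a patch; the deployment and namespace names here are placeholders:

# Switch the app pods to the node's resolver, bypassing kube-dns:
kubectl -n <namespace> patch deployment <app> \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"Default"}}}}'

# Revert to the normal cluster-first behavior after testing:
kubectl -n <namespace> patch deployment <app> \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"ClusterFirst"}}}}'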

Unfortunately, I'm not sure how to reproduce this issue. A similar report suggests the error may occur when a deployment's connection to the kube API server is dropped: cert-manager/cert-manager#4685 (comment)
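
If it does recur, the basic lookup check from the DNS debugging guide should show whether kube-dns itself is slow to answer (service and namespace below are placeholders):

# Start the dnsutils test pod from the debugging guide:
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml

# Resolve the service's cluster-internal name and check which resolver the pod uses:
kubectl exec -i -t dnsutils -- nslookup <service>.<namespace>.svc.cluster.local
kubectl exec -i -t dnsutils -- cat /etc/resolv.conf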

I didn't observe any restarts for the kube-dns pods:

➜  ~ k get po -n kube-system
NAME                                                       READY   STATUS    RESTARTS       AGE
...
kube-dns-f65b59b6b-bkqw9                                   4/4     Running   0              31h
kube-dns-f65b59b6b-v72bv                                   4/4     Running   0              2d10h

k8s version:

➜  ~ k version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.10-gke.1075001
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Oct 13, 2024