killing envoy process #859

Open
ben-swit opened this issue Oct 12, 2023 · 3 comments

@ben-swit

Hi, we are using ESPv2 on GKE with the following image:

Image:         gcr.io/endpoints-release/endpoints-runtime:2
Image ID:      gcr.io/endpoints-release/endpoints-runtime@sha256:1ffd99d09d89722cfbc9157eeec171a5566009c9fa36f081bbe8a462e9457225

Sometimes the health checks fail, as in these pod events:

  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  10m (x17 over 16d)    kubelet  Readiness probe failed: Get "http://10.32.11.246:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  10m (x16 over 7d20h)  kubelet  Liveness probe failed: Get "http://10.32.11.246:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    10m                   kubelet  Container esp failed liveness probe, will be restarted
  Normal   Started    10m (x2 over 16d)     kubelet  Started container esp
  Normal   Pulled     10m (x2 over 16d)     kubelet  Container image "gcr.io/endpoints-release/endpoints-runtime:2" already present on machine
  Normal   Created    10m (x2 over 16d)     kubelet  Created container esp
  Warning  Unhealthy  10m (x2 over 16d)     kubelet  Liveness probe failed: Get "http://10.32.11.246:8080/healthz": dial tcp 10.32.11.246:8080: connect: connection refused
  Warning  Unhealthy  10m (x3 over 16d)     kubelet  Readiness probe failed: Get "http://10.32.11.246:8080/healthz": dial tcp 10.32.11.246:8080: connect: connection refused
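
The events above appear to show two different failure modes: the probes first time out (`context deadline exceeded`), which leads to the restart, and are then refused (`connection refused`), presumably while the new container is still starting. A minimal Python sketch for bucketing such messages when reviewing `kubectl describe pod` output follows. The function names and category labels are illustrative assumptions, not part of any Kubernetes tooling; only the matched substrings come from the events above.

```python
import re
from collections import Counter

# Hypothetical category labels; the matched substrings are copied from the events.
PATTERNS = [
    ("timeout", re.compile(r"context deadline exceeded")),
    ("refused", re.compile(r"connection refused")),
]


def classify_probe_failure(message: str) -> str:
    """Return a coarse category for a kubelet probe-failure message."""
    for label, pattern in PATTERNS:
        if pattern.search(message):
            return label
    return "other"


def summarize(event_lines):
    """Count probe failures per (probe type, category)."""
    counts = Counter()
    for line in event_lines:
        match = re.search(r"(Readiness|Liveness) probe failed: (.*)", line)
        if match:
            counts[(match.group(1), classify_probe_failure(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'Warning  Unhealthy  10m  kubelet  Liveness probe failed: Get "http://10.32.11.246:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)',
        'Warning  Unhealthy  10m  kubelet  Liveness probe failed: Get "http://10.32.11.246:8080/healthz": dial tcp 10.32.11.246:8080: connect: connection refused',
    ]
    for key, n in summarize(sample).items():
        print(key, n)
```

Only the timeout bucket points at a slow or blocked proxy; refusals right after a restart are expected until the listener is back up.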

The container is then killed, so I checked the ESP log:

 API failed with: 503 and body: upstream connect error or disconnect/reset before headers. reset reason: connection failure
2023-10-12 05:34:44,076: WARNING: got signal: SIGTERM
2023-10-12 05:34:44,673: INFO: sending TERM to PID=8
2023-10-12 05:34:44,674: INFO: sending TERM to PID=53
W1012 05:34:44.674 53 external/envoy/source/server/server.cc:854] [53][main]caught ENVOY_SIGTERM
I1012 05:34:44.674 53 external/envoy/source/server/server.cc:985] [53][main]shutting down server instance
I1012 05:34:44.674 53 external/envoy/source/server/server.cc:920] [53][main]main dispatch loop exited
W1012 05:34:44.674965       8 server.go:74] Server got signal terminated, stopping
E1012 05:34:45.174 70 src/envoy/http/service_control/client_cache.cc:161] [70][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.275 73 src/envoy/http/service_control/client_cache.cc:161] [73][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.377 81 src/envoy/http/service_control/client_cache.cc:161] [81][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.473 85 src/envoy/http/service_control/client_cache.cc:161] [85][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.478 86 src/envoy/http/service_control/client_cache.cc:161] [86][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.673 92 src/envoy/http/service_control/client_cache.cc:161] [92][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.776 101 src/envoy/http/service_control/client_cache.cc:161] [101][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.778 102 src/envoy/http/service_control/client_cache.cc:161] [102][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:46.181 121 src/envoy/http/service_control/client_cache.cc:161] [121][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:46.276 128 src/envoy/http/service_control/client_cache.cc:161] [128][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
W1012 05:34:46.280 53 external/envoy/source/common/config/grpc_stream.h:201] [53][config]StreamAggregatedResources gRPC config stream to @espv2-ads-cluster closed since 1455705s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
I1012 05:34:46.280 53 external/envoy/source/server/server.cc:972] [53][main]exiting
2023-10-12 05:34:50,681: INFO: ===waitpid: pid=8: doesn't exit
2023-10-12 05:34:50,681: CRITICAL: Config Manager is down, killing envoy process.
2023-10-12 05:34:50,681: INFO: Killing process: pid=53
2023-10-12 05:34:50,681: ERROR: The child process: pid=53 may not exist.
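
Reading the timestamps, the SIGTERM arrives first (05:34:44,076) and the `CRITICAL: Config Manager is down, killing envoy process.` line follows several seconds later, which suggests that message belongs to the shutdown sequence rather than being the original cause. The `closed since 1455705s ago` figure also works out to a little under 17 days, in line with the `16d` age in the pod events. A small Python sketch of that arithmetic, assuming the `YYYY-MM-DD HH:MM:SS,mmm` prefix shown in the start-script lines above (the helper names are hypothetical):

```python
from datetime import datetime

LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp of a start-script log line."""
    # The timestamp is the first 23 characters, e.g. "2023-10-12 05:34:44,076".
    return datetime.strptime(line[:23], LOG_TS_FORMAT)


def gap_seconds(first_line: str, last_line: str) -> float:
    """Seconds elapsed between two timestamped log lines."""
    return (parse_ts(last_line) - parse_ts(first_line)).total_seconds()


if __name__ == "__main__":
    sigterm = "2023-10-12 05:34:44,076: WARNING: got signal: SIGTERM"
    critical = "2023-10-12 05:34:50,681: CRITICAL: Config Manager is down, killing envoy process."
    print(f"SIGTERM to CRITICAL: {gap_seconds(sigterm, critical):.3f}s")
    # Age of the config stream reported by Envoy, converted from seconds to days.
    print(f"config stream age: {1455705 / 86400:.1f} days")
```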

So I want to know why it was killed.

How can I check whether this is related to ESP?

@ben-swit
Author

We found the same log in #849.

[Screenshot, 2023-10-12 2:52:23 PM]

@TAOXUY
Collaborator

TAOXUY commented Oct 12, 2023

  1. I see that the servicecontrol service is returning 503/504, which is a service-side problem.
  2. The ESPv2 container was killed because its health check timed out.

My guess is that, because of 1), ESPv2/Envoy is busy retrying report calls, which causes 2).

Which region is your deployment in? We can help check with the servicecontrol service.

@ben-swit
Author

ben-swit commented Oct 15, 2023

@TAOXUY We run on Kubernetes, with pods in us-west1-a, b, and c.
Our application is deployed alongside ESP in a sidecar setup. From your guess, it sounds as if the application container was unresponsive and the ESP pod was restarted as a result. Is there a way to confirm this? The only error messages we see are plain 503s, 504s, or the kill logs, and we can't tell which part of the process caused the error.
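
One way to narrow this down might be to time the same `/healthz` request that the kubelet issues, from another pod or from the node, and watch whether latency climbs before the probe starts failing. A rough Python sketch using only the standard library follows. The URL in the usage comment is the one from the events above; the timeout, interval, and function names are assumptions to adjust to the actual probe settings.

```python
import time
import urllib.error
import urllib.request


def probe_once(url: str, timeout: float = 1.0):
    """Issue one GET and return (ok, elapsed_seconds, detail)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            detail = f"HTTP {resp.status}"
            # The kubelet treats HTTP status 200-399 as a successful probe.
            ok = 200 <= resp.status < 400
    except urllib.error.HTTPError as exc:
        # Raised by urlopen for 4xx/5xx responses.
        ok, detail = False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, timeouts, and refused or reset connections.
        ok, detail = False, f"error: {exc}"
    return ok, time.monotonic() - start, detail


def watch(url: str, interval: float = 5.0, count: int = 10, timeout: float = 1.0):
    """Probe repeatedly and print one line per attempt."""
    for _ in range(count):
        ok, elapsed, detail = probe_once(url, timeout)
        print(f"{time.strftime('%H:%M:%S')} ok={ok} {elapsed:.3f}s {detail}")
        time.sleep(interval)


# Example (run from somewhere that can reach the pod IP):
# watch("http://10.32.11.246:8080/healthz")
```

If the timings stay low while the backend requests return 503, the proxy itself is probably responsive and the upstream is the more likely culprit; steadily rising `/healthz` latency would instead be consistent with the proxy being too busy to answer.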
