killing envoy process #859

ben-swit · 2023-10-12T05:49:38Z

Hi, we are using ESPv2 on GKE with the following image:

Image:         gcr.io/endpoints-release/endpoints-runtime:2
Image ID:      gcr.io/endpoints-release/endpoints-runtime@sha256:1ffd99d09d89722cfbc9157eeec171a5566009c9fa36f081bbe8a462e9457225

Sometimes the health checks fail, like this:

  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  10m (x17 over 16d)    kubelet  Readiness probe failed: Get "http://10.32.11.246:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  10m (x16 over 7d20h)  kubelet  Liveness probe failed: Get "http://10.32.11.246:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    10m                   kubelet  Container esp failed liveness probe, will be restarted
  Normal   Started    10m (x2 over 16d)     kubelet  Started container esp
  Normal   Pulled     10m (x2 over 16d)     kubelet  Container image "gcr.io/endpoints-release/endpoints-runtime:2" already present on machine
  Normal   Created    10m (x2 over 16d)     kubelet  Created container esp
  Warning  Unhealthy  10m (x2 over 16d)     kubelet  Liveness probe failed: Get "http://10.32.11.246:8080/healthz": dial tcp 10.32.11.246:8080: connect: connection refused
  Warning  Unhealthy  10m (x3 over 16d)     kubelet  Readiness probe failed: Get "http://10.32.11.246:8080/healthz": dial tcp 10.32.11.246:8080: connect: connection refused

Then the pod was killed, so I checked the ESP log:

 API failed with: 503 and body: upstream connect error or disconnect/reset before headers. reset reason: connection failure
2023-10-12 05:34:44,076: WARNING: got signal: SIGTERM
2023-10-12 05:34:44,673: INFO: sending TERM to PID=8
2023-10-12 05:34:44,674: INFO: sending TERM to PID=53
W1012 05:34:44.674 53 external/envoy/source/server/server.cc:854] [53][main]caught ENVOY_SIGTERM
I1012 05:34:44.674 53 external/envoy/source/server/server.cc:985] [53][main]shutting down server instance
I1012 05:34:44.674 53 external/envoy/source/server/server.cc:920] [53][main]main dispatch loop exited
W1012 05:34:44.674965       8 server.go:74] Server got signal terminated, stopping
E1012 05:34:45.174 70 src/envoy/http/service_control/client_cache.cc:161] [70][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.275 73 src/envoy/http/service_control/client_cache.cc:161] [73][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.377 81 src/envoy/http/service_control/client_cache.cc:161] [81][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.473 85 src/envoy/http/service_control/client_cache.cc:161] [85][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.478 86 src/envoy/http/service_control/client_cache.cc:161] [86][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.673 92 src/envoy/http/service_control/client_cache.cc:161] [92][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.776 101 src/envoy/http/service_control/client_cache.cc:161] [101][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:45.778 102 src/envoy/http/service_control/client_cache.cc:161] [102][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:46.181 121 src/envoy/http/service_control/client_cache.cc:161] [121][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
E1012 05:34:46.276 128 src/envoy/http/service_control/client_cache.cc:161] [128][filter]Failed to call report, error: CANCELLED:Request cancelled, str body:
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:183] Failed in Report call: Request cancelled
W1012 05:34:46.280 53 external/envoy/source/common/config/grpc_stream.h:201] [53][config]StreamAggregatedResources gRPC config stream to @espv2-ads-cluster closed since 1455705s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
I1012 05:34:46.280 53 external/envoy/source/server/server.cc:972] [53][main]exiting
2023-10-12 05:34:50,681: INFO: ===waitpid: pid=8: doesn't exit
2023-10-12 05:34:50,681: CRITICAL: Config Manager is down, killing envoy process.
2023-10-12 05:34:50,681: INFO: Killing process: pid=53
2023-10-12 05:34:50,681: ERROR: The child process: pid=53 may not exist.

I would like to know why the container was killed. How can I check whether it is related to ESP?


ben-swit · 2023-10-12T05:52:38Z

We found the same log in #849.

TAOXUY · 2023-10-12T15:47:43Z

1) The servicecontrol service is returning 503/504, which is a service-side problem.
2) The ESPv2 container was killed because the health check timed out.

My guess is that, because of 1), ESPv2/Envoy is busy retrying report calls, which causes 2).

Which region is your deployment in? We can help check with the servicecontrol service.
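One way to look at the health check timeout described above is to separate the two kinds of probe failure that appear in the kubelet events quoted at the top of this issue: `context deadline exceeded` (the port accepted the connection but no response arrived in time) and `connection refused` (nothing was listening, which is expected while the container restarts). The following is a minimal Python sketch, not an official tool; the function name `classify_probe_failures` is hypothetical, and it assumes the text comes from the Events section of `kubectl describe pod`.

```python
import re
from collections import Counter

# Substrings copied from the kubelet messages quoted in this issue.
TIMEOUT = "context deadline exceeded"
REFUSED = "connection refused"


def classify_probe_failures(events_text: str) -> Counter:
    """Count probe failures by (probe type, failure kind).

    Hypothetical helper: it weights each event line by the "(xN over ...)"
    repeat count when present, and by 1 otherwise.
    """
    counts = Counter()
    for line in events_text.splitlines():
        probe = re.search(r"(Readiness|Liveness) probe failed", line)
        if not probe:
            continue  # not a probe failure event
        if TIMEOUT in line:
            kind = "timeout"
        elif REFUSED in line:
            kind = "refused"
        else:
            kind = "other"
        repeat = re.search(r"\(x(\d+) over", line)
        weight = int(repeat.group(1)) if repeat else 1
        counts[(probe.group(1).lower(), kind)] += weight
    return counts


if __name__ == "__main__":
    sample = (
        '  Warning  Unhealthy  10m (x16 over 7d20h)  kubelet  Liveness probe failed: '
        'Get "http://10.32.11.246:8080/healthz": context deadline exceeded '
        '(Client.Timeout exceeded while awaiting headers)\n'
        '  Warning  Unhealthy  10m (x2 over 16d)     kubelet  Liveness probe failed: '
        'Get "http://10.32.11.246:8080/healthz": dial tcp 10.32.11.246:8080: '
        'connect: connection refused\n'
    )
    for key, n in sorted(classify_probe_failures(sample).items()):
        print(key, n)
```

A high share of `timeout` entries would be consistent with the guess that the proxy was running but too busy to answer `/healthz`. It does not show what kept it busy.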

ben-swit · 2023-10-15T06:24:11Z

@TAOXUY We run on Kubernetes, with pods in us-west1-a, us-west1-b, and us-west1-c.
Our application is deployed as a sidecar alongside ESP. Based on your assumption, it seems that the application container became unresponsive and the ESP pod was restarted as a result. Is there a way to confirm this? The only error messages we have seen are plain 503s, 504s, and the kill logs, so we are unsure which part of the process caused the error.
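As a partial check, the ESP log quoted in this issue can be ordered by timestamp to see whether the shutdown was triggered from outside the container (a SIGTERM logged before Envoy's own shutdown message) rather than by Envoy exiting on its own. The sketch below is a rough Python illustration under stated assumptions: the helper names are hypothetical, the marker strings are copied from the log above, and only the time of day is compared, so a log that spans midnight would need extra handling.

```python
import re
from datetime import datetime

# Two timestamp shapes appear in the log quoted in this issue:
#   "2023-10-12 05:34:44,076: WARNING: got signal: SIGTERM"
#   "W1012 05:34:44.674 53 external/envoy/source/server/server.cc:854] ..."
SCRIPT_TS = re.compile(r"^\d{4}-\d{2}-\d{2} (\d{2}:\d{2}:\d{2}),(\d{3}):")
GLOG_TS = re.compile(r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2})\.(\d{3})")

# Marker substrings copied from the quoted log lines.
MARKERS = {
    "sigterm_received": "got signal: SIGTERM",
    "envoy_shutdown": "shutting down server instance",
    "envoy_killed": "killing envoy process",
}


def time_of_day(line):
    """Return the time of day for a log line, or None if it has no timestamp."""
    for pattern in (SCRIPT_TS, GLOG_TS):
        m = pattern.match(line)
        if m:
            stamp = f"{m.group(1)}.{m.group(2)}"
            return datetime.strptime(stamp, "%H:%M:%S.%f").time()
    return None


def first_seen(log_text):
    """Map each marker name to the time of the first line that contains it."""
    seen = {}
    for raw in log_text.splitlines():
        line = raw.strip()
        ts = time_of_day(line)
        if ts is None:
            continue
        for name, needle in MARKERS.items():
            if needle in line and name not in seen:
                seen[name] = ts
    return seen


def externally_terminated(log_text):
    """True when SIGTERM was logged no later than Envoy's shutdown message."""
    seen = first_seen(log_text)
    return (
        "sigterm_received" in seen
        and "envoy_shutdown" in seen
        and seen["sigterm_received"] <= seen["envoy_shutdown"]
    )
```

In the log above, the SIGTERM line (05:34:44,076) precedes Envoy's shutdown line (05:34:44.674), which matches the kubelet event "Container esp failed liveness probe, will be restarted". This ordering only indicates that the restart was initiated from outside the process; it does not by itself identify whether the application container or the servicecontrol calls made the health check slow.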
