docker crash on readyz call #478

Open · Roalkege opened this issue Jun 14, 2023 · 0 comments

Hello, I want to try out and test Yatai with BentoML.
I don't know whether this is related to BentoML or to Yatai...

I containerized my bentofile and want to test the API.
After launching it, everything works fine on localhost:3000,
but after calling /readyz my Docker Desktop crashes.

I also have the same problem on my Yatai instance after deploying a service.
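A minimal sketch of the call that triggers the crash, assuming the containerized bento is already running locally (e.g. started with `docker run -p 3000:3000 <image>`, where the image tag and port mapping are placeholders):

```python
import urllib.request
import urllib.error

# Assumed local endpoint of the containerized bento (placeholder port mapping).
BASE_URL = "http://localhost:3000"

def check(path: str, timeout: float = 10.0) -> None:
    """Call an endpoint and print its HTTP status instead of raising."""
    try:
        with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
            print(f"{path}: HTTP {resp.status}")
    except urllib.error.HTTPError as e:
        print(f"{path}: HTTP {e.code}")
    except Exception as e:
        print(f"{path}: failed ({e})")

# /livez normally answers right away; /readyz also waits on every runner,
# which is the call that precedes the crash described above.
check("/livez")
check("/readyz")
```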

Yatai Log

```
[2023-06-14 13:40:01] [Pod] [governance-b5dcf944c-8rqdz] [Created] Created container main
[2023-06-14 13:40:01] [Pod] [governance-b5dcf944c-8rqdz] [Started] Started container main
[2023-06-14 13:40:06] [Pod] [governance-b5dcf944c-8rqdz] [Unhealthy] Liveness probe errored: rpc error: code = Unknown desc = container not running (b50e5f47871d15a73d1a10f593ffa07c42336a90aaf6406221c069d06a323250)
[2023-06-14 13:40:06] [Pod] [governance-b5dcf944c-8rqdz] [Unhealthy] Readiness probe errored: rpc error: code = Unknown desc = container not running (b50e5f47871d15a73d1a10f593ffa07c42336a90aaf6406221c069d06a323250)
[2023-06-14 13:40:17] [HorizontalPodAutoscaler] [governance] [FailedGetResourceMetric] failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
[2023-06-14 13:40:17] [HorizontalPodAutoscaler] [governance] [FailedComputeMetricsReplicas] invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
[2023-06-14 13:40:21] [Pod] [governance-runner-0-85465c6b86-h2r8t] [Pulled] Container image "127.0.0.1:5000/yatai-bentos:yatai.governance_classifier.hcgdqdakukp2yaav" already present on machine
[2023-06-14 13:40:21] [Pod] [governance-runner-0-85465c6b86-h2r8t] [Created] Created container main
[2023-06-14 13:40:21] [Pod] [governance-runner-0-85465c6b86-h2r8t] [Started] Started container main
[2023-06-14 13:40:26] [Pod] [governance-runner-0-85465c6b86-h2r8t] [Unhealthy] Readiness probe failed: Get "http://10.244.0.37:3000/readyz": dial tcp 10.244.0.37:3000: connect: connection refused
[2023-06-14 13:40:27] [Pod] [governance-runner-0-85465c6b86-bnddg] [Pulled] Container image "127.0.0.1:5000/yatai-bentos:yatai.governance_classifier.hcgdqdakukp2yaav" already present on machine
[2023-06-14 13:40:27] [Pod] [governance-runner-0-85465c6b86-bnddg] [Created] Created container main
[2023-06-14 13:40:27] [Pod] [governance-runner-0-85465c6b86-bnddg] [Started] Started container main
[2023-06-14 13:42:21] [Pod] [governance-runner-0-85465c6b86-h2r8t] [BackOff] Back-off restarting failed container main in pod governance-runner-0-85465c6b86-h2r8t_yatai(9f6cb8f8-e592-4fbb-aea3-89e969bdfc72)
[2023-06-14 13:42:31] [Pod] [governance-b5dcf944c-8rqdz] [BackOff] Back-off restarting failed container main in pod governance-b5dcf944c-8rqdz_yatai(89e948b9-effb-4f82-82e4-4bcd426a6b88)
[2023-06-14 13:42:32] [HorizontalPodAutoscaler] [governance] [FailedGetResourceMetric] failed to get cpu utilization: did not receive metrics for any ready pods
[2023-06-14 13:42:41] [Pod] [governance-runner-0-85465c6b86-bnddg] [BackOff] Back-off restarting failed container main in pod governance-runner-0-85465c6b86-bnddg_yatai(196d4f8f-88a4-4241-a19e-c6836289e3de)
[2023-06-14 13:45:55] [BentoDeployment] [governance] [GetDeployment] Getting Deployment yatai/governance-runner-0
```
Docker Log

```
2023-06-14T11:53:19+0000 [ERROR] [api_server:10] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/message_logger.py", line 86, in __call__
    raise exc from None
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/message_logger.py", line 82, in __call__
    await self.app(scope, inner_receive, inner_send)
  File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/server/http/traffic.py", line 26, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/server/http/instruments.py", line 176, in __call__
    await self.app(scope, receive, wrapped_send)
  File "/usr/local/lib/python3.9/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 579, in __call__
    await self.app(scope, otel_receive, otel_send)
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/server/http/access.py", line 126, in __call__
    await self.app(scope, receive, wrapped_send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/_exception_handler.py", line 57, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/_exception_handler.py", line 46, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 727, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 285, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 74, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/_exception_handler.py", line 57, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/_exception_handler.py", line 46, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 69, in app
    response = await func(request)
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/server/http_app.py", line 286, in readyz
    runners_ready = all(await asyncio.gather(*runner_statuses))
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 156, in runner_handle_is_ready
    return await self._runner_handle.is_ready(timeout)
  File "/usr/local/lib/python3.9/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 304, in is_ready
    async with self._client.get(
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client.py", line 560, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 914, in start
    self._continue = None
  File "/usr/local/lib/python3.9/site-packages/aiohttp/helpers.py", line 721, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
2023-06-14T11:53:20+0000 [WARNING] [runner:governance:1] No training configuration found in save file, so the model was *not* compiled. Compile it manually.
```