Robotoff is down #349

Closed
alexgarel opened this issue May 27, 2024 · 1 comment

@alexgarel

Seen by Pierre this morning.

@alexgarel

I restarted docker compose, but that did not solve the problem.

On the containers VM:

docker compose logs --tail 1000 -f api
api-1  | [2024-05-27 09:20:47 +0000] [1297] [INFO] Booting worker with pid: 1297
api-1  | [2024-05-27 09:20:54 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:1285)
api-1  | [2024-05-27 09:20:54 +0000] [1285] [ERROR] Error handling request /api/v1/questions/3258260045522?lang=fr&server_domain=api.openfoodfacts.org
api-1  | Traceback (most recent call last):
api-1  |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 135, in handle
...
api-1  |   File "/opt/robotoff/robotoff/products.py", line 531, in get_product
api-1  |     return self.collection.find_one({"_id": product_id.barcode}, projection)
api-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
api-1  |     self._select_servers_loop(
api-1  |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/pymongo/topology.py", line 270, in _select_servers_loop
api-1  |     self._condition.wait(common.MIN_HEARTBEAT_INTERVAL)
api-1  |   File "/usr/local/lib/python3.11/threading.py", line 331, in wait
api-1  |     gotit = waiter.acquire(True, timeout)
api-1  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api-1  |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/gunicorn/workers/base.py", line 203, in handle_abort
api-1  |     sys.exit(1)
...
api-1  | SystemExit: 1
api-1  | [2024-05-27 09:20:54 +0000] [1285] [INFO] Worker exiting (pid: 1285)
api-1  | Sentry is attempting to send 2 pending error messages
api-1  | Waiting up to 2 seconds
api-1  | Press Ctrl-C to quit
api-1  | [2024-05-27 09:20:55 +0000] [7] [ERROR] Worker (pid:1285) was sent SIGKILL! Perhaps out of memory?
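
The last line hints at an out-of-memory kill, but the same pid (1285) was reported as WORKER TIMEOUT one second earlier, while it was blocked in pymongo server selection. A minimal sketch for correlating the two events in such logs; the helper name and regular expressions are hypothetical, written against the log lines above, and are not part of Robotoff or gunicorn:

```python
import re

# Patterns modelled on the gunicorn messages in the log above (assumption:
# other gunicorn versions may word these messages differently).
TIMEOUT_RE = re.compile(r"WORKER TIMEOUT \(pid:(\d+)\)")
SIGKILL_RE = re.compile(r"Worker \(pid:(\d+)\) was sent SIGKILL")


def killed_after_timeout(log_lines):
    """Return pids reported as timed out and later reported as SIGKILLed."""
    timed_out = set()
    killed = []
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timed_out.add(match.group(1))
            continue
        match = SIGKILL_RE.search(line)
        if match and match.group(1) in timed_out:
            killed.append(match.group(1))
    return killed


# Two lines copied from the log above.
sample = [
    "api-1  | [2024-05-27 09:20:54 +0000] [7] [CRITICAL] WORKER TIMEOUT (pid:1285)",
    "api-1  | [2024-05-27 09:20:55 +0000] [7] [ERROR] Worker (pid:1285) was sent SIGKILL! Perhaps out of memory?",
]
print(killed_after_timeout(sample))
```

A pid that shows up in the result was most likely killed because of the timeout rather than for lack of memory, which points at the blocked MongoDB call instead of the API container itself.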

I then looked at stunnel.

On the client side (ovh1, LXC 113), systemctl status [email protected] shows the service as OK, but with failing connections.

The server side, on the proxy on off1, was indeed down (it had been OOM-killed).
A systemctl restart [email protected] fixed it.
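
Since the client-side unit looked healthy while the far end was down, a quick check of the server-side endpoint can save time next time. A minimal sketch using only the Python standard library; the function name is hypothetical, and the host and port in the usage example are placeholders, not the real stunnel endpoint:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder values (assumption): replace with the address and port
    # on which the server-side stunnel listens.
    print(can_connect("127.0.0.1", 27017))
```

This should be pointed at the server-side stunnel and not at the local client port, because the local client may still accept connections while the tunnel behind it is broken. It only tests TCP reachability, not that MongoDB answers through the tunnel.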
