ModuleNotFoundError: No module named 'pytz' from examples/machine-learning/a3-megagpu-8g #3275

Open
yunqin9 opened this issue Nov 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments


yunqin9 commented Nov 18, 2024

This probably applies to other deployments as well.

I saw the following errors in the audit log on the controller node:

DEFAULT 2024-11-18T03:27:18.309157954Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.308896+00:00 a3mega-controller load_bq.py[7537]: Traceback (most recent call last):
DEFAULT 2024-11-18T03:27:18.309162135Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.308970+00:00 a3mega-controller load_bq.py[7537]: File "/slurm/scripts/load_bq.py", line 28, in <module>
DEFAULT 2024-11-18T03:27:18.309162835Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.309002+00:00 a3mega-controller load_bq.py[7537]: from google.cloud import bigquery as bq
DEFAULT 2024-11-18T03:27:18.309163373Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.309024+00:00 a3mega-controller load_bq.py[7537]: File "/usr/local/lib/python3.11/dist-packages/google/cloud/bigquery/__init__.py", line 35, in <module>
DEFAULT 2024-11-18T03:27:18.309163987Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.309040+00:00 a3mega-controller load_bq.py[7537]: from google.cloud.bigquery.client import Client
DEFAULT 2024-11-18T03:27:18.309164421Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.309055+00:00 a3mega-controller load_bq.py[7537]: File "/usr/local/lib/python3.11/dist-packages/google/cloud/bigquery/client.py", line 58, in <module>
DEFAULT 2024-11-18T03:27:18.309333700Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.309071+00:00 a3mega-controller load_bq.py[7537]: from google.cloud.bigquery.dataset import Dataset
DEFAULT 2024-11-18T03:27:18.309336403Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.309088+00:00 a3mega-controller load_bq.py[7537]: File "/usr/local/lib/python3.11/dist-packages/google/cloud/bigquery/dataset.py", line 27, in <module>
DEFAULT 2024-11-18T03:27:18.309337066Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.309101+00:00 a3mega-controller load_bq.py[7537]: from google.cloud.bigquery.table import TableReference
DEFAULT 2024-11-18T03:27:18.309337538Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.309114+00:00 a3mega-controller load_bq.py[7537]: File "/usr/local/lib/python3.11/dist-packages/google/cloud/bigquery/table.py", line 24, in <module>
DEFAULT 2024-11-18T03:27:18.309337983Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.309127+00:00 a3mega-controller load_bq.py[7537]: import pytz
DEFAULT 2024-11-18T03:27:18.309338450Z [resource.labels.instanceId: 9035929174565967148] 2024-11-18T03:27:18.309138+00:00 a3mega-controller load_bq.py[7537]: ModuleNotFoundError: No module named 'pytz'

It looks like the Python module pytz is missing from the controller image.
However, it wasn't clear to me where it should be added. Perhaps in the image_build_script?
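
For anyone wanting to confirm the same condition on their own controller, a minimal sketch of a check is below. It assumes it is run with the same interpreter that runs /slurm/scripts/load_bq.py (Python 3.11 according to the traceback). The helper name `missing_modules` is made up for illustration and is not part of the toolkit.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find.

    Hypothetical helper for illustration; pass top-level names only (e.g. "pytz").
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "pytz" is the module named in the ModuleNotFoundError above.
    missing = missing_modules(["pytz"])
    print("missing:", missing if missing else "none")
```

If pytz shows up as missing, installing it for that interpreter (for example with `python3 -m pip install pytz`) would be the usual remedy, but where that step belongs in the blueprint is still the open question here.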

@harshthakkar01
Contributor

https://github.com/GoogleCloudPlatform/cluster-toolkit/tree/main/examples/machine-learning/a3-megagpu-8g has 3 blueprints.

Which blueprint did you deploy when you ran into this issue?
