Error: Health watches not enabled. Please enable watches #176

Open
corrtia opened this issue Jun 28, 2024 · 0 comments

corrtia commented Jun 28, 2024

I started a DCGM container from the image nvcr.io/nvidia/cloud-native/dcgm:3.3.6-1-ubuntu22.04:

docker run --gpus all -p 5554:5555 nvcr.io/nvidia/cloud-native/dcgm:3.3.6-1-ubuntu22.04

As far as I recall, I then ran the following command inside the container, and it failed with this error:

dcgmi health --check -g 1
Error: Health watches not enabled. Please enable watches.

The GPU environment:

nvidia-smi 
Fri Jun 28 09:14:18 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-PCIE-32GB           Off | 00000000:1A:00.0 Off |                    0 |
| N/A   32C    P0              23W / 250W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE-32GB           Off | 00000000:1E:00.0 Off |                    0 |
| N/A   32C    P0              24W / 250W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE-32GB           Off | 00000000:3D:00.0 Off |                    0 |
| N/A   32C    P0              24W / 250W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-PCIE-32GB           Off | 00000000:42:00.0 Off |                    0 |
| N/A   32C    P0              24W / 250W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
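
The error message suggests that `dcgmi health --check` only reports on health watches that were already enabled for the group, and nothing in the steps above enables them. A minimal sketch of the usual sequence follows: enable the watches with `dcgmi health -s a` (all watch systems), then run the check. The group ID 1 is taken from the report, the `check_health` helper and the `GROUP_ID` variable are hypothetical names used only for illustration, and whether this resolves the error on this particular setup is an assumption.

```shell
#!/bin/sh
# Sketch only: enable health watches on a DCGM group, then check it.
# GROUP_ID defaults to 1 because that is the group used in the report;
# `dcgmi group -l` lists the groups if a different one is intended.
GROUP_ID="${GROUP_ID:-1}"

# check_health is a hypothetical helper, not part of DCGM.
check_health() {
    # -s a turns on all health watch systems for the given group.
    dcgmi health -g "$1" -s a || return 1
    # --check then reads the health results for the same group.
    dcgmi health -g "$1" --check
}

if command -v dcgmi >/dev/null 2>&1; then
    check_health "$GROUP_ID"
else
    echo "dcgmi not found; run this inside the DCGM container" >&2
fi
```

It is worth running `dcgmi group -l` first to confirm that group 1 actually contains the GPUs to be checked, since the default group IDs may not map to what is expected.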