Error: Health watches not enabled. Please enable watches #176

corrtia · 2024-06-28T09:13:56Z

I ran a DCGM container from the image nvcr.io/nvidia/cloud-native/dcgm:3.3.6-1-ubuntu22.04:

docker run --gpus all -p 5554:5555 nvcr.io/nvidia/cloud-native/dcgm:3.3.6-1-ubuntu22.04

I then ran the following command inside the container, and it failed with this error:

dcgmi health --check -g 1
Error: Health watches not enabled. Please enable watches.

GPU environment:

nvidia-smi 
Fri Jun 28 09:14:18 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-PCIE-32GB           Off | 00000000:1A:00.0 Off |                    0 |
| N/A   32C    P0              23W / 250W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE-32GB           Off | 00000000:1E:00.0 Off |                    0 |
| N/A   32C    P0              24W / 250W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE-32GB           Off | 00000000:3D:00.0 Off |                    0 |
| N/A   32C    P0              24W / 250W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-PCIE-32GB           Off | 00000000:42:00.0 Off |                    0 |
| N/A   32C    P0              24W / 250W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
