Open
Labels
bug (Something isn't working)
Description
Required prerequisites
- I have read the documentation https://nvitop.readthedocs.io.
- I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
- I have tried the latest version of nvitop in a new isolated virtual environment.
What version of nvitop are you using?
1.3.2
Operating system and version
Windows Server 2022 Datacenter
NVIDIA driver version
516.01
NVIDIA-SMI
Wed Oct 23 15:33:05 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 516.01 Driver Version: 516.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A5000 WDDM | 00000000:02:00.0 Off | Off |
| 30% 38C P8 17W / 230W | 464MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A5000 WDDM | 00000000:21:00.0 Off | Off |
| 30% 34C P8 6W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA RTX A5000 WDDM | 00000000:49:00.0 Off | Off |
| 30% 30C P8 5W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA RTX A5000 WDDM | 00000000:4A:00.0 Off | Off |
| 30% 33C P8 4W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 9436 C+G ...lPanel\SystemSettings.exe N/A |
| 0 N/A N/A 12172 C+G ...5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 15244 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 17912 C+G ...2txyewy\TextInputHost.exe N/A |
| 0 N/A N/A 18180 C+G ...y\ShellExperienceHost.exe N/A |
+-----------------------------------------------------------------------------+
Python environment
python -m pip freeze
(base) C:\Users\Administrator>python -m pip freeze
absl-py==2.1.0
accelerate==0.24.1
aiofiles==23.2.0
aiohttp==3.8.5
aiosignal==1.3.1
altair==5.1.2
anaconda-anon-usage @ file:///C:/b/abs_95v3x0wy8p/croot/anaconda-anon-usage_1697038984188/work
anaconda-client==1.12.0
anaconda-cloud-auth @ file:///C:/b/abs_410afndtyf/croot/anaconda-cloud-auth_1697462767853/work
anaconda-navigator @ file:///C:/b/abs_cfvv8k_j21/croot/anaconda-navigator_1704813334508/work
anaconda-project @ file:///C:/ci_311/anaconda-project_1676458365912/work
annotated-types==0.6.0
ansicon==1.89.0
antlr4-python3-runtime==4.9.3
anyio==3.7.1
archspec @ file:///croot/archspec_1709217642129/work
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
async-timeout==4.0.3
attrs==24.2.0
Babel==2.14.0
backports.functools-lru-cache @ file:///tmp/build/80754af9/backports.functools_lru_cache_1618170165463/work
backports.tempfile @ file:///home/linux1/recipes/ci/backports.tempfile_1610991236607/work
backports.weakref==1.0.post1
beautifulsoup4 @ file:///C:/b/abs_0agyz1wsr4/croot/beautifulsoup4-split_1681493048687/work
bleach==6.1.0
blessed==1.20.0
blinker==1.7.0
boltons @ file:///C:/ci_311/boltons_1677729932371/work
Brotli @ file:///C:/ci_311/brotli-split_1676435766766/work
cachetools==5.3.2
certifi @ file:///C:/b/abs_1fw_exq1si/croot/certifi_1725551736618/work/certifi
cffi @ file:///C:/b/abs_924gv1kxzj/croot/cffi_1700254355075/work
chardet @ file:///C:/ci_311/chardet_1676436134885/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
click @ file:///C:/b/abs_f9ihnt72pu/croot/click_1698129847492/work
clip==0.2.0
clyent==1.2.1
colorama @ file:///C:/ci_311/colorama_1676422310965/work
coloredlogs==15.0.1
comm==0.2.1
conda @ file:///C:/b/abs_85jnuwc__u/croot/conda_1729193917673/work
conda-build @ file:///C:/b/abs_3ed9gavxgz/croot/conda-build_1708025907525/work
conda-content-trust @ file:///tmp/build/80754af9/conda-content-trust_1617045594566/work
conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1727775630457/work/src
conda-pack @ file:///tmp/build/80754af9/conda-pack_1611163042455/work
conda-package-handling @ file:///C:/b/abs_b9wp3lr1gn/croot/conda-package-handling_1691008700066/work
conda-repo-cli==1.0.75
conda-token @ file:///Users/paulyim/miniconda3/envs/c3i/conda-bld/conda-token_1662660369760/work
conda-verify @ file:///D:/bld/conda-verify_1667049856137/work
conda_index @ file:///croot/conda-index_1706633791028/work
conda_package_streaming @ file:///C:/b/abs_6c28n38aaj/croot/conda-package-streaming_1690988019210/work
contourpy==1.2.0
cpm-kernels==1.0.11
cryptography @ file:///C:/b/abs_531eqmhgsd/croot/cryptography_1707523768330/work
cycler==0.12.1
ddddocr==1.5.5
debugpy==1.8.1
decorator==5.1.1
defusedxml @ file:///tmp/build/80754af9/defusedxml_1615228127516/work
distro @ file:///C:/b/abs_a3uni_yez3/croot/distro_1701455052240/work
easydict==1.12
einops==0.7.0
executing==2.0.1
fastapi==0.104.1
fastjsonschema @ file:///C:/ci_311/python-fastjsonschema_1679500568724/work
ffmpy==0.3.1
filelock @ file:///C:/b/abs_f2gie28u58/croot/filelock_1700591233643/work
flatbuffers==24.3.25
fonttools==4.49.0
fqdn==1.5.1
frozendict @ file:///C:/b/abs_2alamqss6p/croot/frozendict_1713194885124/work
frozenlist==1.4.1
fsspec==2023.10.0
ftfy==6.1.3
future @ file:///C:/ci_311_rebuilds/future_1678998246262/work
gitdb==4.0.11
GitPython==3.1.40
gmpy2 @ file:///C:/ci_311/gmpy2_1677743390134/work
gpustat==1.1.1
gradio==3.39.0
gradio_client==0.7.0
grpcio==1.60.1
h11==0.14.0
httpcore==1.0.1
httpx==0.25.1
huggingface-hub==0.19.0
humanfriendly==10.0
idna @ file:///C:/ci_311/idna_1676424932545/work
importlib-metadata==6.8.0
ipykernel==6.29.2
ipython==8.21.0
ipywidgets==8.1.2
isoduration==20.11.0
jaraco.classes @ file:///tmp/build/80754af9/jaraco.classes_1620983179379/work
jedi==0.19.1
Jinja2 @ file:///C:/b/abs_f7x5a8op2h/croot/jinja2_1706733672594/work
jinxed==1.2.1
json5==0.9.14
jsonpatch @ file:///tmp/build/80754af9/jsonpatch_1615747632069/work
jsonpointer==2.1
jsonschema @ file:///C:/b/abs_d1c4sm8drk/croot/jsonschema_1699041668863/work
jsonschema-specifications @ file:///C:/b/abs_0brvm6vryw/croot/jsonschema-specifications_1699032417323/work
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.2
jupyter_client==8.6.0
jupyter_core @ file:///C:/b/abs_c769pbqg9b/croot/jupyter_core_1698937367513/work
jupyter_server==2.12.5
jupyter_server_terminals==0.5.2
jupyterlab==4.1.1
jupyterlab-language-pack-zh-CN==4.0.post3
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.3
jupyterlab_widgets==3.0.10
keyring @ file:///C:/b/abs_dbjc7g0dh2/croot/keyring_1678999228878/work
kiwisolver==1.4.5
latex2mathml==3.76.0
libarchive-c @ file:///tmp/build/80754af9/python-libarchive-c_1617780486945/work
libmambapy @ file:///C:/b/abs_2euls_1a38/croot/mamba-split_1704219444888/work/libmambapy
linkify-it-py==2.0.3
Markdown==3.7
markdown-it-py==2.2.0
MarkupSafe @ file:///C:/b/abs_ecfdqh67b_/croot/markupsafe_1704206030535/work
matplotlib==3.8.3
matplotlib-inline==0.1.6
mdit-py-plugins==0.3.3
mdtex2html==1.2.0
mdurl==0.1.2
menuinst @ file:///C:/b/abs_099kybla52/croot/menuinst_1706732987063/work
mistune==3.0.2
mkl-fft @ file:///C:/b/abs_19i1y8ykas/croot/mkl_fft_1695058226480/work
mkl-random @ file:///C:/b/abs_edwkj1_o69/croot/mkl_random_1695059866750/work
mkl-service==2.4.0
more-itertools @ file:///C:/b/abs_36p38zj5jx/croot/more-itertools_1700662194485/work
mpmath @ file:///C:/b/abs_7833jrbiox/croot/mpmath_1690848321154/work
multidict==6.1.0
navigator-updater @ file:///C:/b/abs_895otdwmo9/croot/navigator-updater_1695210220239/work
nbclient==0.9.0
nbconvert==7.16.1
nbformat @ file:///C:/b/abs_5a2nea1iu2/croot/nbformat_1694616866197/work
nest-asyncio==1.6.0
networkx @ file:///C:/b/abs_e6gi1go5op/croot/networkx_1690562046966/work
notebook==7.1.0
notebook_shim==0.2.4
numpy @ file:///C:/b/abs_16b2j7ad8n/croot/numpy_and_numpy_base_1704311752418/work/dist/numpy-1.26.3-cp311-cp311-win_amd64.whl#sha256=5f2c4b54fd5d52b9fb18e32607c79b03cf14665cecce8a5a10e2950559df4651
nvidia-ml-py==12.535.161
nvitop==1.3.2
omegaconf==2.3.0
onnxruntime==1.19.2
opencv-python-headless==4.10.0.84
orjson==3.9.10
outcome==1.3.0.post0
overrides==7.7.0
packaging @ file:///C:/b/abs_28t5mcoltc/croot/packaging_1693575224052/work
pandas==2.0.3
pandocfilters==1.5.1
parso==0.8.3
pathlib @ file:///Users/ktietz/demo/mc3/conda-bld/pathlib_1629713961906/work
pillow @ file:///C:/b/abs_e22m71t0cb/croot/pillow_1707233126420/work
pkce @ file:///C:/b/abs_d0z4444tb0/croot/pkce_1690384879799/work
pkginfo @ file:///C:/b/abs_d18srtr68x/croot/pkginfo_1679431192239/work
platformdirs @ file:///C:/b/abs_b6z_yqw_ii/croot/platformdirs_1692205479426/work
pluggy @ file:///C:/ci_311/pluggy_1676422178143/work
ply==3.11
prettytable==3.9.0
prometheus_client==0.20.0
prompt-toolkit==3.0.43
propcache==0.2.0
protobuf==4.25.0
psutil @ file:///C:/ci_311_rebuilds/psutil_1679005906571/work
pure-eval==0.2.2
pyarrow==12.0.1
pycosat @ file:///C:/b/abs_31zywn1be3/croot/pycosat_1696537126223/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydantic @ file:///C:/b/abs_9byjrk31gl/croot/pydantic_1695798904828/work
pydantic_core==2.10.1
pydeck==0.8.1b0
pydub==0.25.1
Pygments==2.17.2
PyJWT @ file:///C:/ci_311/pyjwt_1676438890509/work
pyparsing==3.1.1
PyQt5==5.15.10
PyQt5-sip @ file:///C:/b/abs_c0pi2mimq3/croot/pyqt-split_1698769125270/work/pyqt_sip
pyreadline3==3.5.4
PySocks @ file:///C:/ci_311/pysocks_1676425991111/work
python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
python-dotenv @ file:///C:/ci_311/python-dotenv_1676455170580/work
python-json-logger==2.0.7
python-multipart==0.0.6
pytz @ file:///C:/b/abs_19q3ljkez4/croot/pytz_1695131651401/work
pywin32==305.1
pywin32-ctypes @ file:///C:/ci_311/pywin32-ctypes_1676427747089/work
pywinpty==2.0.12
PyYAML @ file:///C:/b/abs_782o3mbw7z/croot/pyyaml_1698096085010/work
pyzmq==25.1.2
qtconsole==5.5.1
QtPy @ file:///C:/b/abs_derqu__3p8/croot/qtpy_1700144907661/work
referencing @ file:///C:/b/abs_09f4hj6adf/croot/referencing_1699012097448/work
regex==2023.12.25
requests @ file:///C:/b/abs_474vaa3x9e/croot/requests_1707355619957/work
requests-mock==1.12.1
requests-toolbelt @ file:///C:/b/abs_2fsmts66wp/croot/requests-toolbelt_1690874051210/work
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.6.0
rpds-py @ file:///C:/b/abs_76j4g4la23/croot/rpds-py_1698947348047/work
ruamel-yaml-conda @ file:///C:/ci_311/ruamel_yaml_1676455799258/work
ruamel.yaml @ file:///C:/ci_311/ruamel.yaml_1676439214109/work
safetensors==0.3.3
selenium==4.25.0
semantic-version==2.10.0
semver @ file:///tmp/build/80754af9/semver_1603822362442/work
Send2Trash==1.8.2
sentencepiece==0.1.99
sip @ file:///C:/b/abs_edevan3fce/croot/sip_1698675983372/work
six @ file:///tmp/build/80754af9/six_1644875935023/work
smmap==5.0.1
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve @ file:///C:/b/abs_bbsvy9t4pl/croot/soupsieve_1696347611357/work
sse-starlette==1.6.5
stack-data==0.6.3
starlette==0.27.0
streamlit==1.28.1
sympy @ file:///C:/b/abs_82njkonm7f/croot/sympy_1701397685028/work
tenacity==8.2.3
tensorboard==2.16.2
tensorboard-data-server==0.7.2
termcolor==2.4.0
terminado==0.18.0
timm==0.9.12
tinycss2==1.2.1
tokenizers==0.13.3
toml==0.10.2
toolz==1.0.0
torch==2.2.0
torchaudio==2.2.0
torchvision==0.17.0
tornado @ file:///C:/b/abs_0cbrstidzg/croot/tornado_1696937003724/work
tqdm @ file:///C:/b/abs_f76j9hg7pv/croot/tqdm_1679561871187/work
traitlets @ file:///C:/ci_311/traitlets_1676423290727/work
transformers==4.30.2
trio==0.26.2
trio-websocket==0.11.1
truststore @ file:///C:/b/abs_55z7b3r045/croot/truststore_1695245455435/work
types-python-dateutil==2.8.19.20240106
typing_extensions==4.12.2
tzdata==2023.3
tzlocal==5.2
uc-micro-py==1.0.3
ujson @ file:///C:/ci_311/ujson_1676434714224/work
uri-template==1.3.0
urllib3 @ file:///C:/b/abs_4etpfrkumr/croot/urllib3_1707770616184/work
uvicorn==0.24.0.post1
validators==0.22.0
watchdog==3.0.0
wcwidth==0.2.13
webcolors==1.13
webdriver-manager==4.0.2
webencodings==0.5.1
websocket-client==1.8.0
websockets==11.0.3
Werkzeug==3.0.4
widgetsnbextension==4.0.10
win-inet-pton @ file:///C:/ci_311/win_inet_pton_1676425458225/work
windows-curses==2.3.3
wsproto==1.2.0
yarl==1.14.0
zipp @ file:///C:/b/abs_b0beoc27oa/croot/zipp_1704206963359/work
zstandard==0.19.0
Problem description
When I monitor GPU usage with nvitop on Windows Server 2022, the machine restarts unexpectedly. The cause appears to be a compatibility conflict between nvitop and Windows Server's hardware monitoring. After the restart, the System event log contains the following Kernel-Power event (Event ID 41):
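Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 2024/10/23 15:18:48
Event ID: 41
Task Category: (63)
Level: Critical
Keywords: (70368744177664),(2)
User: SYSTEM
Computer: WIN-3I9RKHAQAH5
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml: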
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331c3b3a-2005-44c2-ac5e-77220c37d6b4}" />
<EventID>41</EventID>
<Version>8</Version>
<Level>1</Level>
<Task>63</Task>
<Opcode>0</Opcode>
<Keywords>0x8000400000000002</Keywords>
<TimeCreated SystemTime="2024-10-23T07:18:48.0040790Z" />
<EventRecordID>210129</EventRecordID>
<Correlation />
<Execution ProcessID="4" ThreadID="8" />
<Channel>System</Channel>
<Computer>WIN-3I9RKHAQAH5</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="BugcheckCode">80</Data>
<Data Name="BugcheckParameter1">0xffffdc8d2fdfa000</Data>
<Data Name="BugcheckParameter2">0x2</Data>
<Data Name="BugcheckParameter3">0xfffff8029b2505e6</Data>
<Data Name="BugcheckParameter4">0x0</Data>
<Data Name="SleepInProgress">0</Data>
<Data Name="PowerButtonTimestamp">0</Data>
<Data Name="BootAppStatus">0</Data>
<Data Name="Checkpoint">0</Data>
<Data Name="ConnectedStandbyInProgress">true</Data>
<Data Name="SystemSleepTransitionsToOn">0</Data>
<Data Name="CsEntryScenarioInstanceId">136</Data>
<Data Name="BugcheckInfoFromEFI">false</Data>
<Data Name="CheckpointStatus">0</Data>
<Data Name="CsEntryScenarioInstanceIdV2">136</Data>
<Data Name="LongPowerButtonPressDetected">false</Data>
</EventData>
</Event>
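For readers triaging this report, the bugcheck fields in the event above can be decoded with a few lines of Python. This is a minimal sketch, not part of nvitop: the function name `decode_bugcheck` and the small lookup table are illustrative, and it assumes that Kernel-Power event 41 reports `BugcheckCode` in decimal (as it normally does), so that 80 corresponds to stop code 0x50. The sample XML is an abbreviated copy of the event above that keeps only the bugcheck fields.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML (copied from the event above).
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Deliberately incomplete, illustrative table of well-known stop codes.
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x116: "VIDEO_TDR_FAILURE",
}

# Abbreviated copy of the event from this report (bugcheck fields only).
SAMPLE_EVENT = """
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data Name="BugcheckCode">80</Data>
    <Data Name="BugcheckParameter1">0xffffdc8d2fdfa000</Data>
    <Data Name="BugcheckParameter2">0x2</Data>
    <Data Name="BugcheckParameter3">0xfffff8029b2505e6</Data>
    <Data Name="BugcheckParameter4">0x0</Data>
  </EventData>
</Event>
"""


def decode_bugcheck(event_xml: str) -> dict:
    """Extract the stop code and its four parameters from event XML."""
    root = ET.fromstring(event_xml)
    data = {
        d.get("Name"): (d.text or "")
        for d in root.findall("e:EventData/e:Data", NS)
    }
    code = int(data["BugcheckCode"])  # assumed to be decimal in event 41
    params = [int(data[f"BugcheckParameter{i}"], 16) for i in range(1, 5)]
    return {
        "code": code,
        "hex": f"0x{code:X}",
        "name": BUGCHECK_NAMES.get(code, "unknown"),
        "params": params,
    }


result = decode_bugcheck(SAMPLE_EVENT)
print(result["hex"], result["name"])
print([hex(p) for p in result["params"]])
```

Under that assumption the event corresponds to stop code 0x50 (PAGE_FAULT_IN_NONPAGED_AREA), where the first parameter is usually the memory address that was referenced and the third is usually the address of the instruction that referenced it. The stop code alone does not show which driver faulted; a kernel memory dump would be needed to attribute the crash.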
Steps to Reproduce
Start a deep learning training job on the GPU.
While training is running, launch nvitop to view GPU usage.
The system shuts down unexpectedly.
Traceback
None
Logs
None
Expected behavior
None
Additional context
None