./run_pytorch_gpu_simple_test.sh fails after successful build (gfx1010) #98
Comments
Hi, thanks for testing. It seems that the application is actually working ok despite the messages about hip_fastbin.cpp. I should probably change the wording a little, or print those messages in the future only if some environment variable is set. One app you could try pretty easily with your setup is whisper, which can transcribe words from music.
You should also be able to change the "small" model to something else. If you have ideas for other apps to test, I would like to get more feedback.
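(As a rough illustration of such a test, this assumes the openai-whisper CLI is available in the SDK's python environment and uses a placeholder audio file; it is not necessarily the exact test app referred to above.)

```bash
# hypothetical whisper check; "song.mp3" and the pip package are assumptions
pip install openai-whisper
whisper song.mp3 --model small   # change "small" to another model size to compare
```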
Oh well, then I was worrying about nothing, but it's good to hear that it is actually working.
I tried stable diffusion with SD.Next using the env_rocm.sh script, but it failed to generate an image, throwing:
What does it show for you if you run the commands: $ source /opt/rocm_sdk_612/bin/env_rocm.sh I'm not sure whether @daniandtheweb has tested stable diffusion with rocm. I have recently mostly run pytorch audio transformation tests and some image recognition test apps. I hope we can integrate a good stable diffusion app into the build soon.
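(In other words, a check along these lines, assuming the default install prefix used in this thread:)

```bash
# verify that the shell picks up the SDK's python after sourcing the environment
source /opt/rocm_sdk_612/bin/env_rocm.sh
which python       # should point somewhere under /opt/rocm_sdk_612
python --version
```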
Output of `which python`:
Output of `python --version`:
Thanks, I'll take a look. I don't suppose there is an easy way to change the python version?
For me everything works fine. Be careful with SD.Next's settings, as some work quite badly on AMD hardware in general. My best advice is to leave most of the diffusers settings at stock and just enable medvram. I'd also suggest trying ComfyUI: it has a higher learning curve than SD.Next, but the settings are minimal and there's much less chance of messing something up.
I would do that, but since I have cleared the venv it doesn't even reinitialise when I start webui.sh:
I was thinking about trying ComfyUI as well but haven't yet; I'll definitely look into it soon. Do you think it will work with python 3.9.19 by default, or do I need to do something?
We just updated the rocm sdk builder code yesterday to use python 3.11, but that would now require you to do a new build :-( Unfortunately the python version update is such a big change that basically everything needs to be rebuilt. If you can wait for a day, I could get a couple more good python fixes in, and then guide you through updating the source code and rebuilding.
SD.Next removed support for Python 3.9 not long ago; that's one of the reasons I started working on the Python update here. If you want to run it anyway, you'll have to modify SD.Next's launch file, but it still may not work properly. You can use ComfyUI (it should work on Python 3.9) or just wait for @lamikr to push the new fixes and help you update.
That's what I suspected :( Did the 6.1.1 release have a newer python, though? Because I can't figure out what I did (wrong) to make SD.Next start up before.
I'm not in any rush, just playing around trying to learn, so I have no problem with waiting. Thanks for your help!
I am happy to report that ComfyUI worked for me as well, but since I'm not too familiar with it, I couldn't test a lot of features. At least the default settings worked and generated images successfully with an SD 1.5 model.
Try reverting SD.Next to this commit: 0680a88.
This should take SD.Next back to right before the new Python check was implemented.
All python fixes are now in place. To do a fresh build with python 3.11 without downloading everything again, follow these steps.
If you want to keep the old build just in case, you can rename the /opt/rocm_sdk_612 folder instead of deleting it.
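(The detailed steps aren't reproduced here; a rough sketch of the idea, reusing the babs.sh commands mentioned elsewhere in this thread, could look like this:)

```bash
# keep the old python 3.9 build as a backup instead of deleting it (the name is arbitrary)
sudo mv /opt/rocm_sdk_612 /opt/rocm_sdk_612_py39_backup
# update the rocm sdk builder checkout and rebuild against python 3.11
git pull
./babs.sh -i    # re-run init / target selection
./babs.sh -b    # rebuild the project
```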
Btw, I'm not sure whether this benchmark runs on the rx 5600, but it would be interesting to know the results from https://github.com/lamikr/pytorch-gpu-benchmark. After running the benchmark it stores result files that need to be copied; for example, Eitch sent his results from a 7900xtx a couple of weeks ago in lamikr/pytorch-gpu-benchmark#1.
I tried running it some time ago on my 5700 XT and it didn't work (I can only guess it could be related to ROCm's unofficial support status for the card, and maybe some other fix is needed; it should be the same for the 5600). I'll try it again after the build I've just started completes.
This is the error using pytorch-gpu-benchmark on 5700xt:
I have run the test with the python 3.9 version and it fails:
I will start building the new version and report what happens with SD.Next (which didn't work with
Thanks, let me know how it goes. I have used the 5700 with opencl apps and sometimes also with pytorch, but I do not always have access to that gpu, so your stack trace helped. It may take some days, but I will try to check at some point whether I can get that compiler error fixed. gfx1010 should have v_add_f32...
The new build failed at first in the 035_AMDMIGraphX phase:
I was able to continue the build after installing a newer version of pybind11-dev (2.11.1 as opposed to 2.9.1) from the ubuntu repo for mantic (23.10). Please let me know if I should do a rebuild from scratch since I changed the pybind11-dev version mid-build (in phase 035). As for the benchmark, unsurprisingly it didn't output anything different than before. ComfyUI now has VRAM problems during VAE Decode which it didn't have before:
VAE Decode still works perfectly fine using the … Finally, SD.Next still shows:
but I am not sure if I am setting it up correctly.
which doesn't seem to use the build in /opt/rocm_sdk_612. As well as:
This second variant has some python package version mismatches:
Also, when loading a model (size 2034 MB) it runs out of VRAM:
Try doing this: open a new terminal window and go to the SD.Next folder:
After this, try to load the program and see how it goes. PS: the sqlalchemy issue gets solved just by manually installing sqlalchemy. As for ComfyUI, do the same: delete the venv and recreate it from scratch. I launch it with this command, if you're interested:
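(The exact commands aren't captured above; a rough sketch of the venv recreation, with assumed paths, would be:)

```bash
# assumed SD.Next checkout location; adjust to your setup
cd ~/SD.Next
rm -rf venv                                  # drop the broken virtual environment
source /opt/rocm_sdk_612/bin/env_rocm.sh     # pick up the SDK's python
python3 -m venv venv                         # recreate the venv from scratch
./webui.sh                                   # relaunch; dependencies get reinstalled
```

The same pattern applies to ComfyUI's venv.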
Recreating the venv from scratch worked, thanks! I tried, and SD.Next seems to work with both fp32 and fp16. When I was trying SD.Next on Windows I was told my card would only support fp32, though (probably a Windows/ZLUDA problem).
Once again, recreating the venv solved it.
It now works without any options for me, but I'll try your options and report if it does anything notably different.
I'm glad everything works now. I use quad attention as it's the most memory-efficient on AMD. The other settings should be the defaults, but I use them just in case.
- allows running the pytorch_gpu_benchmarks from https://github.com/lamikr/pytorch-gpu-benchmark with gfx101 / amd rx 5000 series (tested on 5700 xt)
fixes: #98
Signed-off-by: Mika Laitio <[email protected]>
@silicium42 @daniandtheweb I pushed updates to MIOpen to support the pytorch gpu benchmark on at least the rx 5700 xt; would you try to test it? It does not require a full rebuild, only MIOpen needs to be built again. So these steps should work:
(The 5600 could probably also work with HSA_OVERRIDE_GFX_VERSION="10.1.0", but I have no way to test it.) I'm not sure whether the 5600 and 5700 actually have enough memory to run all of the tests in pytorch_gpu_benchmark, so some of them may need to be commented out.
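(A sketch of such a run; the override value comes from the comment above and test.sh is the benchmark's launch script mentioned later in this thread:)

```bash
source /opt/rocm_sdk_612/bin/env_rocm.sh
export HSA_OVERRIDE_GFX_VERSION="10.1.0"   # only needed on the rx 5600; the 5700 xt is gfx1010 natively
cd pytorch-gpu-benchmark
./test.sh
```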
@lamikr The test now starts fine, however there's a strange bug that crashes my entire desktop while running the benchmark, so I'm unable to finish it. It's unrelated to the MIOpen changes, as I had already hit this bug randomly while using pytorch. Here's the systemd-coredump if it can help you. What happens is that the GPU gets stuck at 100% usage and stopping the process causes the crash. There's plenty of free vram when this happens, so I don't think that's related. This only happens with pytorch.
I can start the test as well now, but it also crashes. I tried it on the desktop and in a tty and got a bit further than @daniandtheweb (at least I think so), getting to:
There were no graphical glitches, my screens just went black and restarted. I don't know where to find the coredump, so I can't send it right now. Let me know if I should send it.
My 5700 has 8GB of VRAM; I don't know if that is enough.
I realized that I have CK_BUFFER_RESOURCE_3RD_DWORD wrong for rx5700/gfx1010. Can you try changing the CK_BUFFER_RESOURCE_3RD_DWORD definition for gfx1010 in src_projects/MIOpen/src/composable_kernel/composable_kernel/include/utility/config.hpp (near the "// TODO: gfx1010 check CK_BUFFER_RESOURCE_3RD_DWORD" comment), then rebuild MIOpen and try to run the benchmark again? A similar fix probably needs to be done for a couple of other apps later as well.
If you prefer the full log, I can upload it to drive or something like that if it helps with debugging. Let me know what would help to debug this better.
Does "dmesg" show anything from the linux kernel?
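(For example, filtering the kernel log for gpu driver messages, roughly like this:)

```bash
# grab the most recent amdgpu/drm messages around the crash
sudo dmesg --ctime | grep -iE "amdgpu|drm" | tail -n 500
```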
@lamikr I found this output which seems related to the crash:
@daniandtheweb Thanks for your hint! I captured the output, but the file is 4.4GB, so here are the last 500 lines:
I get exactly the same output.
Another quick thing still to try would be to disable the buffer on data transfer by changing the following. So, ./src/composable_kernel/composable_kernel/include/utility/config.hpp would then look like this between the lines:
I will check other things to see if I can find some other reason and fix for why the naive_conv_fwd_nchw kernel crashes the linux kernel. It may be related to the size of the data/problem that is transferred to the gpu. In your logs there were
The benchmark still fails on the first squeezenet test after the change.
One way to reduce memory usage is to run the tests with a smaller batch size. You could try reducing the batch size from the default 12 to, for example, 4 in the test.sh script by changing the launch command to the following: python3 benchmark_models.py -b 4 -g $c &>/dev/null
Fails even faster using a lower batch size.
Later today I will prepare a patch which adds more debugging to kernel loading, running, etc.
I am adding more debug/tracing tools to the build. If you have a chance, can you test whether you can build them? (I have only tested so far with fedora 40, and the updated install_deps.sh probably still misses something.)
After the build, the nvtop app should show the memory consumption and gpu utilization in another terminal window while you run, for example, the pytorch-gpu-benchmark. For collecting memory usage data with amd-smi, the following should work: amd-smi metric -m -g 0 --csv -w 2 -i 1000 --file out.txt LibreOffice can then show the csv file. If results are saved to json instead, maybe perfetto could also visualize them easily? https://cug.org/proceedings/cug2023_proceedings/includes/files/tut105s2-file1.pdf
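(Written out as a two-terminal workflow, using the commands above:)

```bash
# terminal 1: run the benchmark
./test.sh

# terminal 2: watch gpu memory and utilization live
nvtop
# or log memory metrics to a csv file for later inspection
amd-smi metric -m -g 0 --csv -w 2 -i 1000 --file out.txt
```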
Here's the output while running the test:
I'll only be able to keep testing this GPU today, as I'm leaving for a few weeks and won't have access to it until the end of August.
Are you able to check with nvtop installed how much memory the rx 6700/6600 is using before the crash? I have now tested with a 7700S, which also has 8GB of memory, and at the very end of the test it runs out of memory. Btw, have fun if you are leaving for holiday. Let's keep in touch. I will try to work on the vega patches at some point.
rocRAND has fixed an upstream git submodule bug that had earlier forced me to use my own repo for building it. git checkout master
I just checked the out.txt you sent; if the crash happened at the end, then it definitely had not yet run out of memory.
Sorry for not answering, I totally disconnected for a while and lost track of the messages; thanks btw. @lamikr I recently reran the benchmark with a clean build and the crash still happens. However, I also managed to reproduce a similar crash during an image generation using vulkan in stable-diffusion.cpp while trying to use as much vram as possible. I'll try to investigate this a bit more, since with the new GTT policy in the kernel the system should be able to use GTT as backup memory for the GPU (or at least that's what it does on my laptop), so I'm not entirely sure why saturating the VRAM still causes the crash on my desktop.
I originally saw crashes on gfx1011 similar to the ones you see on gfx1010, and I have now put in quite a lot of updates. Are you able to test with the latest version of rocm_sdk_612 and the latest version of the benchmark? It would also be very interesting to know whether the latest linux-6.12-rc5 kernel brings any improvements. My latest tests did not crash on gfx1011, but the results were slower than what I saw earlier in the copy-pasted screenshot from gfx1010.
The benchmarks in pytorch_gpu_benchmarks still fail, causing the same old crash when trying to quit the stuck program. I actually keep having this same issue using stable diffusion xl in ComfyUI.
Oh well, then it was false hope that this could have been resolved now. Thanks for testing.
If you come up with some new idea to fix this behaviour, I'll be happy to test it.
I ordered a used gfx1010 card, but it's about one week's delivery time until I get it. I hope that will help in solving this. Looking at the kernel log from @silicium42, it may even be the same doorbell problem that's discussed here (Navi10-related issues are discussed at the bottom, with a kernel patch suggested).
I have some strange news. I'm not sure why, but the benchmark suddenly completes. I tried rebooting my system multiple times and running the benchmark with different programs open, and it just doesn't hang anymore. I still see the strange issue where gpu usage gets stuck at 100% even at idle after the test (even if this rarely happens now), but it still completes. EDIT: Apparently reverting the kernel to a previous version doesn't recreate the issue, so it doesn't seem related. EDIT 2: When trying to run the MEDIUM size benchmark the issue just came back. I guess I was only lucky to be able to finish the MINIMAL one without issues.
Thanks for the results, I will add them to the pytorch benchmark. I have also been able to run the minimum and medium tests for float and half precision a couple of times, but then it sometimes randomly crashes, so I am now investigating the problem on the kernel side.
I may now have a fix for the problem; at least I managed to get rid of the gpu hangs on my rx 5700 and have now been able to run the whole pytorch benchmark without crashes. Unfortunately the fix requires building a kernel. Be warned that only I have tested this, so I cannot guarantee that it does not cause unknown problems, for example memory corruption. I will keep looking at this to try to understand whether I have somehow missed the root cause of the problem, as I just prevent the gpu from removing and restoring queues in the pre-emption phase. It includes the same type of fix for both gfx1103 and gfx101, is based on kernel 6.12.1, and can be built with these commands:
reboot |
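(The command list above wasn't fully captured; a generic sketch of building and installing a patched kernel, with the repository and branch left as placeholders, might look like this:)

```bash
# placeholders: the actual repository/branch carrying the gfx101/gfx1103 queue fix
git clone --branch <patched-6.12.1-branch> <kernel-repo-url> linux
cd linux
cp /boot/config-"$(uname -r)" .config   # start from the running kernel's config
make olddefconfig
make -j"$(nproc)"
sudo make modules_install
sudo make install
sudo reboot
```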
I've built the kernel and installed it on my computer, but running the FULL benchmark the issue still shows up. As you can see, at some point during the benchmark the GPU usage still gets stuck at 100% with a power consumption of ~50W (that seems to be constant whenever the issue happens). The kernel driver hasn't crashed this time, unlike before.
I added some tracing to the kernel that should be visible with the dmesg command. The code is in the kernel branch. There are also small changes to the pytorch benchmarks. For me, the default benchmark that is there passes. But if I change the model list to full, all benchmarks for float and half precision pass, while the training with double precision for resnext101_64x4d fails. I did not see the gpu hangs happening, though. How about your stable diffusion, are you still seeing hangs there?
I still haven't tried stable diffusion, as I haven't found a reliable way to replicate the issue there. Running the updated benchmark using the medium list, I get the gpu hang at densenet161. Here's the log from dmesg related to the amdgpu driver:
However, the driver still hasn't crashed.
I am using Ubuntu 22.04 with an AMD RX 5700 graphics card (gfx1010), with the driver installed via amdgpu-install from the repo.radeon.com repository for version 6.1.3 (amdgpu-install --usecase=graphics).
In the `babs.sh -i` step I selected the gfx1010 target and used no HSA_OVERRIDE_GFX_VERSION. After a few tries and executing `sudo apt install libstdc++-12-dev libgfortran-12-dev gfortran-12`, the whole project compiled in about 16 hours (it probably took so long due to only 16 GB of RAM); the full command sequence is summarized at the end of this report. The `babs.sh -b` command says it has been successful, and rocminfo outputs the following:
but the pytorch example exits almost immediately:
The other examples mentioned in the README.md seem to work fine / don't crash; I don't know exactly what output to expect, though.
I have tried the releases/rocm_sdk_builder_611 and releases/rocm_sdk_builder_612 branches without any luck so far.
Unfortunately I have no idea whether this is caused by a driver problem, a configuration problem, or something else.
The README.md states that the RX 5700 has been tested, but there is no mention of a modified build/install procedure or a specific branch to use. I would appreciate any information on what could be causing this (I think maybe aotriton, but I know very little about rocm).
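(For reference, the build sequence described above, collected in one place; the ordering reflects my reading of this report:)

```bash
# extra packages that were needed before the build went through
sudo apt install libstdc++-12-dev libgfortran-12-dev gfortran-12
# init step: select the gfx1010 target, no HSA_OVERRIDE_GFX_VERSION
./babs.sh -i
# full build (about 16 hours on a 16 GB RAM machine)
./babs.sh -b
# build reports success and rocminfo looks fine, but this test fails immediately
./run_pytorch_gpu_simple_test.sh
```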