Build failure: unable to find library `-lhsakmt` #181
Have you earlier been able to build the "016_03_llvm_project_openmp" project?
I have.
All libhsak* versions in the lib directory are symlinks to lib64.
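If it helps, `readlink -f` follows a whole symlink chain to the final target. A self-contained sketch using a throwaway directory as a stand-in for the real SDK lib directory (the `/opt/rocm_sdk_612/lib` path is an assumption from elsewhere in this thread):

```shell
# Stand-in demo for checking where libhsakmt symlinks resolve; the real
# files in this thread live under /opt/rocm_sdk_612/lib (assumed path).
tmp=$(mktemp -d)
mkdir -p "$tmp/lib64"
touch "$tmp/lib64/libhsakmt.so.1.0.6"
ln -s "lib64/libhsakmt.so.1.0.6" "$tmp/libhsakmt.so"
# readlink -f resolves the symlink chain to the final target path:
readlink -f "$tmp/libhsakmt.so"
rm -rf "$tmp"
```

This prints the resolved path under `lib64/`, confirming the link is not dangling.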
|
And let's check that all ldd dependencies are found. What does this show:
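For reference, a minimal way to run that kind of check; the library path here is an assumption based on the SDK prefix used elsewhere in this thread:

```shell
# List any unresolved shared-library dependencies of libhsakmt.
# Path is an assumption based on the SDK prefix used in this thread.
LIB=/opt/rocm_sdk_612/lib/libhsakmt.so
if [ -e "$LIB" ]; then
    # "not found" lines from ldd indicate missing runtime dependencies
    ldd "$LIB" | grep "not found" || echo "all dependencies resolved"
else
    echo "no library at $LIB"
fi
```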
After the recent posts in the other ticket, it's building and appears to be progressing further... I will update here with the results once it completes or not.
Thanks for letting me know; it would be nice to know what caused that break. So you have a Vega VII to test the gfx906?
I have 2x MI60 (or 32GB MI50, whichever it really is). The build stopped a while ago, and I reran `./babs.sh -b`; it failed here: `adding 'torchvision-0.20.0a0+324eea9.dist-info/LICENSE'`
Hmm... Not really sure what is going on. In theory the benchmark should now be able to run some pytorch tests, as it has passed that point and is now trying to build pytorch vision. So are you able to test with
If you are on the master branch, can you run these commands one more time to verify everything is up to date and then restart the pytorch vision build from clean?
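As a sketch, the update-and-rebuild sequence would use the `babs.sh` flags quoted elsewhere in this thread (`-up`, `--clean`, `-b`); guarded so the snippet is a no-op outside a rocm_sdk_builder checkout:

```shell
# Sync sources and rebuild from clean with rocm_sdk_builder's babs.sh.
# The flags are the ones mentioned elsewhere in this thread; the guard
# makes this snippet safe to run outside a checkout.
if [ -x ./babs.sh ]; then
    ./babs.sh -up       # update/fetch all source repositories
    ./babs.sh --clean   # remove the builddir for a clean rebuild
    ./babs.sh -b        # start the full build
else
    echo "run this from the rocm_sdk_builder checkout"
fi
```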
I have started a clean build myself on Fedora 40 with gfx906 as the only target. But I need to wait until morning to see the results.
```
[cb88@M31-AR0 ~]$ cat /opt/rocm_sdk_612/benchmarks/bench.txt
Saving to file: 20241218_133446_pytorch_dot_products.txt
Benchmarking cuda and cpu with Default, Math, Flash Attention and Memory pytorch backends
Pytorch version: 2.4.1
Device: AMD EPYC 7352 24-Core Processor / cpu
```
```
[cb88@M31-AR0 opt]$ find -name libhsakmt.so
[cb88@M31-AR0 opt]$ ls -la /opt/rocm_sdk_612/lib/libhsakmt.*
```

(/opt/rocm is a binary install from Arch.)
So it seems that the original problem with the missing symbol in rocBLAS is now solved for you too, and pytorch is able to use rocBLAS with the MATH backend. llama.cpp, which was also failing earlier for Said-akbar, should probably now also work if you try to build it with
and then run
There is still the second problem with flash-attention that needs to be solved. And at the moment I have no idea why the pytorch vision build fails for you.
`./babs.sh -b binfo/extra/ai_tools.blist` ran for a bit, then:

```
-- Found Python: /opt/rocm_sdk_612/bin/python (found version "3.11.9") found components: Interpreter Development.Module Development.SABIModule
Call Stack (most recent call first):
-- Configuring incomplete, errors occurred!
```
Hmm, the vllm build that runs before the llama.cpp build seems to fail with a similar type of error ("corrupted size vs. prev_size in fastbins") as pytorch vision. How about if you just build llama.cpp
llama.cpp builds and runs successfully.
Something is still not right with it, though... `llama_kv_cache_init: ROCm0 KV buffer size = 4000.00 MiB`
```
ld.lld: error: unable to find library -lhsakmt
make[2]: Leaving directory '/home/user/rocm_sdk_builder/builddir/016_03_llvm_project_openmp'
make[2]: Leaving directory '/home/user/rocm_sdk_builder/builddir/016_03_llvm_project_openmp'
[ 51%] Built target Utils.cpp-gfx906.bc
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```
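For what it's worth, the `-lhsakmt` error means ld.lld searched its library paths and found no `libhsakmt.so`. A self-contained illustration of that class of error and the `-L` fix, using a throwaway library rather than the real libhsakmt:

```shell
# Build a tiny shared library and link against it with -l, mirroring what
# -lhsakmt does; libfoo here is a stand-in, not part of the ROCm SDK.
tmp=$(mktemp -d)
printf 'int foo(void){return 42;}\n' > "$tmp/foo.c"
cc -shared -fPIC -o "$tmp/libfoo.so" "$tmp/foo.c"
printf 'int foo(void); int main(void){return foo()==42?0:1;}\n' > "$tmp/main.c"
# Without -L"$tmp" this link fails exactly like "unable to find library -lfoo":
cc -o "$tmp/main" "$tmp/main.c" -L"$tmp" -lfoo
LD_LIBRARY_PATH="$tmp" "$tmp/main" && echo "linked and ran"
rm -rf "$tmp"
```

On the real build, the equivalent fix is making sure the directory that actually contains `libhsakmt.so` (per this thread, `/opt/rocm_sdk_612/lib`) ends up on the linker's search path.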
This is after:

```
./babs.sh -up
./babs.sh --clean
./babs.sh -b
```

git rev 84faa05. I was attempting to test on my MI60 but haven't been able to get a clean build on Arch Linux.