Add Rocm execution provider for AMD GPUs #1110

Open
wants to merge 5 commits into
base: master

csukuangfj
Collaborator

Fixes #196

Usage

  1. When building sherpa-onnx, please pass
-DSHERPA_ONNX_ENABLE_ROCM=ON
-DBUILD_SHARED_LIBS=ON
  2. When running sherpa-onnx, please use
--provider=rocm

(Please make sure you have installed ROCm on your computer and you have a discrete AMD GPU)
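A minimal sketch of the configure step described above. It assumes cmake and a C++ toolchain are installed and that the commands run from the root of a sherpa-onnx checkout. The `build` directory name and the `configure_cmd` helper are assumptions for illustration, not part of this PR.

```shell
# The two flags requested in the usage notes above.
ROCM_FLAGS="-DSHERPA_ONNX_ENABLE_ROCM=ON -DBUILD_SHARED_LIBS=ON"

# Print the configure command so it can be reviewed before running it.
# The source dir "." and build dir "build" are assumptions.
configure_cmd() {
  echo "cmake $ROCM_FLAGS -S . -B build"
}

# To actually configure and build (not executed by this sketch):
#   eval "$(configure_cmd)" && cmake --build build -j 4
```

After a successful build, the binaries are run with `--provider=rocm` as noted above.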


@thewh1teagle

Could you help test it? I don't have an AMD GPU and cannot test it.

Note it supports only Linux x64 at present.

@thewh1teagle
Contributor

thewh1teagle commented Jul 11, 2024

Could you help test it? I don't have an AMD GPU and cannot test it.

Note it supports only Linux x64 at present.

Sure

log
build git:(rocm) ./bin/offline-tts-c-api \
  --vits-model=./vits-ljs.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --sid=0 \
  --provider=rocm \
  --output-filename=./generated.wav \
  'liliana, the most beautiful and lovely assistant of our team!'


here
terminate called after throwing an instance of 'Ort::Exception'
  what():  /shared/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_rocm.so with error: libMIOpen.so.1: cannot open shared object file: No such file or directory

[1]    16225 IOT instruction (core dumped)  ./bin/offline-tts-c-api --vits-model=./vits-ljs.onnx   --sid=0 --provider=roc
build git:(rocm) find / -name libMIOpen.so.1 2>/dev/null

/opt/rocm-6.0.0/lib/libMIOpen.so.1
build git:(rocm) export LD_LIBRARY_PATH="/opt/rocm-6"
build git:(rocm) export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/rocm-6.0.0/lib"
build git:(rocm) ./bin/offline-tts-c-api \
  --vits-model=./vits-ljs.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --sid=0 \
  --provider=rocm \
  --output-filename=./generated.wav \
  'liliana, the most beautiful and lovely assistant of our team!'


here
terminate called after throwing an instance of 'Ort::Exception'
  what():  /shared/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_rocm.so with error: libroctx64.so.4: cannot open shared object file: No such file or directory

[1]    16506 IOT instruction (core dumped)  ./bin/offline-tts-c-api --vits-model=./vits-ljs.onnx   --sid=0 --provider=roc
build git:(rocm) find / -name libroctx64.so.4 2>/dev/null

It looks like roctracer is missing, although I installed ROCm from the AMD website.
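The two failures above are both unresolved shared-library dependencies of the ROCm provider library. A small sketch for finding all of them in one pass, rather than one crash at a time, might look like the following. The helper names are hypothetical, and the path to `libonnxruntime_providers_rocm.so` in the usage comment is an assumption that depends on where the build placed it.

```shell
# Append a directory to LD_LIBRARY_PATH only if it is not already listed.
add_lib_dir() {
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$1:"*) ;;  # already present, nothing to do
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
  esac
  export LD_LIBRARY_PATH
}

# Print every dependency of a shared object that the dynamic loader
# cannot resolve (the lines ldd marks as "not found").
missing_libs() {
  ldd "$1" | awk '/not found/ { print $1 }'
}

# Example usage (paths are assumptions):
#   add_lib_dir /opt/rocm-6.0.0/lib
#   missing_libs ./lib/libonnxruntime_providers_rocm.so
```

If `missing_libs` still prints names after the ROCm `lib` directory has been added, those libraries are probably not installed at all, which matches the `libroctx64.so.4` case here.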

AUTOMATIC1111/stable-diffusion-webui#10435

I tested on Ubuntu 22.04.4 LTS with an AMD Ryzen 5 4500U (APU).
I installed ROCm by following https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html

I then tried to compile and install roctracer (because it's not included in my ROCm installation), which fails:

log
~/roctracer/build ~/roctracer
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
CMake Error at CMakeLists.txt:54 (find_package):
  Could not find a package configuration file provided by "HIP" with any of
  the following names:

    HIPConfig.cmake
    hip-config.cmake

  Add the installation prefix of "HIP" to CMAKE_PREFIX_PATH or set "HIP_DIR"
  to a directory containing one of the above files.  If "HIP" provides a
  separate development package or SDK, be sure it has been installed.


-- Configuring incomplete, errors occurred!
See also "/home/user/roctracer/build/CMakeFiles/CMakeOutput.log".

Trying further

log
roctracer git:(amd-master) find /opt -name "hip-config.cmake"
/opt/rocm-6.0.0/lib/cmake/hip/hip-config.cmake
roctracer git:(amd-master) export CMAKE_PREFIX_PATH=/opt/rocm-6.0.0
roctracer git:(amd-master) ./build.sh
~/roctracer/build ~/roctracer
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
CMake Error at /opt/rocm-6.0.0/lib/cmake/hsa-runtime64/hsa-runtime64Targets.cmake:80 (message):
  The imported target "hsa-runtime64::hsa-runtime64" references the file

     "/opt/rocm-6.0.0/lib/libhsa-runtime64.so.1.12.60000"

  but this file does not exist.  Possible reasons include:

  * The file was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and contained

     "/opt/rocm-6.0.0/lib/cmake/hsa-runtime64/hsa-runtime64Targets.cmake"

  but not all the files it references.

Call Stack (most recent call first):
  /opt/rocm-6.0.0/lib/cmake/hsa-runtime64/hsa-runtime64-config.cmake:82 (include)
  CMakeLists.txt:53 (find_package)


-- Configuring incomplete, errors occurred!
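The manual steps in the two logs above (locate `hip-config.cmake`, then point `CMAKE_PREFIX_PATH` at the install prefix) can be sketched as a small helper. The function name is hypothetical, and the `lib/cmake/hip` layout is taken from the `find` output in this thread; other ROCm releases may lay files out differently.

```shell
# Derive the ROCm install prefix from the location of hip-config.cmake.
# $1: directory to search, for example /opt
find_hip_prefix() {
  cfg=$(find "$1" -name hip-config.cmake 2>/dev/null | head -n 1)
  [ -n "$cfg" ] || return 1
  # Strip the trailing lib/cmake/hip/hip-config.cmake to get the prefix.
  echo "${cfg%/lib/cmake/hip/hip-config.cmake}"
}

# Example usage (not executed here):
#   prefix=$(find_hip_prefix /opt) && export CMAKE_PREFIX_PATH="$prefix"
#   ls "$prefix"/lib/libhsa-runtime64.so*   # check the file CMake reported as missing
```

The last commented line checks the file that the second CMake error reports as missing. If it prints nothing, the ROCm installation itself is likely incomplete, and setting `CMAKE_PREFIX_PATH` alone will not help.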

@csukuangfj
Collaborator Author

By the way, the onnxruntime library with ROCm support that we are using was built with ROCm 6.1.

https://github.com/csukuangfj/onnxruntime-libs/actions/runs/9886930772/job/27307628445#step:11:339

-- The HIP compiler identification is Clang 17.0.0

***** ROCm version from /opt/rocm/.info/version ****

ROCM_VERSION_DEV: 6.1.0
ROCM_VERSION_DEV_MAJOR: 6
ROCM_VERSION_DEV_MINOR: 1
ROCM_VERSION_DEV_PATCH: 0
ROCM_VERSION_DEV_INT:   60100

***** HIP LANGUAGE CONFIG INFO ****

CMAKE_HIP_COMPILER:      /opt/rocm/llvm/bin/clang++
CMAKE_HIP_ARCHITECTURES: gfx906;gfx908;gfx90a;gfx1030;gfx1100;gfx1101

Could you try rocm 6.1 instead?
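A quick way to compare a local installation against the version in the CI log above is to read the same version file. This is only a sketch: the helper names are hypothetical, the file path is the one printed in the CI log and may differ on other machines, and matching on major.minor is a heuristic rather than a documented compatibility rule.

```shell
# Reduce a version string such as "6.1.2-95" to "major.minor".
major_minor() {
  echo "$1" | cut -d. -f1,2
}

# Succeed if the installed ROCm version matches the expected major.minor.
# $1: expected version, for example 6.1
# $2: version file (assumed default: /opt/rocm/.info/version)
rocm_matches() {
  installed=$(major_minor "$(cat "${2:-/opt/rocm/.info/version}")")
  [ "$installed" = "$1" ]
}

# Example usage (not executed here):
#   rocm_matches 6.1 || echo "installed ROCm is not 6.1.x"
```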

@thewh1teagle
Contributor

Could you try rocm 6.1 instead?

I tried, but it seems that amdgpu-install is broken or something like that:

~ amdgpu-install --rocmrelease=6.1.0
Hit:1 https://packages.microsoft.com/repos/code stable InRelease
Hit:3 https://brave-browser-apt-release.s3.brave.com stable InRelease          
Hit:4 http://security.ubuntu.com/ubuntu jammy-security InRelease               
Hit:5 http://il.archive.ubuntu.com/ubuntu jammy InRelease                      
Hit:6 http://il.archive.ubuntu.com/ubuntu jammy-updates InRelease              
Hit:2 https://apt.llvm.org/jammy llvm-toolchain-jammy-18 InRelease             
Hit:7 https://repo.radeon.com/amdgpu/6.1.2/ubuntu jammy InRelease    
Hit:8 http://il.archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:9 https://repo.radeon.com/rocm/apt/6.1.2 jammy InRelease
Hit:10 https://repo.radeon.com/rocm/apt/6.1.1 jammy InRelease
Reading package lists... Done                    
N: Skipping acquire of configured file 'main/binary-i386/Packages' as repository 'https://brave-browser-apt-release.s3.brave.com stable InRelease' doesn't support architecture 'i386'
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Translations (main/i18n/Translation-en_IL) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Translations (main/i18n/Translation-en) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11 (main/dep11/Components-amd64.yml) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11 (main/dep11/Components-all.yml) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons-small (main/dep11/icons-48x48.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons (main/dep11/icons-64x64.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons-hidpi (main/dep11/icons-64x64@2.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target CNF (main/cnf/Commands-amd64) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target CNF (main/cnf/Commands-all) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Translations (main/i18n/Translation-en_IL) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target Translations (main/i18n/Translation-en) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11 (main/dep11/Components-amd64.yml) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11 (main/dep11/Components-all.yml) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons-small (main/dep11/icons-48x48.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons (main/dep11/icons-64x64.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target DEP-11-icons-hidpi (main/dep11/icons-64x64@2.tar) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target CNF (main/cnf/Commands-amd64) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
W: Target CNF (main/cnf/Commands-all) is configured multiple times in /etc/apt/sources.list.d/rocm.list:1 and /etc/apt/sources.list.d/rocm.list:2
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package rocm-opencl-runtime6.1.0
E: Couldn't find any package by glob 'rocm-opencl-runtime6.1.0'
E: Couldn't find any package by regex 'rocm-opencl-runtime6.1.0'
E: Unable to locate package rocm-hip-runtime6.1.0
E: Couldn't find any package by glob 'rocm-hip-runtime6.1.0'
E: Couldn't find any package by regex 'rocm-hip-runtime6.1.0'

I can use 6.1.2

@csukuangfj
Collaborator Author

Does rocm 6.1.2 work for you?

@thewh1teagle
Contributor

thewh1teagle commented Jul 12, 2024

Does rocm 6.1.2 work for you?

I compiled it with 6.1.2 and ran it. Here's the log:

sherpa-onnx git:(rocm) ./build/bin/offline-tts-c-api \
  --vits-model=./vits-ljs.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --sid=0 \
  --provider=rocm \
  --output-filename=./generated.wav \
  'liliana, the most beautiful and lovely assistant of our team!'


here
2024-07-12 15:09:47.142736969 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-07-12 15:09:47.142779433 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV liliana. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV the. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV most. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV beautiful. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV and. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV lovely. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV assistant. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV of. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV our. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV team. Ignore it!
2024-07-12 15:09:47.491127847 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/enc_p/emb/Gather' Status Message: HIP error hipErrorInvalidDeviceFunction:invalid device function
terminate called after throwing an instance of 'Ort::Exception'
  what():  Non-zero status code returned while running Gather node. Name:'/enc_p/emb/Gather' Status Message: HIP error hipErrorInvalidDeviceFunction:invalid device function
[1]    87518 IOT instruction (core dumped)  ./build/bin/offline-tts-c-api --vits-model=./vits-ljs.onnx   --sid=0   

Maybe I need to set the GPU arch as in ROCm/ROCm#2536 (comment), but I couldn't find the specific architecture info.
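`hipErrorInvalidDeviceFunction` commonly means that the binary contains no kernel compiled for the GPU's architecture. The sketch below checks a gfx target against the `CMAKE_HIP_ARCHITECTURES` list from the CI log quoted earlier in this thread. The helper names are hypothetical, and `detect_arch` assumes the `rocminfo` tool from ROCm is on the `PATH`.

```shell
# Architectures the prebuilt onnxruntime was compiled for, copied from the
# CMAKE_HIP_ARCHITECTURES line in the CI log quoted earlier in this thread.
BUILT_ARCHS="gfx906;gfx908;gfx90a;gfx1030;gfx1100;gfx1101"

# Succeed if the given architecture name is in the built list.
arch_supported() {
  case ";$BUILT_ARCHS;" in
    *";$1;"*) return 0 ;;
    *) return 1 ;;
  esac
}

# First gfx target reported by rocminfo (requires ROCm to be installed).
detect_arch() {
  rocminfo | grep -o 'gfx[0-9a-f]*' | head -n 1
}

# Example usage (not executed here):
#   arch=$(detect_arch)
#   arch_supported "$arch" || echo "$arch is not in the prebuilt list"
```

If the detected target is not in the list, the prebuilt library has no native kernels for that GPU, and rebuilding onnxruntime with that architecture added may be needed.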

@csukuangfj
Collaborator Author

It seems to be working.

Please re-check your lexicon.txt, tokens.txt and the onnx model.

Make sure you don't mix them. That is, don't use lexicon.txt from model1 with tokens.txt from model2 and the onnx model from model3.

@thewh1teagle
Contributor

It seems to be working.

Please re-check your lexicon.txt, tokens.txt and the onnx model.

Make sure you don't mix them. That is, don't use lexicon.txt from model1 with tokens.txt from model2 and the onnx model from model3.

The same command with the cpu provider works:

./build/bin/offline-tts-c-api \
  --vits-model=./vits-ljs.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --sid=0 \
  --provider=cpu \
  --output-filename=./generated.wav \
  'liliana, the most beautiful and lovely assistant of our team!'


here
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV liliana. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV the. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV most. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV beautiful. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV and. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV lovely. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV assistant. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV of. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV our. Ignore it!
/home/user/Documents/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToTokenIdsNotChinese:335 OOV team. Ignore it!
Input text is: liliana, the most beautiful and lovely assistant of our team!
Speaker ID is is: 0
Saved to: ./generated.wav

@csukuangfj
Collaborator Author

Please don't ignore OOVs.

You can listen to the generated.wav

Please post your lexicon.txt and tokens.txt
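To see OOVs before running synthesis, the input text can be checked against the lexicon directly. This is a rough sketch, not how sherpa-onnx itself does the lookup: it assumes one lexicon entry per line with the word in the first whitespace-separated column, lowercased lookup, and punctuation stripped. The `list_oov` name is hypothetical.

```shell
# Print each word of the input text that has no entry in the lexicon.
# $1: lexicon file (assumed non-empty, word in the first column)
# $2: text to check
list_oov() {
  echo "$2" |
    tr '[:upper:]' '[:lower:]' |      # lowercase the text
    tr -c "[:alnum:]'\n" ' ' |        # replace punctuation with spaces
    tr -s ' ' '\n' |                  # one word per line
    awk 'NR == FNR { known[$1] = 1; next }
         NF && !($1 in known) { print $1 }' "$1" -
}

# Example usage (not executed here):
#   list_oov ./lexicon.txt 'liliana, the most beautiful and lovely assistant of our team!'
```

If common words such as "the" or "and" are reported, as in the log above, the lexicon file is most likely the wrong one for the model.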

@csukuangfj
Collaborator Author

There must be something wrong with your lexicon.txt and vits-ljs.onnx.

@thewh1teagle
Contributor

Please don't ignore OOVs.

You can listen to the generated.wav

Please post your lexicon.txt and tokens.txt

You're right. Something was wrong with the tokens / lexicon, and the generated wav was invalid.
I re-downloaded them and regenerated on CPU:

./build/bin/offline-tts-c-api \
  --vits-model=./vits-ljs.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --sid=0 \
  --provider=cpu \
  --output-filename=./generated.wav \
  'liliana, the most beautiful and lovely assistant of our team!'


here
Input text is: liliana, the most beautiful and lovely assistant of our team!
Speaker ID is is: 0
Saved to: ./generated.wav

The generated file is valid and sounds good.

Now with rocm:

./build/bin/offline-tts-c-api \
  --vits-model=./vits-ljs.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --sid=0 \
  --provider=rocm \
  --output-filename=./generated.wav \
  'liliana, the most beautiful and lovely assistant of our team!'


here
2024-07-12 16:05:19.861435222 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-07-12 16:05:19.861482854 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-07-12 16:05:20.499969272 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/enc_p/emb/Gather' Status Message: HIP error hipErrorInvalidDeviceFunction:invalid device function
terminate called after throwing an instance of 'Ort::Exception'
  what():  Non-zero status code returned while running Gather node. Name:'/enc_p/emb/Gather' Status Message: HIP error hipErrorInvalidDeviceFunction:invalid device function
[1]    98649 IOT instruction (core dumped)  ./build/bin/offline-tts-c-api --vits-model=./vits-ljs.onnx   --sid=0   

@thewh1teagle
Contributor

Please post your lexicon.txt and tokens.txt

New lexicon & tokens

tokens.txt
lexicon.txt

sha256sum vits-ljs.onnx                                    
5bbd273797a9ecf8d94bd6ec02ad16cb41cbb85f055ad98d528ced3e44c9b31a  vits-ljs.onnx

@csukuangfj
Collaborator Author

Could you have a look at ROCm/ROCm#2536?

If you can run on CPU and the generated wav sounds normal, then there is no need to send the lexicon.txt and tokens.txt. Thanks!

@thewh1teagle
Contributor

thewh1teagle commented Jul 12, 2024

Could you have a look at ROCm/ROCm#2536?

If you can run on CPU and the generated wav sounds normal, then there is no need to send the lexicon.txt and tokens.txt. Thanks!

I think that it works and there's just another bug in ROCm itself.
Can we try it on Windows?

export ROCM_PATH=/opt/rocm-6.1.2
export HIP_VISIBLE_DEVICES=0
export ROCM_ARCH="gfx902'"
export HSA_OVERRIDE_GFX_VERSION=11.0.0
./build/bin/offline-tts-c-api \
  --vits-model=./vits-ljs.onnx \
  --vits-lexicon=./lexicon.txt \
  --vits-tokens=./tokens.txt \
  --sid=0 \
  --provider=rocm \
  --output-filename=./generated.wav \
  'liliana, the most beautiful and lovely assistant of our team!' 2>&1 | tee gpu.txt

System hangs...

Screen flickering...

Display server down....

image

Log

cat gpu.txt             
here
:3:rocdevice.cpp            :468 : 0220036292 us: [pid:4692  tid:0x73c526d91b00] Initializing HSA stack.
:3:rocdevice.cpp            :528 : 0220041500 us: [pid:4692  tid:0x73c526d91b00] Enumerated GPU agents = 1
:3:rocdevice.cpp            :232 : 0220041581 us: [pid:4692  tid:0x73c526d91b00] Numa selects cpu agent[0]=0x5fb6a8aa3bd0(fine=0x5fb6a8acd530,coarse=0x5fb6a9a4ae40) for gpu agent=0x5fb6a9a500b0 CPU<->GPU XGMI=0
:3:comgrctx.cpp             :33  : 0220041588 us: [pid:4692  tid:0x73c526d91b00] Loading COMGR library.
:3:rocdevice.cpp            :1785: 0220041963 us: [pid:4692  tid:0x73c526d91b00] Gfx Major/Minor/Stepping: 11/0/0
:3:rocdevice.cpp            :1787: 0220041971 us: [pid:4692  tid:0x73c526d91b00] HMM support: 1, XNACK: 0, Direct host access: 0
:3:rocdevice.cpp            :1789: 0220041972 us: [pid:4692  tid:0x73c526d91b00] Max SDMA Read Mask: 0x1, Max SDMA Write Mask: 0x1
:3:hip_context.cpp          :49  : 0220043222 us: [pid:4692  tid:0x73c526d91b00] Direct Dispatch: 1
:3:hip_device.cpp           :471 : 0220058354 us: [pid:4692  tid:0x73c526d91b00]  hipGetDevicePropertiesR0600 ( 0x7ffff6e01f70, 0 ) 
:3:hip_device.cpp           :473 : 0220058379 us: [pid:4692  tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :653 : 0220398012 us: [pid:4692  tid:0x73c526d91b00]  hipSetDevice ( 0 ) 
:3:hip_device_runtime.cpp   :657 : 0220398034 us: [pid:4692  tid:0x73c526d91b00] hipSetDevice: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :608 : 0220398038 us: [pid:4692  tid:0x73c526d91b00]  hipDeviceSynchronize (  ) 
:3:hip_device_runtime.cpp   :611 : 0220398043 us: [pid:4692  tid:0x73c526d91b00] hipDeviceSynchronize: Returned hipSuccess : 
:3:hip_device.cpp           :471 : 0220398047 us: [pid:4692  tid:0x73c526d91b00]  hipGetDevicePropertiesR0600 ( 0x5fb6b19ffb98, 0 ) 
:3:hip_device.cpp           :473 : 0220398052 us: [pid:4692  tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess : 
:3:hip_memory.cpp           :777 : 0220398064 us: [pid:4692  tid:0x73c526d91b00]  hipMemGetInfo ( 0x7ffff6e03158, 0x7ffff6e03160 ) 
:3:hip_memory.cpp           :801 : 0220398074 us: [pid:4692  tid:0x73c526d91b00] hipMemGetInfo: Returned hipSuccess : 
2024-07-12 16:41:38.625224858 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-07-12 16:41:38.625262851 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
:3:hip_device_runtime.cpp   :653 : 0220967630 us: [pid:4692  tid:0x73c526d91b00]  hipSetDevice ( 0 ) 
:3:hip_device_runtime.cpp   :657 : 0220967647 us: [pid:4692  tid:0x73c526d91b00] hipSetDevice: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :623 : 0220967654 us: [pid:4692  tid:0x73c526d91b00]  hipGetDevice ( 0x7ffff6e01ed8 ) 
:3:hip_device_runtime.cpp   :631 : 0220967657 us: [pid:4692  tid:0x73c526d91b00] hipGetDevice: Returned hipSuccess : 
:3:hip_device.cpp           :471 : 0220967661 us: [pid:4692  tid:0x73c526d91b00]  hipGetDevicePropertiesR0600 ( 0x7ffff6e01ed8, 0 ) 
:3:hip_device.cpp           :473 : 0220967665 us: [pid:4692  tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess : 
:3:hip_memory.cpp           :599 : 0220967689 us: [pid:4692  tid:0x73c526d91b00]  hipMalloc ( 0x5fb6b193feb8, 33554432 ) 
:3:rocdevice.cpp            :2363: 0220967805 us: [pid:4692  tid:0x73c526d91b00] device=0x5fb6a9a64ea0, freeMem_ = 0x1e000000
:3:hip_memory.cpp           :601 : 0220967811 us: [pid:4692  tid:0x73c526d91b00] hipMalloc: Returned hipSuccess : 0x73c3b1800000: duration: 122 us
:3:hip_context.cpp          :137 : 0220967819 us: [pid:4692  tid:0x73c526d91b00]  hipInit ( 0 ) 
:3:hip_context.cpp          :143 : 0220967823 us: [pid:4692  tid:0x73c526d91b00] hipInit: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :623 : 0220967827 us: [pid:4692  tid:0x73c526d91b00]  hipGetDevice ( 0x7ffff6e022b8 ) 
:3:hip_device_runtime.cpp   :631 : 0220967829 us: [pid:4692  tid:0x73c526d91b00] hipGetDevice: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :623 : 0220967832 us: [pid:4692  tid:0x73c526d91b00]  hipGetDevice ( 0x7ffff6e01c98 ) 
:3:hip_device_runtime.cpp   :631 : 0220967834 us: [pid:4692  tid:0x73c526d91b00] hipGetDevice: Returned hipSuccess : 
:3:hip_device.cpp           :471 : 0220967837 us: [pid:4692  tid:0x73c526d91b00]  hipGetDevicePropertiesR0600 ( 0x7ffff6e01c98, 0 ) 
:3:hip_device.cpp           :473 : 0220967840 us: [pid:4692  tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess : 
:3:hip_memory.cpp           :599 : 0220967844 us: [pid:4692  tid:0x73c526d91b00]  hipMalloc ( 0x5fb6b19c6688, 33554432 ) 
:3:rocdevice.cpp            :2363: 0220971114 us: [pid:4692  tid:0x73c526d91b00] device=0x5fb6a9a64ea0, freeMem_ = 0x1c000000
:3:hip_memory.cpp           :601 : 0220971136 us: [pid:4692  tid:0x73c526d91b00] hipMalloc: Returned hipSuccess : 0x73c3a9e00000: duration: 3292 us
:3:hip_device.cpp           :471 : 0220971149 us: [pid:4692  tid:0x73c526d91b00]  hipGetDevicePropertiesR0600 ( 0x7ffff6e018e8, 0 ) 
:3:hip_device.cpp           :473 : 0220971155 us: [pid:4692  tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess : 
:3:hip_device.cpp           :471 : 0220971188 us: [pid:4692  tid:0x73c526d91b00]  hipGetDevicePropertiesR0600 ( 0x7ffff6e018e8, 0 ) 
:3:hip_device.cpp           :473 : 0220971191 us: [pid:4692  tid:0x73c526d91b00] hipGetDevicePropertiesR0600: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :623 : 0221013862 us: [pid:4692  tid:0x73c526d91b00]  hipGetDevice ( 0x7ffff6e01c00 ) 
:3:hip_device_runtime.cpp   :631 : 0221013878 us: [pid:4692  tid:0x73c526d91b00] hipGetDevice: Returned hipSuccess : 
:3:hip_memory.cpp           :599 : 0221013885 us: [pid:4692  tid:0x73c526d91b00]  hipMalloc ( 0x7ffff6e01c00, 1048576 ) 
:3:rocdevice.cpp            :2363: 0221013985 us: [pid:4692  tid:0x73c526d91b00] device=0x5fb6a9a64ea0, freeMem_ = 0x1bf00000
:3:hip_memory.cpp           :601 : 0221013993 us: [pid:4692  tid:0x73c526d91b00] hipMalloc: Returned hipSuccess : 0x73c3b8200000: duration: 108 us
:3:hip_memory.cpp           :674 : 0221014025 us: [pid:4692  tid:0x73c526d91b00]  hipMemcpy ( 0x73c3b8200000, 0x5fb6b1c64340, 768, hipMemcpyHostToDevice ) 
:3:rocdevice.cpp            :2935: 0221014031 us: [pid:4692  tid:0x73c526d91b00] number of allocated hardware queues with low priority: 0, with normal priority: 0, with high priority: 0, maximum per priority is: 4
:3:rocdevice.cpp            :3013: 0221016894 us: [pid:4692  tid:0x73c526d91b00] created hardware queue 0x73c526d7e000 with size 16384 with priority 1, cooperative: 0
:3:rocdevice.cpp            :3105: 0221016909 us: [pid:4692  tid:0x73c526d91b00] acquireQueue refCount: 0x73c526d7e000 (1)
:3:devprogram.cpp           :2679: 0221225178 us: [pid:4692  tid:0x73c526d91b00] Using Code Object V5.

@csukuangfj
Collaborator Author

Can we try it on Windows?

I will try to build a version of onnxruntime with rocm support for Windows.
Please wait for a while. Will let you know as soon as possible.

Development

Successfully merging this pull request may close these issues.

[help wanted] Support AMD GPU
2 participants