PeixuanZuo/profile_ck_ort

Usage

ROCm version

ROCm 5.4.0. Use the Docker image rocm/pytorch:rocm5.4_ubuntu20.04_py3.7_pytorch_1.12.1.
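A minimal sketch of how the container might be started, assuming Docker is installed on a host with the ROCm kernel driver. Only the image tag comes from this README. The function name `start_rocm_container`, the `/workspace` mount point, and the choice of flags are illustrative assumptions; the device and group flags are the ones commonly used to expose AMD GPUs to a container.

```shell
# Image tag taken from this README; everything else below is an assumption.
IMAGE="rocm/pytorch:rocm5.4_ubuntu20.04_py3.7_pytorch_1.12.1"

# Hypothetical helper: pull the image, then open an interactive shell in it
# with the AMD GPU devices exposed and the current directory mounted.
start_rocm_container() {
  docker pull "$IMAGE" &&
  docker run -it --rm \
    --device=/dev/kfd \
    --device=/dev/dri \
    --group-add video \
    -v "$PWD":/workspace \
    -w /workspace \
    "$IMAGE"
}
```

Calling `start_rocm_container` from the root of this repository would then make the ck, ort, and hipblaslt folders available under /workspace inside the container.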

Device

MI250X

Test Composable Kernel

cd ck

Build Composable Kernel

We change the repeat count for each instance during tuning to 100 (see build_ck.sh, line 8).

./build_ck.sh

Test gemm and gemm_fast_gelu

Because CK does not provide a profiler for gemm_fast_gelu, we use gemm_add_add_fast_gelu to profile it instead.

./test_gemm.sh
./test_gemm_fast_gelu.sh

All results are written to log files under ./composable_kernel/build/bin/.

Test Composable Kernel in ORT

cd ort

Build ORT

./build_ort.sh

Test gemm and gemm_fast_gelu

./test_gemm.sh
./test_gemm_fast_gelu.sh

The default initialization is decimal initialization. To use a different initialization method, modify the related test file. See test_gemm.sh and test_gemm_fast_gelu.sh for details.

Test hipBLASLt

cd hipblaslt

Build hipBLASLt

./build_hipblaslt.sh

Test gemm and gemm_gelu

./test_gemm.sh
./test_gemm_gelu.sh

Result

float16, M=49152, N=3072, K=768, A and B both non-transposed (notrans notrans)

Gemm

We record the performance of instance DeviceGemmXdl<256, 128, 128, 4, 8, 32, 32, 2, 2> NumPrefetch: 1, LoopScheduler: Interwave, PipelineVersion: v1, which is the best instance selected by CK.

init method    ORT (ms)    CK (ms)    hipBLASLt (ms)
zero           1.582       1.5806     -
integer        1.651       1.6828     1.6595
decimal        2.071       1.81792    1.7734
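To put these timings in perspective, they can be converted to throughput. The sketch below assumes the usual count of 2*M*N*K floating-point operations for a GEMM. The helper name `gemm_tflops` is hypothetical and not part of this repository.

```shell
# Hypothetical helper: convert a GEMM time in milliseconds to TFLOPS,
# assuming 2*M*N*K floating-point operations per GEMM.
# Usage: gemm_tflops M N K TIME_MS
gemm_tflops() {
  awk -v m="$1" -v n="$2" -v k="$3" -v ms="$4" \
    'BEGIN { printf "%.1f\n", 2 * m * n * k / (ms * 1e-3) / 1e12 }'
}

# ORT with zero initialization, from the Gemm table.
gemm_tflops 49152 3072 768 1.582   # prints 146.6
```

The same helper can be applied to any other cell of the table by changing the last argument.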

GemmFastGelu

We record the performance of instance DeviceGemmMultipleD_Xdl_CShuffle<256, 128, 128, 32, 8, 8, Default> LoopScheduler: Interwave, PipelineVersion: v1, which is the best instance selected by CK.

init method    ORT (ms)    CK (ms)    hipBLASLt (ms)
zero           1.885       1.871      -
integer        1.956       2.001      1.7829
decimal        2.422       2.184      1.8188
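The gap between two backends can be expressed as a relative slowdown. This is a small sketch using awk for the floating-point arithmetic. The helper name `pct_slower` is hypothetical, and the inputs are the decimal-initialization timings from the GemmFastGelu table.

```shell
# Hypothetical helper: how much slower (in percent) time A is than time B.
# Usage: pct_slower A_MS B_MS
pct_slower() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# GemmFastGelu, decimal initialization: ORT relative to CK.
pct_slower 2.422 2.184    # prints 10.9
# GemmFastGelu, decimal initialization: ORT relative to hipBLASLt.
pct_slower 2.422 1.8188   # prints 33.2
```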
