initial rocBLAS logic files for iGPUs
- add initial rocBLAS logic files for the rembrandt (gfx1035), raphael (gfx1036) and phoenix (gfx1103) iGPUs.
- when tested with https://github.com/LeiWang1999/rocblas-benchmark using std::make_tuple(8192, 8192, 8192, false, false, enable_tune), the speedup was about 4-5x.

- gfx1035 without logic files
Device 0: AMD Radeon Graphics
m,n,k,a_t,b_t,enable_tune,fp32 time (msec),fp16-f32 time (msec), f16-f16 time (msec), int8-int32 time (msec)
8192,8192,8192,n,n,0,912.287,814.502,854.257,865.103

- gfx1035 with logic files
Device 0: AMD Radeon Graphics
m,n,k,a_t,b_t,enable_tune,fp32 time (msec),fp16-f32 time (msec), f16-f16 time (msec), int8-int32 time (msec)
8192,8192,8192,n,n,0,652.499,834.796,237.42,189.945

- gfx1103 without logic files
Device 0: AMD Radeon 780M
m,n,k,a_t,b_t,enable_tune,fp32 time (msec),fp16-f32 time (msec), f16-f16 time (msec), int8-int32 time (msec)
8192,8192,8192,n,n,0,916.684,820.721,823.48,1018.46

- gfx1103 with logic files
ROCR_VISIBLE_DEVICES="1" ./rocblas_benchmark
Device 0: AMD Radeon 780M
m,n,k,a_t,b_t,enable_tune,fp32 time (msec),fp16-f32 time (msec), f16-f16 time (msec), int8-int32 time (msec)
8192,8192,8192,n,n,0,1346.02,634.836,193.613,119.29

Signed-off-by: Mika Laitio <[email protected]>
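
The per-datatype speedup can be read off the timings above by dividing each "without logic files" time by the matching "with logic files" time. The following is a minimal sketch of that arithmetic, not part of the commit or of rocblas-benchmark; the names `COLUMNS`, `TIMINGS`, `speedups` and `report` are hypothetical, and the numbers are copied from the benchmark output in this message.

```python
# Timings in msec, copied from the benchmark output in this commit message
# (m = n = k = 8192, no transposes, enable_tune = 0).
# All names in this sketch are illustrative assumptions.
COLUMNS = ["fp32", "fp16-f32", "f16-f16", "int8-int32"]

TIMINGS = {
    "gfx1035": {
        "without": [912.287, 814.502, 854.257, 865.103],
        "with": [652.499, 834.796, 237.42, 189.945],
    },
    "gfx1103": {
        "without": [916.684, 820.721, 823.48, 1018.46],
        "with": [1346.02, 634.836, 193.613, 119.29],
    },
}


def speedups(without, with_logic):
    """Return time_without / time_with per column; values below 1.0 are slowdowns."""
    return [w / l for w, l in zip(without, with_logic)]


def report(timings=TIMINGS):
    """Build one summary line per GPU architecture."""
    lines = []
    for arch, runs in timings.items():
        ratios = speedups(runs["without"], runs["with"])
        cells = ", ".join(f"{name} {r:.2f}x" for name, r in zip(COLUMNS, ratios))
        lines.append(f"{arch}: {cells}")
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

By this arithmetic the gain is concentrated in the f16-f16 and int8-int32 columns (roughly 3.6x to 8.5x), while fp16-f32 on gfx1035 is nearly unchanged and fp32 on gfx1103 is slower with the logic files than without.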