No double rate 16bit support on RX 6700XT #279

PMunkes · 2022-07-01T10:34:54Z

Using the latest release of AMDVLK, included in the 22.20 driver for Ubuntu 22.04, vkpeak reports only single-rate 16-bit performance. AMDGPU-PRO from the same package supports packed FP16, but not packed int16. RADV recently had support for both packed fp16 and int16 merged. Merge Request
Are there plans to support double-rate 16-bit instructions in the open-source driver?

AMDVLK:

$ ./vkpeak 0
device       = AMD Radeon RX 6700 XT

fp32-scalar  = 12728.32 GFLOPS
fp32-vec4    = 12413.83 GFLOPS

fp16-scalar  = 12912.08 GFLOPS
fp16-vec4    = 12810.49 GFLOPS

fp64-scalar  = 826.05 GFLOPS
fp64-vec4    = 824.24 GFLOPS

int32-scalar = 2251.93 GIOPS
int32-vec4   = 2607.93 GIOPS

int16-scalar = 12896.81 GIOPS
int16-vec4   = 12806.51 GIOPS

AMDGPU-Pro:

$ ./vkpeak 0
device       = AMD Radeon RX 6700 XT

fp32-scalar  = 12839.66 GFLOPS
fp32-vec4    = 12780.54 GFLOPS

fp16-scalar  = 12205.66 GFLOPS
fp16-vec4    = 21747.10 GFLOPS

fp64-scalar  = 828.87 GFLOPS
fp64-vec4    = 826.11 GFLOPS

int32-scalar = 2611.42 GIOPS
int32-vec4   = 2610.99 GIOPS

int16-scalar = 12142.63 GIOPS
int16-vec4   = 11723.00 GIOPS

RADV (git-f533dff 2022-07-01 jammy-oibaf-ppa):

$ ./vkpeak 0
device       = AMD Radeon RX 6700 XT (RADV NAVI22)

fp32-scalar  = 12746.13 GFLOPS
fp32-vec4    = 12810.01 GFLOPS

fp16-scalar  = 12953.09 GFLOPS
fp16-vec4    = 20505.55 GFLOPS

fp64-scalar  = 829.03 GFLOPS
fp64-vec4    = 826.07 GFLOPS

int32-scalar = 2244.89 GIOPS
int32-vec4   = 2610.19 GIOPS

int16-scalar = 12916.61 GIOPS
int16-vec4   = 20409.49 GIOPS
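
A quick way to read the three runs above is the vec4-to-scalar ratio for the 16-bit types: a value near 1.0 suggests no packed math, while a value well above 1.0 suggests packed instructions are being generated. The sketch below is only an illustration; the function names `parse_vkpeak` and `vec4_speedup` are made up for this example, and the input strings are the fp16/int16 lines copied from the runs above.

```python
import re

# Matches vkpeak-style lines such as "fp16-vec4    = 12810.49 GFLOPS".
LINE_RE = re.compile(r"^(\w+)-(scalar|vec4)\s*=\s*([\d.]+)\s+G(?:FLOPS|IOPS)$")


def parse_vkpeak(text):
    """Return {(dtype, width): throughput} for every result line in text."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            results[(m.group(1), m.group(2))] = float(m.group(3))
    return results


def vec4_speedup(results, dtype):
    """Ratio of vec4 to scalar throughput for one data type."""
    return results[(dtype, "vec4")] / results[(dtype, "scalar")]


# 16-bit lines copied from the three runs in this report.
RUNS = {
    "AMDVLK": """
fp16-scalar  = 12912.08 GFLOPS
fp16-vec4    = 12810.49 GFLOPS
int16-scalar = 12896.81 GIOPS
int16-vec4   = 12806.51 GIOPS
""",
    "AMDGPU-Pro": """
fp16-scalar  = 12205.66 GFLOPS
fp16-vec4    = 21747.10 GFLOPS
int16-scalar = 12142.63 GIOPS
int16-vec4   = 11723.00 GIOPS
""",
    "RADV": """
fp16-scalar  = 12953.09 GFLOPS
fp16-vec4    = 20505.55 GFLOPS
int16-scalar = 12916.61 GIOPS
int16-vec4   = 20409.49 GIOPS
""",
}

for name, text in RUNS.items():
    r = parse_vkpeak(text)
    print(f"{name:10s} fp16 x{vec4_speedup(r, 'fp16'):.2f}  "
          f"int16 x{vec4_speedup(r, 'int16'):.2f}")
```

On these numbers only RADV shows a clear vec4 gain for both fp16 and int16, AMDGPU-Pro shows it for fp16 only, and AMDVLK shows it for neither.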

PMunkes · 2022-07-01T10:46:42Z

The Windows Vulkan driver in Radeon Software 22.6.1 performs essentially the same as AMDGPU-Pro (packed fp16, but no packed int16):

vkpeak.exe 0
device       = AMD Radeon RX 6700 XT

fp32-scalar  = 13030.38 GFLOPS
fp32-vec4    = 12972.51 GFLOPS

fp16-scalar  = 12173.23 GFLOPS
fp16-vec4    = 21743.44 GFLOPS

fp64-scalar  = 820.97 GFLOPS
fp64-vec4    = 821.68 GFLOPS

int32-scalar = 2632.45 GIOPS
int32-vec4   = 2627.50 GIOPS

int16-scalar = 12169.22 GIOPS
int16-vec4   = 11854.09 GIOPS

Flakebi · 2022-07-08T07:37:06Z

llpc currently runs the scalarizer pass, which helps a lot in reducing register pressure and increasing occupancy, but as a side effect it prevents packed instructions from being generated. See this issue for more details and workarounds: GPUOpen-Drivers/llpc#1369
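
As a rough illustration of why scalarization costs throughput here, the toy model below counts ALU instructions for one 16-bit vector operation. It assumes a packed instruction processes two 16-bit components at once, while scalarized code issues one instruction per component. The function name `vector_alu_ops` is hypothetical, and this is not how llpc itself counts or schedules instructions.

```python
import math


def vector_alu_ops(components: int, packed: bool) -> int:
    """Toy count of ALU instructions for one 16-bit vector operation.

    Assumption: a packed instruction handles two 16-bit components,
    while scalarized code needs one instruction per component.
    """
    components_per_instruction = 2 if packed else 1
    return math.ceil(components / components_per_instruction)


scalarized = vector_alu_ops(4, packed=False)  # vec4 split into scalars
packed = vector_alu_ops(4, packed=True)       # vec4 kept as two pairs
print(f"scalarized={scalarized} packed={packed} "
      f"ideal speedup={scalarized / packed:.1f}x")
```

Under this model a vec4 operation can at best double in rate when kept packed, which is in line with the "double rate" figure the benchmarks are measured against; real gains depend on register pressure and occupancy, which the scalarizer improves.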

oscarbg mentioned this issue Jul 18, 2022

Code generation for packed math instructions GPUOpen-Drivers/llpc#1369

Closed

jinjianrong added the "reproducing" (Reproducing the issue) label May 27, 2024

jinjianrong added the "reproduced" (The issue is reproduced by CQE) and "assigned" (The issue is assigned to engineer) labels and removed the "reproducing" (Reproducing the issue) label Jun 7, 2024
