Incorrect live sgpr analysis #124

Open
kirill146 opened this issue Dec 6, 2024 · 0 comments

RGA 2.11, Windows 11
kernel.cl:

__kernel __attribute__((reqd_work_group_size(8, 8, 1))) void foo(__global uchar *buf) {
  uint a = get_global_id(0);
  uint b = get_global_id(1);
  buf[a] = a - b;
}

Compiled as
rga -s opencl -c gfx1030 --O3 --livereg-sgpr . kernel.cl
Analysis result:

 Line | Rn  |          | Instruction
--------------------------------------------------------------------------------
    1 |   3 |     :::  | s_clause 0x1
    2 |   5 | ^^^^vv:  | s_load_dwordx4 s[0:3], s[4:5], 0x30
    3 |   5 | : : xx:  | s_load_dwordx2 s[4:5], s[4:5], null
    4 |   5 | : : ::v  | v_lshl_or_b32 v0, s6, 3, v0
    5 |   5 | : : :: ^ | s_mulk_i32 s7, 0xf8
    6 |   5 | : : :: : | s_waitcnt lgkmcnt(0)
    7 |   5 | : v :: : | v_add_nc_u32 v1, s2, v1
    8 |   4 | v   :: : | v_add_nc_u32 v0, s0, v0
    9 |   3 |     :: v | v_sub_nc_u32 v1, s7, v1
   10 |   2 |     ::   | v_add_nc_u32 v2, v1, v0
   11 |   3 | ^   v:   | v_add_co_u32 v0, s0, s4, v0
   12 |   2 | v    v   | v_add_co_ci_u32 v1, null, s5, 0, s0
   13 |   0 |          | global_store_byte v[0:1], v2, off
   14 |   0 |          | s_endpgm

At the start of the kernel, register s7 contains get_group_id(1). The 5th instruction both reads and writes it (the semantics of s_mulk_i32 are D = D * simm16), so s7 is live from kernel entry and the correct analysis should start with:

 Line | Rn  |          | Instruction
--------------------------------------------------------------------------------
    1 |   3 |     :::: | s_clause 0x1
    2 |   5 | ^^^^vv:: | s_load_dwordx4 s[0:3], s[4:5], 0x30
    3 |   5 | : : xx:: | s_load_dwordx2 s[4:5], s[4:5], null
    4 |   5 | : : ::v: | v_lshl_or_b32 v0, s6, 3, v0
    5 |   5 | : : :: x | s_mulk_i32 s7, 0xf8
   ...
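
To illustrate why the read-modify-write semantics matter, here is a minimal sketch of a backward liveness pass over the SGPR operands of the listing above. It is not RGA's implementation: the `(text, defs, uses)` tuple format, the `live_in_sets` and `program` helpers, and the `mulk_reads_dest` switch are all hypothetical, and the def/use sets were transcribed by hand from the disassembly. Treating `s_mulk_i32 s7` as a pure write reproduces the three live registers reported on line 1, while treating it as read-and-write yields the expected four.

```python
# Hypothetical, simplified model: straight-line code, SGPR operands only.
# Each instruction is (text, sgpr_defs, sgpr_uses), transcribed by hand.

def live_in_sets(instrs):
    """Return the set of SGPRs live immediately before each instruction."""
    live = set()
    result = []
    for _text, defs, uses in reversed(instrs):
        # Standard backward dataflow step: live_in = (live_out - defs) | uses
        live = (live - set(defs)) | set(uses)
        result.append(frozenset(live))
    result.reverse()
    return result


def program(mulk_reads_dest):
    """Build the listing; the flag selects how s_mulk_i32 is modeled."""
    mulk_uses = {"s7"} if mulk_reads_dest else set()
    return [
        ("s_clause 0x1", set(), set()),
        ("s_load_dwordx4 s[0:3], s[4:5], 0x30", {"s0", "s1", "s2", "s3"}, {"s4", "s5"}),
        ("s_load_dwordx2 s[4:5], s[4:5], null", {"s4", "s5"}, {"s4", "s5"}),
        ("v_lshl_or_b32 v0, s6, 3, v0", set(), {"s6"}),
        ("s_mulk_i32 s7, 0xf8", {"s7"}, mulk_uses),  # D = D * simm16
        ("s_waitcnt lgkmcnt(0)", set(), set()),
        ("v_add_nc_u32 v1, s2, v1", set(), {"s2"}),
        ("v_add_nc_u32 v0, s0, v0", set(), {"s0"}),
        ("v_sub_nc_u32 v1, s7, v1", set(), {"s7"}),
        ("v_add_nc_u32 v2, v1, v0", set(), set()),
        ("v_add_co_u32 v0, s0, s4, v0", {"s0"}, {"s4"}),
        ("v_add_co_ci_u32 v1, null, s5, 0, s0", set(), {"s5", "s0"}),
        ("global_store_byte v[0:1], v2, off", set(), set()),
        ("s_endpgm", set(), set()),
    ]


correct = live_in_sets(program(mulk_reads_dest=True))
buggy = live_in_sets(program(mulk_reads_dest=False))

print(sorted(correct[0]))  # ['s4', 's5', 's6', 's7']
print(sorted(buggy[0]))    # ['s4', 's5', 's6']
```

In this model the two variants differ only in whether s7 is live before instruction 5, which matches the difference between the reported table and the expected one.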