Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using global constant arrays produces slow Metal code #4367

Open
jimblandy opened this issue Oct 14, 2021 · 9 comments
Open

Using global constant arrays produces slow Metal code #4367

jimblandy opened this issue Oct 14, 2021 · 9 comments
Assignees
Labels
area: naga back-end Outputs of naga shader conversion lang: Metal Metal Shading Language naga Shader Translator

Comments

@jimblandy
Copy link
Member

jimblandy commented Oct 14, 2021

We generate Metal Shading Language code that runs slowly on some Apple devices, for shaders that index a large global constant array. The generated code copies a global constant into a function-local array every time the function is called; if the constant is large, this is expensive.

For the following input:

var<private> constant_array: array<f32, 4> = array<f32, 4>(0.0, 0.707, 1.0, 0.707);

[[stage(fragment)]]
fn access_array([[location(0)]] i: i32) -> [[location(0)]] f32 {
   return constant_array[i];
}

We generate the following MSL:

...

struct type1 {
    float inner[4];
};
constant type1 const_type1_ = {0.0, 0.7070000171661377, 1.0, 0.7070000171661377};

...
fragment ... access_array(
  ...
) {
    type1 constant_array = const_type1_;
    const auto i = ...;
    float _e3 = constant_array.inner[i];
    ...
}

This declares a global constant const_type1_, which access_array copies into the local variable constant_array each time it's called. For this example it's probably not a big deal, but people use big data tables, so the copy can be expensive.

Naga introduces this copy because MSL doesn't support mutable global arrays. Metal Shading Language Specification 2.4 §4, "Address Spaces", says, "The address space for a variable at program scope must be constant." Further, §4.2 says, "Variables in program scope must be declared in the constant address space and initialized during the declaration statement."

WGSL suggests that people use let for global constants, but the author can't use let here, since WGSL can't index arrays unless they're in memory.

What Naga should do when generating Metal (and what other toolchains seem to do now) is recognize that constant_array is not actually ever assigned to, generate a global variable in the Metal constant address space, and refer to that directly from function bodies.

@jimblandy jimblandy self-assigned this Oct 14, 2021
@jimblandy jimblandy added area: naga back-end Outputs of naga shader conversion lang: Metal Metal Shading Language labels Oct 14, 2021
@jimblandy
Copy link
Member Author

Here's what Google's Tint does with the example above:

struct tint_array_wrapper {
  float arr[4];
};
...
float access_array_inner(int i, thread tint_array_wrapper* const tint_symbol_3) {
  return (*(tint_symbol_3)).arr[i];
}

fragment tint_symbol_2 access_array(tint_symbol_1 tint_symbol [[stage_in]]) {
  thread tint_array_wrapper tint_symbol_4 = {.arr={0.0f, 0.707000017f, 1.0f, 0.707000017f}};
  float const inner_result = access_array_inner(tint_symbol.i, &(tint_symbol_4));
  tint_symbol_2 wrapper_result = {};
  wrapper_result.value = inner_result;
  return wrapper_result;
}

The semantics expressed are certainly run-time initialization, like Naga. But of course, since the issue here is inconsistent optimization (in some cases, Naga's output was fine), I don't think we can really say whether this is better or worse.

It seems safe to assume, though, that if we generate a global constant initialized at compile time, it's unlikely that any code generator along the path to the GPU would introduce run-time copies.

@kvark
Copy link
Member

kvark commented Oct 15, 2021

Thank you for writing this down!

type1 constant_array = const_type1_;

I'm not actually sure if that's a problem. I thought the problem was with code we inject to copy an array element-by-element, but this here is a bulk copy. Obviously can be improved, but I'd like us to not lose sight of what the actual issue is. A test case, maybe within https://github.com/kvark/wgpu-bench, would help.

@jimblandy
Copy link
Member Author

Yes, a test case would be good. I was trying to capture what was described as the problem in matrix chat.

@benjamin-kramer
Copy link

benjamin-kramer commented Aug 7, 2023

On a shader I'm using, this issue causes a 3x slowdown (from ~700us to 2ms).
The constant array I'm using is 64-floats-long.
I'm not sure exactly what causes the slowdown, but if one can believe Xcode's GPU trace, this is both the definition in the fragment shader (30%), and the actual access to this array (6%).

@teoxoy
Copy link
Member

teoxoy commented Aug 7, 2023

Are you on the latest version (0.13)? We got rid of the loop initialization in the latest release (in gfx-rs/naga#2331) which was thought to be the cause of the slowdown.

@benjamin-kramer
Copy link

benjamin-kramer commented Aug 7, 2023

Yes. The relevant generated code is:

struct type_6 {
    float inner[64];
};
float foo(
    ...
    thread type_6& kNormalRandomNumbers
) {
    ...
    float _e9 = kNormalRandomNumbers.inner[__some_dynamic_index__];
    ...
}
fragment mainFragmentOutput mainFragment(
    ...
) {
    ...
    type_6 kNormalRandomNumbers = type_6 {0.18074971, ..., 0.64232767};
    ...
    ... = foo(..., kNormalRandomNumbers);
    ...
}

If instead the array is defined as:

static constant float kNormalRandomNumbers[] = {
  0.18074972, ...,  0.64232764
};

And is accessed directly by foo (instead of receiving the array as a parameter), the running time of the shader function is improved from 2ms to 700us.

@teoxoy
Copy link
Member

teoxoy commented Aug 7, 2023

Would defining the array as a global const in your wgsl source resolve the issue?

A var<private> (assuming that's what you are using in your source wgsl for the array) is mutable memory and I'd imagine (if the metal compiler is not doing any other optimizations) that the array will keep being copied for each fragment.

@benjamin-kramer
Copy link

It would, if I wouldn't need to access it with a dynamic index.
As it is, it doesn't seem possible to use a const array here.

@teoxoy
Copy link
Member

teoxoy commented Aug 8, 2023

Ah, forgot we don't support that yet. See #4337.

@cwfitzgerald cwfitzgerald added the naga Shader Translator label Oct 25, 2023
@cwfitzgerald cwfitzgerald transferred this issue from gfx-rs/naga Oct 25, 2023
@teoxoy teoxoy added this to the WebGPU Specification V1 milestone Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: naga back-end Outputs of naga shader conversion lang: Metal Metal Shading Language naga Shader Translator
Projects
Status: No status
Development

No branches or pull requests

5 participants