i64 tensor matmul memory validation

**Describe the bug**

Hi, I ran into an issue with memory validation for unit-partitioned int matmul that triggers this error: `Unable to launch matmul because the config is invalid: "This algorithm needs 81920 shared memory bytes but hardware limit is 49152. "`.

**To Reproduce**

- add this test in e.g. the main burn crate:
```
#[test]
fn mre_int_matmul() {
    use burn_core::tensor::TensorData;
    use tensor::{Int, Tensor};

    type Backend = backend::Wgpu<f32, i64>;
    let device: <Backend as burn_core::prelude::Backend>::Device = Default::default();

    const SIZE: usize = 512;

    let mk_tensor = || {
        let data = [0_i64].repeat(SIZE * SIZE);
        let data = TensorData::new(data, [SIZE, SIZE]);
        Tensor::<Backend, 2, Int>::from_data(data, &device)
    };

    let m = mk_tensor();
    let n = mk_tensor();
    let _o = m.matmul(n);
}
```
- Run with e.g. `cargo test mre_int_matmul --no-default-features --features wgpu`


**Desktop (please complete the following information):**
 - OS: Linux x86_64 with nvidia GPU


**Additional context**

- I noticed that when `autotune` feat is enabled, this is performed via cmma which doesn't have this issue
- I also tried to disable this validation in `crates/cubecl-matmul/src/components/stage/matmul/unit_partitioned/config.rs` and then matmul worked just fine

Limits returned from wgpu:
```
Limits {
    max_texture_dimension_1d: 32768,
    max_texture_dimension_2d: 32768,
    max_texture_dimension_3d: 16384,
    max_texture_array_layers: 2048,
    max_bind_groups: 8,
    max_bindings_per_bind_group: 1000,
    max_dynamic_uniform_buffers_per_pipeline_layout: 15,
    max_dynamic_storage_buffers_per_pipeline_layout: 16,
    max_sampled_textures_per_shader_stage: 1048576,
    max_samplers_per_shader_stage: 1048576,
    max_storage_buffers_per_shader_stage: 1048576,
    max_storage_textures_per_shader_stage: 1048576,
    max_uniform_buffers_per_shader_stage: 1048576,
    max_binding_array_elements_per_shader_stage: 1048576,
    max_binding_array_sampler_elements_per_shader_stage: 1048576,
    max_uniform_buffer_binding_size: 65536,
    max_storage_buffer_binding_size: 2147483647,
    max_vertex_buffers: 16,
    max_buffer_size: 18446744073709551615,
    max_vertex_attributes: 32,
    max_vertex_buffer_array_stride: 2048,
    min_uniform_buffer_offset_alignment: 64,
    min_storage_buffer_offset_alignment: 32,
    max_inter_stage_shader_components: 128,
    max_color_attachments: 8,
    max_color_attachment_bytes_per_sample: 128,
    max_compute_workgroup_storage_size: 49152,
    max_compute_invocations_per_workgroup: 1024,
    max_compute_workgroup_size_x: 1024,
    max_compute_workgroup_size_y: 1024,
    max_compute_workgroup_size_z: 64,
    max_compute_workgroups_per_dimension: 65535,
    min_subgroup_size: 32,
    max_subgroup_size: 32,
    max_push_constant_size: 256,
    max_non_sampler_bindings: 4294967295,
}
```

Variables in scope of the  problematic matmul validation:
```
self.tiling_scheme.elements_in_stage_mk() = 4096
self.num_stages.lhs = 1
self.tiling_scheme.elements_in_stage_nk() = 2048
self.num_stages.rhs = 1
self.tiling_scheme.elements_in_tile_mn() = 16
num_units = 256
lhs_s_size = 8
lhs_smem_size = 4096
rhs_s_size = 8
rhs_smem_size = 2048
eo_size = 8
out_smem_size = 4096
smem_total_size = 81920
smem_limit = 49152
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

i64 tensor matmul memory validation #3660

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

i64 tensor matmul memory validation #3660

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions