Skip to content

i64 tensor matmul memory validation #3660

@tzemanovic

Description

@tzemanovic

Describe the bug

Hi, I ran into an issue with memory validation for unit-partitioned int matmul that triggers this error: Unable to launch matmul because the config is invalid: "This algorithm needs 81920 shared memory bytes but hardware limit is 49152. ".

To Reproduce

  • add this test in e.g. the main burn crate:
#[test]
fn mre_int_matmul() {
    use burn_core::tensor::TensorData;
    use tensor::{Int, Tensor};

    type Backend = backend::Wgpu<f32, i64>;
    let device: <Backend as burn_core::prelude::Backend>::Device = Default::default();

    const SIZE: usize = 512;

    let mk_tensor = || {
        let data = [0_i64].repeat(SIZE * SIZE);
        let data = TensorData::new(data, [SIZE, SIZE]);
        Tensor::<Backend, 2, Int>::from_data(data, &device)
    };

    let m = mk_tensor();
    let n = mk_tensor();
    let _o = m.matmul(n);
}
  • Run with e.g. cargo test mre_int_matmul --no-default-features --features wgpu

Desktop (please complete the following information):

  • OS: Linux x86_64 with nvidia GPU

Additional context

  • I noticed that when autotune feat is enabled, this is performed via cmma which doesn't have this issue
  • I also tried to disable this validation in crates/cubecl-matmul/src/components/stage/matmul/unit_partitioned/config.rs and then matmul worked just fine

Limits returned from wgpu:

Limits {
    max_texture_dimension_1d: 32768,
    max_texture_dimension_2d: 32768,
    max_texture_dimension_3d: 16384,
    max_texture_array_layers: 2048,
    max_bind_groups: 8,
    max_bindings_per_bind_group: 1000,
    max_dynamic_uniform_buffers_per_pipeline_layout: 15,
    max_dynamic_storage_buffers_per_pipeline_layout: 16,
    max_sampled_textures_per_shader_stage: 1048576,
    max_samplers_per_shader_stage: 1048576,
    max_storage_buffers_per_shader_stage: 1048576,
    max_storage_textures_per_shader_stage: 1048576,
    max_uniform_buffers_per_shader_stage: 1048576,
    max_binding_array_elements_per_shader_stage: 1048576,
    max_binding_array_sampler_elements_per_shader_stage: 1048576,
    max_uniform_buffer_binding_size: 65536,
    max_storage_buffer_binding_size: 2147483647,
    max_vertex_buffers: 16,
    max_buffer_size: 18446744073709551615,
    max_vertex_attributes: 32,
    max_vertex_buffer_array_stride: 2048,
    min_uniform_buffer_offset_alignment: 64,
    min_storage_buffer_offset_alignment: 32,
    max_inter_stage_shader_components: 128,
    max_color_attachments: 8,
    max_color_attachment_bytes_per_sample: 128,
    max_compute_workgroup_storage_size: 49152,
    max_compute_invocations_per_workgroup: 1024,
    max_compute_workgroup_size_x: 1024,
    max_compute_workgroup_size_y: 1024,
    max_compute_workgroup_size_z: 64,
    max_compute_workgroups_per_dimension: 65535,
    min_subgroup_size: 32,
    max_subgroup_size: 32,
    max_push_constant_size: 256,
    max_non_sampler_bindings: 4294967295,
}

Variables in scope of the problematic matmul validation:

self.tiling_scheme.elements_in_stage_mk() = 4096
self.num_stages.lhs = 1
self.tiling_scheme.elements_in_stage_nk() = 2048
self.num_stages.rhs = 1
self.tiling_scheme.elements_in_tile_mn() = 16
num_units = 256
lhs_s_size = 8
lhs_smem_size = 4096
rhs_s_size = 8
rhs_smem_size = 2048
eo_size = 8
out_smem_size = 4096
smem_total_size = 81920
smem_limit = 49152

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions