-
Notifications
You must be signed in to change notification settings - Fork 695
Open
Description
Describe the bug
Hi, I ran into an issue with memory validation for unit-partitioned int matmul that triggers this error: Unable to launch matmul because the config is invalid: "This algorithm needs 81920 shared memory bytes but hardware limit is 49152. "
.
To Reproduce
- add this test in e.g. the main burn crate:
#[test]
fn mre_int_matmul() {
use burn_core::tensor::TensorData;
use tensor::{Int, Tensor};
type Backend = backend::Wgpu<f32, i64>;
let device: <Backend as burn_core::prelude::Backend>::Device = Default::default();
const SIZE: usize = 512;
let mk_tensor = || {
let data = [0_i64].repeat(SIZE * SIZE);
let data = TensorData::new(data, [SIZE, SIZE]);
Tensor::<Backend, 2, Int>::from_data(data, &device)
};
let m = mk_tensor();
let n = mk_tensor();
let _o = m.matmul(n);
}
- Run with e.g.
cargo test mre_int_matmul --no-default-features --features wgpu
Desktop (please complete the following information):
- OS: Linux x86_64 with nvidia GPU
Additional context
- I noticed that when
autotune
feat is enabled, this is performed via cmma which doesn't have this issue - I also tried to disable this validation in
crates/cubecl-matmul/src/components/stage/matmul/unit_partitioned/config.rs
and then matmul worked just fine
Limits returned from wgpu:
Limits {
max_texture_dimension_1d: 32768,
max_texture_dimension_2d: 32768,
max_texture_dimension_3d: 16384,
max_texture_array_layers: 2048,
max_bind_groups: 8,
max_bindings_per_bind_group: 1000,
max_dynamic_uniform_buffers_per_pipeline_layout: 15,
max_dynamic_storage_buffers_per_pipeline_layout: 16,
max_sampled_textures_per_shader_stage: 1048576,
max_samplers_per_shader_stage: 1048576,
max_storage_buffers_per_shader_stage: 1048576,
max_storage_textures_per_shader_stage: 1048576,
max_uniform_buffers_per_shader_stage: 1048576,
max_binding_array_elements_per_shader_stage: 1048576,
max_binding_array_sampler_elements_per_shader_stage: 1048576,
max_uniform_buffer_binding_size: 65536,
max_storage_buffer_binding_size: 2147483647,
max_vertex_buffers: 16,
max_buffer_size: 18446744073709551615,
max_vertex_attributes: 32,
max_vertex_buffer_array_stride: 2048,
min_uniform_buffer_offset_alignment: 64,
min_storage_buffer_offset_alignment: 32,
max_inter_stage_shader_components: 128,
max_color_attachments: 8,
max_color_attachment_bytes_per_sample: 128,
max_compute_workgroup_storage_size: 49152,
max_compute_invocations_per_workgroup: 1024,
max_compute_workgroup_size_x: 1024,
max_compute_workgroup_size_y: 1024,
max_compute_workgroup_size_z: 64,
max_compute_workgroups_per_dimension: 65535,
min_subgroup_size: 32,
max_subgroup_size: 32,
max_push_constant_size: 256,
max_non_sampler_bindings: 4294967295,
}
Variables in scope of the problematic matmul validation:
self.tiling_scheme.elements_in_stage_mk() = 4096
self.num_stages.lhs = 1
self.tiling_scheme.elements_in_stage_nk() = 2048
self.num_stages.rhs = 1
self.tiling_scheme.elements_in_tile_mn() = 16
num_units = 256
lhs_s_size = 8
lhs_smem_size = 4096
rhs_s_size = 8
rhs_smem_size = 2048
eo_size = 8
out_smem_size = 4096
smem_total_size = 81920
smem_limit = 49152
Metadata
Metadata
Assignees
Labels
No labels