[BUG] CUDA crash with tile_reduce in debug mode #943

@Kenny-Vilella

Bug Description

A repro is given below.
The regression was introduced between 1.9.0.dev20250812 and 1.9.0.dev20250813.
I confirmed that the repro runs fine on 1.9.0.dev20250812 and earlier, but crashes on 1.9.0.dev20250813 and later (including 1.10).
The crash goes away if I do not import mujoco_warp or if I do not call tile_reduce.
I set enable_backward = False to make sure exactly the same kernel is compiled with and without importing mujoco_warp, but this option does not seem to make a difference.

Could it be caused by an option that mujoco_warp sets?
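One way to check the hypothesis that an import changes a global option is to snapshot the configuration before and after the import and compare the two. The sketch below is only an illustration: `snapshot_config` and `diff_config` are hypothetical helper names, and it assumes the configuration is a module-like object whose options are plain attributes (as `wp.config.mode` and `wp.config.enable_backward` in the repro suggest). It uses a stand-in object so that it runs without Warp installed.

```python
import types

_UNSET = "<unset>"  # placeholder for an option that exists on only one side


def snapshot_config(config):
    """Return a dict of the public, non-callable attributes of `config`."""
    return {
        name: value
        for name, value in vars(config).items()
        if not name.startswith("_")
        and not callable(value)
        and not isinstance(value, types.ModuleType)
    }


def diff_config(before, after):
    """Map each option whose value differs to an (old, new) pair."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, _UNSET)
        new = after.get(name, _UNSET)
        if old != new:
            changed[name] = (old, new)
    return changed


# Stand-in for a config module so the sketch runs without Warp installed.
fake_config = types.SimpleNamespace(mode="release", enable_backward=True)
before = snapshot_config(fake_config)
fake_config.mode = "debug"  # simulate an import that changes an option
print(diff_config(before, snapshot_config(fake_config)))
```

To apply this to the repro, one would presumably call `snapshot_config(wp.config)` right after `import warp as wp`, then again after `import mujoco_warp`, and print the diff. The snapshot stores references rather than copies, so an option that is a mutable object modified in place would not show up, and options set per module rather than globally would not be captured either.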

Repro:

import warp as wp
# Removing this import makes the crash go away
import mujoco_warp as mjwarp

def _repro_tile_reduce(nv: int):
  @wp.kernel
  def repro_tile_reduce(
    qvel_in: wp.array2d(dtype=float),
    Mqvel: wp.array2d(dtype=float),
  ):
    worldid = wp.tid()

    qvel_tile = wp.tile_load(qvel_in[worldid], shape=wp.static(nv))
    Mqvel_tile = wp.tile_load(Mqvel[worldid], shape=wp.static(nv))
    qvelMqvel_tile = wp.tile_map(wp.mul, qvel_tile, Mqvel_tile)
    # The line below triggers the crash
    quadratic_tile = wp.tile_reduce(wp.add, qvelMqvel_tile)

  return repro_tile_reduce


wp.config.mode = "debug"
wp.config.enable_backward = False
wp.init()

qvel = wp.zeros((1, 27), dtype=float)
mv = wp.zeros((1, 27), dtype=float)

wp.launch_tiled(
  _repro_tile_reduce(27),
  dim=1,
  inputs=[qvel, mv],
  block_dim=32,
)

System Information

Run on Horde:

Warp 1.9.0.dev20250813 initialized:
Git commit: f38f51c
CUDA Toolkit 12.8, Driver 12.4
Devices:
"cpu" : "x86_64"
"cuda:0" : "NVIDIA L40" (47 GiB, sm_89, mempool enabled)

Labels: bug (Something isn't working)
