[DCNv4 ERROR] cuda op error when use internimage-L and internimage-LX with DCNv4, however internimage-B works well with DCNv4 #280

Open
dou3516 opened this issue Feb 1, 2024 · 2 comments

Comments

@dou3516

dou3516 commented Feb 1, 2024

A CUDA op error occurs when using internimage-L and internimage-LX with DCNv4, whereas internimage-B works well with DCNv4. What is wrong?

Environments:
DCNv4: build from https://github.com/OpenGVLab/DCNv4/tree/main/DCNv4_op/make.sh
DCNv3: build from https://github.com/OpenGVLab/InternImage/tree/master/segmentation/ops_dcnv3/make.sh

internimage-L config:

    backbone=dict(
        _delete_=True,
        type='InternImage',
        core_op='DCNv3',
        channels=160,
        depths=[5, 5, 22, 5],
        groups=[10, 20, 40, 80],
        mlp_ratio=4.,
        drop_path_rate=0.5, 
        norm_layer='LN',
        layer_scale=1.0,
        offset_scale=2.0,
        post_norm=True,
        with_cp=False,
        out_indices=(0, 1, 2, 3),
        dcn_output_bias=True,  # dcnv4
        mlp_fc2_bias=True,  # dcnv4
        dw_kernel_size=3,  # dcnv4
        use_dcn_v4_op=use_dcn_v4_op,  # dcnv4
        init_cfg=dict(type='Pretrained', checkpoint=pretrained)),

error log:

    error in dcnv4_im2col_cuda: invalid configuration argument
    launch arguments: gridDim=(1568, 1, 1), blockDim=(16, 80, 1), shm_size=5760
    ...
    ...
      File "/home/miniconda3/envs/dcnv4/lib/python3.9/site-packages/DCNv4-1.0.0.post2-py3.9-linux-x86_64.egg/DCNv4/functions/dcnv4_func.py", line 125, in backward
        ext.dcnv4_backward(*args)
    RuntimeError: falseINTERNAL ASSERT FAILED at "/home/dbc/AIcode/DL/SS/mmsegmentation-dev1.x/DCNv4_op/src/cuda/dcnv4_col2im_cuda.cuh":470, please report a bug to PyTorch. kernel launch error

@zhiqi-li
Contributor

Hi, what is the shape of your input tensor? DCNv4 uses shared memory to store tensors, so tensors with extremely large shapes will cause errors.
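
One way to narrow this down is to compare the launch arguments printed in the error log against the per-block limits that CUDA enforces. The sketch below parses that log line and checks it against two assumed limits: 1024 threads per block and 48 KiB of shared memory per block by default. Both constants are typical for current NVIDIA GPUs but are hard-coded assumptions here, not values queried from the device, and the helper names `parse_launch_args` and `check_launch` are hypothetical.

```python
import re

# Assumed limits (typical for current NVIDIA GPUs, not queried from the device).
MAX_THREADS_PER_BLOCK = 1024
DEFAULT_MAX_SHARED_MEM_BYTES = 48 * 1024

# Launch arguments copied from the error log in this issue.
LOG_LINE = "launch arguments: gridDim=(1568, 1, 1), blockDim=(16, 80, 1), shm_size=5760"


def parse_launch_args(line):
    """Extract gridDim, blockDim and shm_size from a log line in the format above."""
    dims = {
        name: tuple(int(v) for v in values.split(","))
        for name, values in re.findall(r"(gridDim|blockDim)=\(([^)]*)\)", line)
    }
    shm = int(re.search(r"shm_size=(\d+)", line).group(1))
    return dims["gridDim"], dims["blockDim"], shm


def check_launch(line):
    """Compare the logged launch configuration with the assumed per-block limits."""
    _grid, block, shm = parse_launch_args(line)
    threads = block[0] * block[1] * block[2]
    return {
        "threads_per_block": threads,
        "threads_ok": threads <= MAX_THREADS_PER_BLOCK,
        "shm_size": shm,
        "shm_ok": shm <= DEFAULT_MAX_SHARED_MEM_BYTES,
    }


report = check_launch(LOG_LINE)
print(report)
```

Under these assumptions, the logged block has 16 x 80 = 1280 threads, which exceeds 1024, while the 5760 bytes of shared memory are well within the default limit. That would be consistent with an "invalid configuration argument" caused by the block size rather than by shared memory. The second block dimension (80) also matches the last entry of `groups=[10, 20, 40, 80]` in the posted config, so a model with fewer groups in its last stage may stay under the limit.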

@dou3516
Author

dou3516 commented Mar 26, 2024

> Hi, what is the shape of your input tensor? DCNv4 uses shared memory to store tensors, so tensors with extremely large shapes will cause errors.

B x C x H x W = 8 x 3 x 448 x 448
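
To relate this input shape to the logged `gridDim=(1568, 1, 1)`, the following sketch computes the feature-map resolution at each backbone stage. It assumes the usual four-stage layout with strides 4, 8, 16 and 32, which is an assumption about the backbone and not something stated in this thread. The helper `stage_resolutions` is hypothetical.

```python
def stage_resolutions(height, width, strides=(4, 8, 16, 32)):
    """Feature-map size per stage, assuming the given (hypothetical) strides."""
    return [(height // s, width // s) for s in strides]


# Input shape reported in this thread: B x C x H x W = 8 x 3 x 448 x 448.
batch, channels, height, width = 8, 3, 448, 448
strides = (4, 8, 16, 32)

for stride, (h, w) in zip(strides, stage_resolutions(height, width, strides)):
    # batch * h * w is the number of output locations at this stage.
    print(f"stride {stride}: {h} x {w}, batch * h * w = {batch * h * w}")
```

At stride 32 the feature map is 14 x 14, and 8 x 14 x 14 = 1568, which equals the logged grid size. If the kernel launches one block per output location, this would suggest the failing launch comes from the last stage, where the feature map is small rather than extremely large. That mapping is a guess from the numbers, not something confirmed by the DCNv4 source.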
