CUDA error: device-side assert triggered #369

Open
stealth0414 opened this issue May 26, 2023 · 2 comments

Comments

@stealth0414

When I train my own model, the loss computation raises an error at epoch 100 and training stops:
/pytorch/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [1237,0,0], thread: [25,0,0] Assertion input_val >= zero && input_val <= one failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [1254,0,0], thread: [15,0,0] Assertion input_val >= zero && input_val <= one failed.
The error seems to surface in l1_loss.py, line 11:
if mask_sum.item() == 0:
How can I fix this? Thanks.
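
For anyone trying to narrow this down: as far as I know, the `input_val >= zero && input_val <= one` assertion in Loss.cu is the range check that PyTorch's CUDA binary cross-entropy kernel applies to its input, so it fails when the predictions contain NaN or fall outside [0, 1]. CUDA errors are reported asynchronously, so the `.item()` call in l1_loss.py may only be where the failure becomes visible, not where it starts. A minimal sketch of a check to run on the predictions before the loss is below. The helper name `check_probabilities` is hypothetical and not part of this repository.

```python
import torch


def check_probabilities(pred: torch.Tensor, name: str = "pred") -> None:
    """Raise a readable error if `pred` is not a valid input for a BCE loss.

    Hypothetical helper for debugging; not part of the project's code.
    """
    finite = torch.isfinite(pred)
    if not finite.all():
        n_bad = (~finite).sum().item()
        raise ValueError(f"{name} contains {n_bad} NaN/Inf values")
    lo, hi = pred.min().item(), pred.max().item()
    if lo < 0.0 or hi > 1.0:
        raise ValueError(f"{name} is outside [0, 1]: min={lo}, max={hi}")


# Valid probabilities pass silently.
check_probabilities(torch.tensor([0.0, 0.5, 1.0]))
```

Calling this right before the loss turns the opaque device-side assert into a Python exception with the offending values. Running the script with the environment variable `CUDA_LAUNCH_BLOCKING=1` should also make the traceback point at the call that actually failed.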

@goseesomething

goseesomething commented Jul 10, 2024

Try adjusting the learning rate parameters. Your dataset is different, so the default parameters are probably not the best fit for it. That is why the calculation between mask and pred ends up producing NaN values.
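
A rough sketch of what that could look like in a plain PyTorch training step, assuming a generic model and optimizer rather than this project's actual trainer. The learning rate value, the gradient clipping, and the skip on a non-finite loss are my own additions for illustration; the `train_step` function and the toy model are hypothetical.

```python
import torch
from torch import nn


def train_step(model, optimizer, loss_fn, inputs, targets, max_grad_norm=1.0):
    """Run one optimization step; return the loss value, or None if skipped."""
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    if not torch.isfinite(loss):
        # Skip the update so a NaN/Inf loss does not corrupt the weights.
        return None
    loss.backward()
    # Cap the gradient norm to limit the size of a single update.
    nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()


# Toy setup (assumed, for illustration only): sigmoid output keeps pred in [0, 1].
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lowered learning rate
inputs = torch.rand(8, 4)
targets = torch.randint(0, 2, (8, 1)).float()
loss_value = train_step(model, optimizer, nn.BCELoss(), inputs, targets)
```

If the skip branch fires often, that is a sign the learning rate is still too high for the dataset, or that the inputs themselves contain bad values.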

@MxxM-max

Have you managed to solve this? I'm running into the same problem.
