RuntimeError: CUDA Error encountered in <function Lib.bn_mean_var_cuda at 0x7f96e5e9d2f0> #97

Open
hustcc19860606 opened this issue Feb 1, 2021 · 2 comments

Comments

@hustcc19860606

Hello, I ran the command 'python3 train.py --data-dir ./dataset/cityscapes/ --random-mirror --random-scale --restore-from ./dataset/resnet101-imagenet.pth --gpu 4,5,6,7 --learning-rate 1e-2 --input-size 769,769 --weight-decay 1e-4 --batch-size 8 --num-steps 60000 --recurrence 2' and got the following error:
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.
  warnings.warn(warning.format(ret))
481950 images are loaded!
Traceback (most recent call last):
  File "train.py", line 245, in <module>
    main()
  File "train.py", line 209, in main
    preds = model(images, args.recurrence)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
    raise output
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 53, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cc/networks/ccnet.py", line 196, in forward
    x = self.relu1(self.bn1(self.conv1(x)))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cc/libs/bn.py", line 184, in forward
    self.activation, self.slope)
  File "/home/cc/libs/functions.py", line 183, in forward
    _check(_ext.bn_mean_var_cuda, x, mean, var)
  File "/home/cc/libs/functions.py", line 16, in _check
    raise RuntimeError("CUDA Error encountered in {}".format(fn))
RuntimeError: CUDA Error encountered in <function Lib.bn_mean_var_cuda at 0x7f96e5e9d2f0>
Can you give some suggestions?
@speedinghzl @honghuis

@hustcc19860606
Author

I also followed your README to compile Inplace-ABN and criss-cross attention:
root@a55bfbbee40c:/home/cc# cd libs
root@a55bfbbee40c:/home/cc/libs# sh build.sh
root@a55bfbbee40c:/home/cc/libs# python3 build.py
generating /tmp/tmpcqj20vvw/__ext.c
setting the current directory to '/tmp/tmpcqj20vvw'
running build_ext
building '__ext' extension
creating home
creating home/cc
creating home/cc/libs
creating home/cc/libs/src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c __ext.c -o ./__ext.o -std=c99 -std=c++11
cc1: warning: command line option '-std=c++11' is valid for C++/ObjC++ but not for C
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c /home/cc/libs/src/lib_cffi.cpp -o ./home/cc/libs/src/lib_cffi.o -std=c99 -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
cc1plus: warning: command line option '-std=c99' is valid for C/ObjC but not for C++
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 ./__ext.o ./home/cc/libs/src/lib_cffi.o /home/cc/libs/src/bn.o -o ./__ext.so
root@a55bfbbee40c:/home/cc/libs# cd ../cc_attention
root@a55bfbbee40c:/home/cc/cc_attention# sh build.sh
root@a55bfbbee40c:/home/cc/cc_attention# python3 build.py
generating /tmp/tmpbc6m099s/__ext.c
setting the current directory to '/tmp/tmpbc6m099s'
running build_ext
building '__ext' extension
creating home
creating home/cc
creating home/cc/cc_attention
creating home/cc/cc_attention/src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c __ext.c -o ./__ext.o -std=c99 -std=c++11
cc1: warning: command line option '-std=c++11' is valid for C++/ObjC++ but not for C
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c /home/cc/cc_attention/src/lib_cffi.cpp -o ./home/cc/cc_attention/src/lib_cffi.o -std=c99 -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
cc1plus: warning: command line option '-std=c99' is valid for C/ObjC but not for C++
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 ./__ext.o ./home/cc/cc_attention/src/lib_cffi.o /home/cc/cc_attention/src/ca.o -o ./__ext.so
Is this build output correct?
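
The log above shows only the host-side compile and link steps; it does not show which GPU architectures `build.sh` passed to `nvcc` when producing `bn.o` and `ca.o`. One possible cause of a kernel failing at launch, which cannot be confirmed from this thread, is a mismatch between those architectures and the GPUs in use. The following is a minimal sketch for listing the compute capability of each visible GPU and the `-gencode` flag that would target it. The helper names `gencode_flag` and `visible_capabilities` are made up for illustration, and the script assumes PyTorch is importable.

```python
def gencode_flag(capability):
    """Format an nvcc -gencode flag for a (major, minor) compute capability."""
    major, minor = capability
    arch = "{}{}".format(major, minor)
    return "-gencode=arch=compute_{0},code=sm_{0}".format(arch)


def visible_capabilities():
    """Return the compute capability of every GPU PyTorch can see."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_capability(i)
            for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    for capability in sorted(set(visible_capabilities())):
        print(capability, gencode_flag(capability))
```

Comparing the printed flags with the ones in `build.sh` would show whether the object files were built for the right hardware.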

@speedinghzl
Owner

Hi @hustcc19860606, maybe the pure-python version or PyTorch > 1.5 could solve your problem.
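
For context, the include paths under `torch/utils/ffi` in the build log and the `reduction='elementwise_mean'` warning both suggest a PyTorch release older than 1.0. The sketch below is one way to check an installed version against the 1.5 threshold mentioned above. `parse_version` and `is_newer_than` are hypothetical helpers, and reading "> 1.5" as "major.minor greater than 1.5" is an assumption about what was meant.

```python
import re


def parse_version(version):
    """Return (major, minor) from a version string such as '1.6.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(version))
    return int(match.group(1)), int(match.group(2))


def is_newer_than(version, minimum=(1, 5)):
    """True if the major.minor part of `version` is strictly above `minimum`."""
    return parse_version(version) > minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("PyTorch:", torch.__version__)
        print("CUDA used to build PyTorch:", torch.version.cuda)
        print("Newer than 1.5:", is_newer_than(torch.__version__))
```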
