RuntimeError: CUDA Error encountered in <function Lib.bn_mean_var_cuda at 0x7f96e5e9d2f0> #97

hustcc19860606 · 2021-02-01T12:19:13Z

Hello, I ran the command 'python3 train.py --data-dir ./dataset/cityscapes/ --random-mirror --random-scale --restore-from ./dataset/resnet101-imagenet.pth --gpu 4,5,6,7 --learning-rate 1e-2 --input-size 769,769 --weight-decay 1e-4 --batch-size 8 --num-steps 60000 --recurrence 2' and got the following error:
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.
warnings.warn(warning.format(ret))
481950 images are loaded!
Traceback (most recent call last):
File "train.py", line 245, in <module>
main()
File "train.py", line 209, in main
preds = model(images, args.recurrence)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
raise output
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 53, in _worker
output = module(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/cc/networks/ccnet.py", line 196, in forward
x = self.relu1(self.bn1(self.conv1(x)))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/cc/libs/bn.py", line 184, in forward
self.activation, self.slope)
File "/home/cc/libs/functions.py", line 183, in forward
_check(_ext.bn_mean_var_cuda, x, mean, var)
File "/home/cc/libs/functions.py", line 16, in _check
raise RuntimeError("CUDA Error encountered in {}".format(fn))
RuntimeError: CUDA Error encountered in <function Lib.bn_mean_var_cuda at 0x7f96e5e9d2f0>
Can you give some suggestions?
@speedinghzl @honghuis

hustcc19860606 · 2021-02-01T12:25:55Z

I also followed your README to compile Inplace-ABN and criss-cross attention:
root@a55bfbbee40c:/home/cc# cd libs
root@a55bfbbee40c:/home/cc/libs# sh build.sh
root@a55bfbbee40c:/home/cc/libs# python3 build.py
generating /tmp/tmpcqj20vvw/__ext.c
setting the current directory to '/tmp/tmpcqj20vvw'
running build_ext
building '__ext' extension
creating home
creating home/cc
creating home/cc/libs
creating home/cc/libs/src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c __ext.c -o ./__ext.o -std=c99 -std=c++11
cc1: warning: command line option '-std=c++11' is valid for C++/ObjC++ but not for C
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c /home/cc/libs/src/lib_cffi.cpp -o ./home/cc/libs/src/lib_cffi.o -std=c99 -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
cc1plus: warning: command line option '-std=c99' is valid for C/ObjC but not for C++
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 ./__ext.o ./home/cc/libs/src/lib_cffi.o /home/cc/libs/src/bn.o -o ./__ext.so
root@a55bfbbee40c:/home/cc/libs# cd ../cc_attention
root@a55bfbbee40c:/home/cc/cc_attention# sh build.sh
root@a55bfbbee40c:/home/cc/cc_attention# python3 build.py
generating /tmp/tmpbc6m099s/__ext.c
setting the current directory to '/tmp/tmpbc6m099s'
running build_ext
building '__ext' extension
creating home
creating home/cc
creating home/cc/cc_attention
creating home/cc/cc_attention/src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c __ext.c -o ./__ext.o -std=c99 -std=c++11
cc1: warning: command line option '-std=c++11' is valid for C++/ObjC++ but not for C
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/TH -I/usr/local/lib/python3.6/dist-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -c /home/cc/cc_attention/src/lib_cffi.cpp -o ./home/cc/cc_attention/src/lib_cffi.o -std=c99 -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
cc1plus: warning: command line option '-std=c99' is valid for C/ObjC but not for C++
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 ./__ext.o ./home/cc/cc_attention/src/lib_cffi.o /home/cc/cc_attention/src/ca.o -o ./__ext.so
Is this build output correct?

speedinghzl · 2021-02-07T02:54:27Z

Hi @hustcc19860606, maybe the pure-python version or PyTorch > 1.5 could solve your problem.
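
Before retrying, it may help to confirm which PyTorch the environment is actually running. The following is a minimal sketch, assuming the suggestion means version 1.5 or later (the thread does not say whether 1.5 itself qualifies). The helper names `parse_version` and `meets_minimum` are hypothetical and not part of the repository; `torch.__version__`, `torch.cuda.is_available()` and `torch.cuda.device_count()` are standard PyTorch attributes.

```python
import re


def parse_version(version):
    """Return (major, minor) from a string such as '0.4.1' or '1.7.0+cu110'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: {!r}".format(version))
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(1, 5)):
    """Assumption: 'PyTorch > 1.5' is read here as 'version 1.5 or later'."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("PyTorch version:", torch.__version__)
        print("Meets assumed minimum (1.5):", meets_minimum(torch.__version__))
        print("CUDA available:", torch.cuda.is_available())
        print("Visible GPUs:", torch.cuda.device_count())
```

If the check reports a version below 1.5, that would be consistent with the build log above, which goes through `torch.utils.ffi`, an interface that only older PyTorch releases provide.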
