cifar100 with resnet #113

Open
apeterswu opened this issue May 7, 2018 · 1 comment

Comments

apeterswu commented May 7, 2018

Hi,

I am trying to run the ResNet-32 model on the CIFAR-100 dataset. The only change I made to "Deep_Residual_Learning_CIFAR-10.py" is the training data, but it fails with the following error:

Starting training...
Traceback (most recent call last):
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
RuntimeError: error getting worksize: CUDNN_STATUS_BAD_PARAM

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "resnet.py", line 390, in <module>
    main(**kwargs)
  File "resnet.py", line 319, in main
    train_err += train_fn(inputs, targets)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/compile/function_module.py", line 917, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
RuntimeError: error getting worksize: CUDNN_STATUS_BAD_PARAM
Apply node that caused the error: GpuDnnConv{algo='small', inplace=True, num_groups=1}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty{dtype='float32', context_name=None}.0, GpuDnnConvDesc{border_mode='half', subsample=(1, 1), dilation=(1, 1), conv_mode='cross', precision='float32', num_groups=1}.0, Constant{1.0}, Constant{0.0})
Toposort index: 399
Inputs types: [GpuArrayType<None>(float32, 4D), GpuArrayType<None>(float32, 4D), GpuArrayType<None>(float32, 4D), <theano.gof.type.CDataType object at 0x7fa464893a20>, Scalar(float32), Scalar(float32)]
Inputs shapes: [(128, 3, 32, 32), (16, 3, 3, 3), (128, 16, 32, 32), 'No shapes', (), ()]
Inputs strides: [(12288, 4096, 128, 4), (108, 36, 12, 4), (65536, 4096, 128, 4), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL at 0x7fa3997c61e0>, 1.0, 0.0]
Outputs clients: [[GpuElemwise{sub,no_inplace}(GpuDnnConv{algo='small', inplace=True, num_groups=1}.0, InplaceGpuDimShuffle{x,0,x,x}.0), GpuContiguous(GpuDnnConv{algo='small', inplace=True, num_groups=1}.0), GpuElemwise{sub,no_inplace}(GpuDnnConv{algo='small', inplace=True, num_groups=1}.0, GpuElemwise{Composite{(((i0 / i1) / i2) / i3)}}[]<gpuarray>.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "resnet.py", line 390, in <module>
    main(**kwargs)
  File "resnet.py", line 267, in main
    prediction = lasagne.layers.get_output(network)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/lasagne/layers/helper.py", line 197, in get_output
    all_outputs[layer] = layer.get_output_for(layer_inputs, **kwargs)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/lasagne/layers/conv.py", line 352, in get_output_for
    conved = self.convolve(input, **kwargs)
  File "/home/changchen/anaconda3/lib/python3.6/site-packages/lasagne/layers/conv.py", line 650, in convolve
    **extra_kwargs)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

It seems something went wrong in the convolution operation. Could you please give me some advice? Thanks a lot.
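As a sanity check on the traceback above, the reported "Inputs shapes" and "Inputs strides" can be compared against what a C-contiguous float32 array would have. The sketch below is pure Python; the helper name `c_contiguous_strides` is made up for this illustration and is not part of Theano or Lasagne.

```python
def c_contiguous_strides(shape, itemsize=4):
    """Byte strides of a C-contiguous array (itemsize=4 for float32)."""
    strides = []
    step = itemsize
    # Walk from the last (fastest-varying) axis to the first.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


# Shapes and strides copied from the traceback (input, filters, output).
reported = {
    (128, 3, 32, 32): (12288, 4096, 128, 4),
    (16, 3, 3, 3): (108, 36, 12, 4),
    (128, 16, 32, 32): (65536, 4096, 128, 4),
}

all_contiguous = all(
    c_contiguous_strides(shape) == strides
    for shape, strides in reported.items()
)
print(all_contiguous)  # True
```

All three arrays match the contiguous float32 layout, so the memory layout of the inputs looks ordinary. This does not by itself identify the cause of CUDNN_STATUS_BAD_PARAM.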

Member

f0k commented Jun 12, 2018

Does this also happen with the original CIFAR-10 code? Sometimes it helps to enforce a different cuDNN algorithm, or to have one chosen automatically, using:

THEANO_FLAGS=dnn.conv.algo_fwd=guess_on_shape_change,dnn.conv.algo_bwd_data=guess_on_shape_change,dnn.conv.algo_bwd_filter=guess_on_shape_change python resnet.py

If this doesn't help, you can also disable cuDNN using THEANO_FLAGS=dnn.enabled=False.
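If setting the variable on the command line is inconvenient, the same flags can be placed in the environment from Python, provided this happens before `theano` is first imported, since Theano reads THEANO_FLAGS at import time. The following is a minimal sketch; the helper `build_theano_flags` is a hypothetical name introduced here, and only the flag names and values from the command above are used.

```python
import os


def build_theano_flags(algo="guess_on_shape_change", disable_cudnn=False):
    """Build a THEANO_FLAGS string from the flags suggested in this thread.

    Hypothetical helper for illustration; not part of Theano or Lasagne.
    """
    if disable_cudnn:
        # Fallback suggested above: turn cuDNN off entirely.
        flags = [("dnn.enabled", "False")]
    else:
        # Use the same algorithm choice for forward and both backward passes.
        flags = [
            ("dnn.conv.algo_fwd", algo),
            ("dnn.conv.algo_bwd_data", algo),
            ("dnn.conv.algo_bwd_filter", algo),
        ]
    return ",".join("{}={}".format(name, value) for name, value in flags)


# Note: this overwrites any THEANO_FLAGS already set in the environment.
os.environ["THEANO_FLAGS"] = build_theano_flags()

# Import theano only after the environment variable is set, e.g.:
# import theano
```

Calling `build_theano_flags(disable_cudnn=True)` instead yields `dnn.enabled=False`, which corresponds to the second suggestion.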
