Problems related to use image_classification.py to train model #19539

Johnny-dai-git · 2020-11-16T17:41:10Z

Hi,

I am working on distributed learning and using the provided Python script image_classification.py. However, I have run into several problems:

First, I am trying to train a model on caltech101. The worker downloads and extracts the dataset, but the program then hangs at data.py line 107 and never returns training_path and testing_path to the training loop. No error message is shown; it just hangs forever.

Second, I am trying to train VGG11, VGG16, or AlexNet on the MNIST dataset, but this error message shows up:

[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 20000 images, shuffle=1, shape=[32,1,28,28]
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 20000 images, shuffle=1, shape=[32,1,28,28]
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 3333 images, shuffle=1, shape=[32,1,28,28]
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 20000 images, shuffle=1, shape=[32,1,28,28]
terminate called after throwing an instance of 'dmlc::Error'
what(): [12:28:59] src/operator/nn/pooling.cc:190: Check failed: param.kernel[0] <= dshape_nchw[2] + 2 * param.pad[0]: kernel size (2) exceeds input (1 padded to 1)
Stack trace:
[bt] (0) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x307d3b) [0x7f225736dd3b]
[bt] (1) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0xb811eb) [0x7f2257be71eb]
[bt] (2) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0x1d27) [0x7f225a507aa7]

[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 3333 images, shuffle=1, shape=[32,1,28,28]
bash: line 1: 23110 Aborted python3 image_classification.py --dataset mnist --model vgg11 --epochs 1 --kvstore dist_async
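
The pooling check in the log above can be reproduced with plain arithmetic. The following is a minimal sketch, assuming the standard VGG layout (five 2x2 max-pooling layers with stride 2 and no padding, separated by convolutions that preserve the spatial size); the helper name `pooled_sizes` is hypothetical and is not part of image_classification.py or MXNet.

```python
def pooled_sizes(size, num_pools, kernel=2, stride=2, pad=0):
    """Trace one spatial dimension through successive pooling layers.

    Uses the floor ("valid") output-size convention:
        out = (in + 2 * pad - kernel) // stride + 1
    Raises ValueError when the kernel no longer fits the padded input,
    mirroring the condition reported in the log above.
    """
    sizes = [size]
    for _ in range(num_pools):
        padded = size + 2 * pad
        if kernel > padded:
            raise ValueError(
                f"kernel size ({kernel}) exceeds input ({size} padded to {padded})"
            )
        size = (padded - kernel) // stride + 1
        sizes.append(size)
    return sizes


# A 28x28 MNIST image survives only four of the five assumed pooling layers.
print(pooled_sizes(28, 4))  # [28, 14, 7, 3, 1]

# The fifth pooling layer sees a 1x1 input and fails.
try:
    pooled_sizes(28, 5)
except ValueError as err:
    print(err)

# A 32x32 input passes through all five layers.
print(pooled_sizes(32, 5))  # [32, 16, 8, 4, 2, 1]
```

Under these assumptions, a 28x28 input shrinks to 1x1 before the last pooling layer, which would be consistent with the "kernel size (2) exceeds input (1 padded to 1)" message. Resizing the images to at least 32x32 before feeding them to the network is one possible way around it.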

Third, I tried to train on ImageNet, but I need to pass a parameter called --data-dir. What is it for?
After looking into the source code, it seems that I need to download the ImageNet dataset myself and pass it to the workers. Is that right?

Fourth, could you tell me which datasets work with which models in image_classification.py?

Best Regards,
Johnny

szha (Collaborator) · 2020-11-18T17:54:19Z

@Johnny-dai-git As a general note for image classification, I think you may benefit from GluonCV's training scripts (https://cv.gluon.ai/model_zoo/classification.html), which incorporate many tricks for training better models. Also, if you have the hardware available, multi-GPU data-parallel training is usually the most efficient mode for this problem size.

Johnny-dai-git (Author) · 2020-11-28T21:48:27Z

Thank you so much.

