This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Problems using image_classification.py to train a model #19539
Unanswered
Johnny-dai-git
asked this question in
General
Replies: 2 comments
-
@Johnny-dai-git a general note for image classification: I think you may benefit from GluonCV's training scripts, which incorporate many tricks for training better models. Also, if you have the hardware available, multi-GPU data-parallel training is usually the most efficient mode for this problem size.
Thank you so much.
On Wed, Nov 18, 2020 at 12:54 PM Sheng Zha wrote:
> @Johnny-dai-git a general note for image classification, I think you may benefit from GluonCV's training scripts <https://cv.gluon.ai/model_zoo/classification.html>, which incorporate many tricks for training better models.
--
Yuanjun Dai (he/him)
Ph.D.
Department of Computer and Data Sciences
Case Western Reserve University
Phone: (216)-235-8330
Office: Glennan 505
Hi,
I am working on distributed learning and using the provided Python script image_classification.py. However, I have run into several problems:
First, I am trying to train a model on caltech101. The worker downloads and extracts the dataset successfully, but the program then hangs at data.py line 107 and never returns training_path and testing_path to the training loop. No error message is shown; it just hangs forever.
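A silent hang like this is hard to describe without a stack trace. One way to get one is a minimal sketch using Python's standard `faulthandler` module, which can dump the traceback of every thread if the process is still running after a timeout. The helper names `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical and are not part of image_classification.py or data.py; where to call them is an assumption.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=60.0, stream=None):
    """Dump all thread tracebacks to `stream` every `timeout_s` seconds.

    Hypothetical helper: call it before the step that appears to hang
    (for example, before the dataset is loaded).
    """
    if stream is None:
        stream = sys.stderr
    # repeat=True keeps dumping until cancelled, so a long hang is still visible.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)


def disarm_hang_watchdog():
    """Cancel the pending dump once the suspicious step has returned."""
    faulthandler.cancel_dump_traceback_later()
```

If the traceback keeps pointing at the same line of data.py on every dump, that line (and whatever it is waiting on) is the place to look.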
Second, I am trying to train VGG11, VGG16, or AlexNet on the MNIST dataset, but the following error appears:
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 20000 images, shuffle=1, shape=[32,1,28,28]
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 20000 images, shuffle=1, shape=[32,1,28,28]
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 3333 images, shuffle=1, shape=[32,1,28,28]
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 20000 images, shuffle=1, shape=[32,1,28,28]
terminate called after throwing an instance of 'dmlc::Error'
what(): [12:28:59] src/operator/nn/pooling.cc:190: Check failed: param.kernel[0] <= dshape_nchw[2] + 2 * param.pad[0]: kernel size (2) exceeds input (1 padded to 1)
Stack trace:
[bt] (0) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0x307d3b) [0x7f225736dd3b]
[bt] (1) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(+0xb811eb) [0x7f2257be71eb]
[bt] (2) /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0x1d27) [0x7f225a507aa7]
[12:28:59] src/io/iter_mnist.cc:113: MNISTIter: load 3333 images, shuffle=1, shape=[32,1,28,28]
bash: line 1: 23110 Aborted python3 image_classification.py --dataset mnist --model vgg11 --epochs 1 --kvstore dist_async
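A likely reading of this error, offered as a sketch rather than a confirmed diagnosis: VGG-style networks apply five 2x2 max-pooling layers with stride 2, and the log shows MNIST inputs of shape [32,1,28,28]. Halving 28 four times leaves a 1x1 feature map, so the fifth pooling kernel of size 2 no longer fits, which matches "kernel size (2) exceeds input (1 padded to 1)". The helper `trace_pooling` below is hypothetical (it is not part of MXNet or image_classification.py) and assumes no padding and floor rounding.

```python
def trace_pooling(size, num_pools=5, kernel=2, stride=2, pad=0):
    """Return the spatial size before and after each pooling layer.

    Raises ValueError when the kernel no longer fits the (padded) input,
    mirroring the shape check reported in the MXNet error message.
    """
    sizes = [size]
    for layer in range(1, num_pools + 1):
        if kernel > size + 2 * pad:
            raise ValueError(
                f"pool {layer}: kernel size ({kernel}) exceeds input "
                f"({size} padded to {size + 2 * pad})"
            )
        # Standard output-size formula for pooling with floor rounding.
        size = (size + 2 * pad - kernel) // stride + 1
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(trace_pooling(224))  # ImageNet-sized input survives all five pools
    try:
        trace_pooling(28)      # MNIST-sized input fails at the fifth pool
    except ValueError as err:
        print(err)
```

Under these assumptions, the input would need to be resized (for example to 224x224) before it is fed to VGG, or a network designed for small inputs would need to be used instead.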
Third, I tried to train on ImageNet, but the script requires a parameter called --data-dir. What is it for?
From reading the source code, it seems that I need to download the ImageNet dataset myself and make it available to the workers. Is that correct?
Fourth, could you tell me which datasets work with which models in image_classification.py?
Best Regards,
Johnny