retrain distilled images with minibatch-SGD #36
Comments
The images may be optimized to jointly give some gradient. If you want to be order/batch agnostic, you can try modifying the distillation procedure to apply the images in randomly ordered batches.
Thanks for your reply. Maybe I expressed my question in the wrong way. I aim to use the distilled images that were generated to achieve the best test performance (such as the MNIST distilled data that reached 96.54% accuracy) to retrain a model from scratch.
You expressed it well, and I understood exactly what you meant. What I was saying is that if you want the images to be applicable in a certain way (e.g., randomly ordered and batched), it is best to modify the training to suit that, because they might overfit to the fixed ordering and batching used during training. Hence randomizing these during training is also important.
I understand what you said now. I'll try that and share the results. Thanks a lot.
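A minimal sketch of the suggested modification, applying the distilled images in a fresh random order at every inner step of distillation. It assumes a toy linear classifier and placeholder names (w, b, distilled_x, distilled_y, lrs) rather than this repository's actual inner loop, so treat it only as an illustration of the idea:

```python
import torch
import torch.nn.functional as F

def inner_loop_random_batches(w, b, distilled_x, distilled_y, lrs, batch_size=10):
    # Differentiable inner loop: apply the distilled images in a new random
    # order at every step, so they cannot latch onto one fixed ordering/batching.
    # `w`, `b` are the weights of a toy linear classifier kept as plain tensors;
    # `lrs` holds one (possibly learned) learning rate per inner step.
    n = distilled_x.size(0)
    for lr in lrs:
        perm = torch.randperm(n)                      # reshuffle every inner step
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            logits = distilled_x[idx].flatten(1) @ w + b
            loss = F.cross_entropy(logits, distilled_y[idx])
            grad_w, grad_b = torch.autograd.grad(loss, (w, b), create_graph=True)
            w = w - lr * grad_w                       # functional SGD step that keeps
            b = b - lr * grad_b                       # the graph for the outer update
    return w, b
```

The functional weight updates keep the computation graph intact, so gradients from the outer (real-data) loss can still flow back into distilled_x and lrs.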
Hey, I am very interested in this work and have some questions to ask.
I used 20 images per class for MNIST dataset distillation by running
python main.py --mode distill_basic --dataset MNIST --arch LeNet \
    --distill_steps 1 --train_nets_type known_init --n_nets 1 \
    --test_nets_type same_as_train
and achieved 96.54% test accuracy.
But when I used these distilled images as training data to retrain a model with the same initialization as in the distillation step via minibatch SGD, the test accuracy dropped to 62% and overfitting occurred. My questions are:
(1) Is it just because of the different optimization procedure?
(2) Why does optimizing the network in your way avoid overfitting, even when only 1 sample per class is used in MNIST dataset distillation?
(3) How can the distilled images be used to retrain a good model with a normal training procedure such as minibatch SGD?
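For context on question (3), here is the kind of plain minibatch-SGD retraining loop being asked about, as a generic sketch only: the data and network below are dummy placeholders standing in for the saved distilled images/labels and a LeNet re-initialized with the same known_init weights, not this repository's evaluation code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Placeholders: in practice `distilled_x` / `distilled_y` would be the saved
# distilled images and labels (e.g., 20 per class for MNIST), and `net` would
# be a LeNet re-initialized with the same known_init weights.
distilled_x = torch.randn(200, 1, 28, 28)                   # dummy stand-in images
distilled_y = torch.arange(200) % 10                        # dummy labels 0..9
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # stand-in for LeNet

loader = DataLoader(TensorDataset(distilled_x, distilled_y),
                    batch_size=50, shuffle=True)            # random minibatches
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

for epoch in range(30):
    for x, y in loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(net(x), y)
        loss.backward()
        optimizer.step()
```

Shuffling each epoch gives randomly ordered minibatches, which, per the replies above, the distilled images may only tolerate if the distillation itself was run with randomized ordering and batching.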