Learning custom data always 0% #321

Open
aled96 opened this issue Sep 29, 2023 · 8 comments

@aled96

aled96 commented Sep 29, 2023

I have defined a custom object, similar to the Ketchup one, and generated 1000 images with the following command:

python single_video_pybullet.py --nb_frames 1000 --scale 0.015 --path_single_obj ~/Deep_Object_Pose/scripts/nvisii_data_gen/models/iros_block/google_16k/textured_simple.obj --nb_distractors 0 --nb_object 5

I was able to obtain the 1000 images, each containing n objects, like this:
00962

Then I tried to train on this set of images with:

python -m torch.distributed.launch --nproc_per_node=1 train.py --network dope --epochs 25 --batchsize 10 --outf tmp/ --data ../nvisii_data_gen/output/output_example/

I tried different numbers of epochs and batch sizes, and regenerated the set of images several times; however, I always get 0% for each epoch:

Screenshot from 2023-09-29 18-34-25

I am fairly new to machine learning, so I do not know the details in depth. What am I doing wrong?
In the CSV files inside the output folder there is no data, only the header. In addition, when I add the --save flag, I get no results.

Thank you !
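To confirm the "only the header" observation across a whole folder, a minimal sketch along these lines may help. It uses only the Python standard library; the function name `header_only_csvs` is hypothetical and is not part of the DOPE scripts.

```python
import csv
from pathlib import Path


def header_only_csvs(folder):
    """Return the names of CSV files in `folder` that contain no data rows."""
    empty = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            # Skip completely blank lines so trailing newlines are not counted.
            rows = [row for row in csv.reader(handle) if row]
        # One row or fewer means the file holds at most a header.
        if len(rows) <= 1:
            empty.append(path.name)
    return empty
```

Calling `header_only_csvs("tmp/")` (assuming `tmp/` is the output folder in question) lists the files that never received a data row.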

@TontonTremblay
Collaborator

Let it train to epoch 100. Also check the output in TensorBoard:

tensorboard --logdir /path/to/experiment/

Then open Chrome/Firefox at the localhost address and check the Images tab. Check some other issues here to see what sort of output you should get.

@aled96
Author

aled96 commented Oct 3, 2023

I tried to do it, however, I still have 0% for each epoch.

I also tried to use a reduced dataset of 5 images.

From tensorboard I get the following info:

The second epoch is the following:
image

After more than 50 epochs I have:

image

image

@TontonTremblay
Collaborator

Lower the learning rate a tad. The 0% refers to how much of the data has been loaded, not to the performance. Sorry, I should update this. Can you try on a single image? Normally I test this first.
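To illustrate how a data-progress percentage can sit at 0% for every epoch, here is a minimal sketch of a common PyTorch-style logging pattern. It is an assumption about how such a line is typically produced, not a copy of DOPE's `train.py`; `progress_lines` and the `log_interval` parameter are hypothetical names.

```python
def progress_lines(num_images, batch_size, log_interval):
    """Return the progress strings a typical logging loop prints in one epoch."""
    num_batches = -(-num_images // batch_size)  # ceiling division
    lines = []
    for batch_idx in range(num_batches):
        # Only every `log_interval`-th batch is logged.
        if batch_idx % log_interval == 0:
            seen = batch_idx * batch_size
            percent = 100.0 * batch_idx / num_batches
            lines.append(f"[{seen}/{num_images} ({percent:.0f}%)]")
    return lines


# 1000 images with batch size 10 give 100 batches; with an assumed
# log_interval of 100, only batch 0 is ever logged.
print(progress_lines(1000, 10, 100))
```

Under these assumptions the only line printed per epoch is the one for batch 0, so the percentage reads 0% no matter how well the network is learning.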

@aled96
Author

aled96 commented Oct 3, 2023

I did a test with lr=0.00001, a single image, 100 epochs, and a batch size of 2.

Results in the end:
image
image

@TontonTremblay
Collaborator

The train belief guess should look like the GT above it. Can you run it for longer, say 1000 epochs?

@aled96
Author

aled96 commented Oct 3, 2023

I have changed the background of the input image and added symmetry information and trained on the following image:

00002

I ran it for 1000 epochs, and in the end the result was the following:

image
image

It seems much better! Do you think that now I can train on a bigger dataset with more instances of objects/distractors?

@TontonTremblay
Collaborator

Are you aware of the symmetries in your object? Check how to generate data with symmetries. But yeah, this looks good now. DOPE takes a while to train, so you will have to be patient; for example, on a 60k-image dataset I train for ~30 epochs.
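To put the numbers in this thread side by side, the arithmetic can be sketched as below. The helper `training_budget` is hypothetical, and the batch size of 10 for the 60k-image case is an assumption carried over from the original command, since the batch size used for that run is not stated here.

```python
import math


def training_budget(num_images, epochs, batch_size):
    """Count image passes and optimizer steps for a single-process run."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return {
        "image_passes": num_images * epochs,
        "optimizer_steps": steps_per_epoch * epochs,
    }


# Original attempt in this thread: 1000 images, 25 epochs, batch size 10.
first_try = training_budget(1000, 25, 10)
# Scale mentioned above: 60k images, ~30 epochs (batch size 10 is assumed).
reference = training_budget(60_000, 30, 10)

print(first_try)
print(reference)
print(reference["image_passes"] / first_try["image_passes"])
```

Under these assumptions the reference run processes 72 times as many images as the first attempt, which is consistent with the advice to be patient.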

@aled96
Author

aled96 commented Oct 3, 2023

I will adjust everything and try to run with more images if the PC allows me. Thank you!
