NaN loss after a few steps #1

Open
esdrascosta opened this issue Dec 4, 2020 · 4 comments

@esdrascosta

Hi, thanks for posting this code.
I'm trying to replicate the results, but the loss becomes NaN after 11 steps.
I installed all the dependencies at the versions you describe, but I still get this result.
Please find the training log below:

$ python train.py --obj zipper --data_path ./data/mvtech_anomaly --batch_size 2  

{'alpha': 1.0, 'batch_size': 2, 'belta': 1.0, 'data_path': ' ./data/mvtech_anomaly', 'data_type': 'mvtec', 'epochs': 300, 'gamma': 1.0, 'grayscale': False, 'img_size': 256, 'input_channel': 3, 'k_value': [2, 4, 8, 16], 'lr': 0.0001, 'obj': 'zipper', 'prefix': '2020-12-03-1197', 'save_dir': './mvtec/zipper/seed_2988/', 'seed': 2988, 'validation_ratio': 0.2, 'weight_decay': 1e-05}
   1/300 ----- [[2020-12-03 23:30:45]] [Need: 00:00:00]
  0%|                                                                                                         | 0/96 [00:00<?, ?it/s]Step Loss: 1.779465
  1%|█                                                                                                | 1/96 [00:02<03:22,  2.13s/it]Step Loss: 1.835103
  2%|██                                                                                               | 2/96 [00:03<02:52,  1.83s/it]Step Loss: 1.479402
  3%|███                                                                                              | 3/96 [00:04<02:36,  1.69s/it]Step Loss: 1.401773
  4%|████                                                                                             | 4/96 [00:05<02:26,  1.59s/it]Step Loss: 1.448756
  5%|█████                                                                                            | 5/96 [00:07<02:13,  1.46s/it]Step Loss: 1.693701
  6%|██████                                                                                           | 6/96 [00:08<02:02,  1.36s/it]Step Loss: 1.229446
  7%|███████                                                                                          | 7/96 [00:09<02:00,  1.36s/it]Step Loss: 1.215524
  8%|████████                                                                                         | 8/96 [00:10<02:00,  1.36s/it]Step Loss: 1.493567
  9%|█████████                                                                                        | 9/96 [00:12<01:52,  1.29s/it]Step Loss: 1.430892
 10%|██████████                                                                                      | 10/96 [00:13<01:46,  1.24s/it]Step Loss: 1.118710
 11%|███████████                                                                                     | 11/96 [00:14<01:48,  1.28s/it]Step Loss: nan
 12%|████████████                                                                                    | 12/96 [00:15<01:43,  1.23s/it]Step Loss: nan
 14%|█████████████                                                                                   | 13/96 [00:16<01:41,  1.22s/it]
@plutoyuxie
Owner

@esdrascosta
The same result occurs from time to time, and we are trying to find the cause too.
For now, maybe have a cup of coffee and simply try it again.

@MDAooo

MDAooo commented Dec 8, 2020

Hi @plutoyuxie,
Thanks for sharing your code. I was also working on an implementation of RIAD. Your code is great and helped me a lot; I really appreciate it.

Hi @esdrascosta,
I ran into this problem before. I fixed it by changing line 27 of gms_loss.py to 'x = torch.sqrt(x + sys.float_info.epsilon)', and I have not seen the NaN loss since.
I think the problem is a zero value under the square root, whose derivative is infinite at zero.
You can try this modification; I hope it helps.
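
A minimal standalone sketch (my own illustration, not the repository's gms_loss.py) of why a zero under the square root breaks the backward pass and what the epsilon changes:

import sys
import torch

# d/dx sqrt(x) = 1 / (2 * sqrt(x)), which is infinite at x = 0, so a single
# zero element in the similarity map sends an inf gradient backward; that can
# turn the updated weights, and therefore the next loss values, into NaN.
x = torch.zeros(1, requires_grad=True)
torch.sqrt(x).sum().backward()
print(x.grad)  # tensor([inf])

# The fix suggested above: shift the argument by machine epsilon.
y = torch.zeros(1, requires_grad=True)
torch.sqrt(y + sys.float_info.epsilon).sum().backward()
print(y.grad)  # large but finite, roughly 1 / (2 * sqrt(eps))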

BTW, have you ever tried training ONE reconstruction model for multiple object categories? I am trying this, but the reconstruction results are not as good as with a single object.

@plutoyuxie
Owner

Thanks, @MaDongao
I will try it soon.
Reconstructing multiple object categories is much harder. As far as I know, the state-of-the-art method is PaDiM, which is not a reconstruction-based method.

@taikiinoue45

taikiinoue45 commented Jan 19, 2021

@plutoyuxie
The following library might be helpful for your implementation.
https://github.com/photosynthesis-team/piq
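
For example, a hedged sketch of computing the gradient-magnitude-similarity term with piq (the GMSDLoss and MultiScaleGMSDLoss names and the data_range argument are assumptions based on piq's documented API; check the version you install):

import torch
import piq

recon = torch.rand(2, 3, 256, 256)   # reconstructed batch, values in [0, 1]
target = torch.rand(2, 3, 256, 256)  # original batch, values in [0, 1]

gms = piq.GMSDLoss(data_range=1.0)              # single-scale GMSD
msgms = piq.MultiScaleGMSDLoss(data_range=1.0)  # multi-scale variant, closer to RIAD's MSGMS term

print(gms(recon, target), msgms(recon, target))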
