Inference Issue #4
Hello, as mentioned in the guide in README.EN.md, please make sure that the width and height of the input image are both multiples of 8. If not, please pad the image. Hope this helps.
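The multiples-of-8 requirement above can be handled with a small padding helper. A minimal NumPy sketch; the function name and the choice of edge replication are illustrative, not from the repo:

```python
import numpy as np

def pad_to_multiple_of_8(img: np.ndarray) -> np.ndarray:
    """Pad the first two axes (H, W) up to the next multiple of 8.

    Edge replication is used so no hard black border is introduced;
    the model's output can be cropped back to the original size afterwards.
    """
    h, w = img.shape[:2]
    pad_h = (-h) % 8  # 0 if h is already a multiple of 8
    pad_w = (-w) % 8
    pad_width = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad_width, mode="edge")

img = np.zeros((75, 76, 3), dtype=np.uint8)
print(pad_to_multiple_of_8(img).shape)  # (80, 80, 3)
```

Remember to crop the output back to the original height and width after inference.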
Can we set the DPM solver to true during inference for better results? We are not getting results as good as those reported in your paper. Also, can you tell me how many steps the model was trained for, and which pre-trained model weights are provided in the Git repo?
Our DocDiff model was trained with 100 time steps. Using the DPM solver does not improve the results. Currently, we have not uploaded the jump-step sampling based on DDIM (but you can implement it from the paper; just save x_0 at each step of the sampling process). Hence, please use 100 steps for inference; otherwise you may get strange outputs due to the mismatch between noise intensity and T. As for not getting the results described in the paper, we suspect it is caused by the input data. We will soon upload some demo images and an inference demo notebook for easy reproduction. If you still cannot obtain the desired results, you can download the deblurring dataset from http://www.fit.vutbr.cz/~ihradis/CNN-Deblur/ and simply split the images into degraded inputs and ground truth, training at 128*128 resolution with a batch size of 32 (or train on your own dataset). This requires only 12 GB of GPU memory and takes only a day and a half to run 1 million iterations on a 3090 GPU. We hope this helps.
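The training setup described above (aligned random 128x128 crops from paired images, batch size 32) can be sketched as follows. This is illustrative only; the function and variable names are not DocDiff's actual data loader:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_batch(gt, degraded, patch=128, batch_size=32, rng=rng):
    """Sample aligned random patches from a ground-truth/degraded image pair.

    Both crops use the same (y, x) offset so the pair stays pixel-aligned,
    which is required for supervised restoration training.
    """
    h, w = gt.shape[:2]
    gts, degs = [], []
    for _ in range(batch_size):
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        gts.append(gt[y:y + patch, x:x + patch])
        degs.append(degraded[y:y + patch, x:x + patch])
    return np.stack(gts), np.stack(degs)

gt = np.zeros((300, 400, 3), dtype=np.uint8)
blurred = np.zeros_like(gt)
b_gt, b_deg = sample_batch(gt, blurred)
print(b_gt.shape)  # (32, 128, 128, 3)
```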
Moreover, during training you can track the current performance of your model with the images in the "./Training" folder. Usually, at around 10,000 steps the model starts producing reasonable outputs; after 100,000 steps the performance becomes more stable, and the model converges at around 1 million steps.
I am using 100 steps for inference. Can you please upload the inference notebook and the demo images as soon as possible?
For sure.
Do I need to change anything in the config file for better results? I am using exactly the same code as in the Git repo.
The inference notebook has been uploaded. If you find it useful, please give it a star. Thank you.
No need to change anything.
I am certain that the issue is due to the pre-trained weights I provided being trained on the deblurring dataset (http://www.fit.vutbr.cz/~ihradis/CNN-Deblur/). This dataset contains images with mostly pure white backgrounds and black text, which has a significantly different pixel distribution from the test samples you provided (which have a grayish color). Furthermore, I did not perform any color data augmentation during training, which leads to the results you observed. I suggest two solutions:
You can refer to the method described on the ninth page of the paper "Convolutional Neural Networks for Direct Text Deblurring" (https://www.fit.vut.cz/research/publication-file/10922/hradis15CNNdeblurring.pdf) for testing on real photos.
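Regarding the missing color augmentation mentioned above, one simple option is to jitter brightness and contrast during training so that white-background data also covers grayish inputs. This is a hypothetical sketch; the parameter ranges are guesses, not values from the repo:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_color_jitter(img: np.ndarray, rng=rng) -> np.ndarray:
    """Randomly flatten contrast and shift brightness of a uint8 image.

    Pulling white backgrounds toward gray simulates camera-captured or
    compressed inputs; ranges here are illustrative starting points.
    """
    x = img.astype(np.float32) / 255.0
    contrast = rng.uniform(0.7, 1.0)      # < 1 flattens contrast toward gray
    brightness = rng.uniform(-0.15, 0.05) # small global shift
    out = np.clip(x * contrast + brightness, 0.0, 1.0)
    return (out * 255.0).astype(np.uint8)
```

Apply the same jitter to the degraded input only (not the ground truth) if you want the model to learn to normalize the background color.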
I want to resume my training from your given checkpoint. Can you please tell me the learning rate and config details? I am currently using the config below:

# model
IMAGE_SIZE : [304, 304] # load image size; in train mode it is randomly cropped to IMAGE_SIZE, in test mode it is resized to IMAGE_SIZE
CHANNEL_X : 3 # input channel
MODE : 1 # 1 Train, 0 Test
# train
PATH_GT : '' # path of ground truth
The default config is suitable for the document scenario. No need to change it. The training will be very stable.
Should I change the LR from 0.0001 to 1e-5, or something lower?
1e-4 is OK. Lower means longer training time, and performance will not be better.
I resumed training with the above config on my custom data for 281,218 steps, but I didn't get the required result. How long do I need to train the model?
Take a look at the output images of the training process.
I will try to train it from scratch, but can you please help with writing a separate module or a list of augmentations for these types of images, so that the model will predict them well?
I believe that it may be helpful for you to analyze the characteristics of your sample and consider designing additional modules to improve the overall outcome. This process typically involves a significant amount of trial and error in order to achieve the desired results. |
I don't think so. This is some kind of compression that WhatsApp applied when sending the image. Can you please help urgently?
Sure, I will do my best to assist you as soon as possible. |
Any update regarding this?
This issue occurs during the inference phase of this model for every image:
File "/home/mepluser1/rahul_hanot/try_new/DocDiff/model/DocDiff.py", line 315, in forward
x = torch.cat((x, s), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 76 but got size 75 for tensor number 1 in the list.
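This size mismatch is the multiples-of-8 problem mentioned at the top of the thread: in a U-Net-style model, the encoder halves the spatial size at each stage while the decoder doubles it back, and a dimension that is not a multiple of 2^levels produces skip connections of different sizes at `torch.cat`. A small arithmetic sketch, assuming ceil-halving per stride-2 stage (the exact rounding may differ from DocDiff's convolutions):

```python
import math

def skip_pairs(w, levels=3):
    """Simulate encoder/decoder sizes along one spatial dimension.

    Encoder: ceil-halve `levels` times, remembering each pre-halving size
    as a skip connection. Decoder: double back up, pairing each decoder
    size with its skip. A mismatched pair triggers the torch.cat error.
    """
    skips = []
    for _ in range(levels):
        skips.append(w)
        w = math.ceil(w / 2)
    pairs = []
    for skip in reversed(skips):
        w = w * 2
        pairs.append((w, skip))
    return pairs

print(skip_pairs(600))  # every pair matches: 600 is a multiple of 8
print(skip_pairs(150))  # contains (76, 75): the mismatch in the traceback
```

Padding the input so both dimensions are multiples of 8, as suggested earlier, makes every pair match.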