Inference Issue #4

Open
rahulhanotDTU opened this issue Jun 10, 2023 · 29 comments
Labels
help wanted Extra attention is needed

Comments

@rahulhanotDTU

This error occurs during the inference phase of the model, for every image:

File "/home/mepluser1/rahul_hanot/try_new/DocDiff/model/DocDiff.py", line 315, in forward
x = torch.cat((x, s), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 76 but got size 75 for tensor number 1 in the list.

@Royalvice
Owner

Hello. As mentioned in the Guide in README.EN.md, please make sure that the width and height of the input image are both multiples of 8. If they are not, please pad the image. Hope this helps.
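A minimal sketch of the padding step, for illustration only. It is not code from the DocDiff repo: the function names `pad_to_multiple` and `pad_image` are hypothetical, and the image is represented as a plain list of pixel rows so the example has no dependencies. Padding on the bottom and right with a white fill is an assumption; any padding that brings both sides to a multiple of 8 satisfies the requirement above.

```python
def pad_to_multiple(height, width, multiple=8):
    """Return (padded_height, padded_width, pad_bottom, pad_right)."""
    pad_bottom = (-height) % multiple  # rows missing to reach the next multiple
    pad_right = (-width) % multiple    # columns missing to reach the next multiple
    return height + pad_bottom, width + pad_right, pad_bottom, pad_right


def pad_image(rows, multiple=8, fill=255):
    """Pad a grayscale image (list of rows of ints) on the bottom and right."""
    height, width = len(rows), len(rows[0])
    _, new_width, pad_bottom, pad_right = pad_to_multiple(height, width, multiple)
    padded = [list(row) + [fill] * pad_right for row in rows]
    padded += [[fill] * new_width for _ in range(pad_bottom)]
    return padded
```

After inference, the output can be cropped back to the original height and width. With PyTorch tensors, `torch.nn.functional.pad` can do the same job.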

@Royalvice reopened this Jun 10, 2023
@rahulhanotDTU
Author

rahulhanotDTU commented Jun 12, 2023

Can we enable the DPM solver at inference for better results? We are not getting results as good as those reported in your paper. Also, how many steps was the model trained for, and which pre-trained model weights are provided in the Git repo?

@Royalvice
Owner

Our DocDiff model was trained with 100 time steps. Using the DPM solver does not improve the results. We have not yet uploaded the jump-step sampling based on DDIM (but you can implement it based on the paper; just save x_0 for each step in the sampling process). Hence, please use 100 steps for inference; otherwise the outputs may look strange because the noise intensity is incompatible with T.

As for not getting the results described in the paper, we suspect this is caused by the input data. We will upload some demo images and an inference demo notebook soon so the results are easy to reproduce. If you still cannot obtain the desired results, you can download the deblurring dataset from http://www.fit.vutbr.cz/~ihradis/CNN-Deblur/ and simply split the images into original and ground truth at 128×128 resolution, with a batch size of 32, for training (or train on your own dataset). This requires only 12 GB of GPU memory and takes about one and a half days for 1 million iterations on a 3090 GPU. We hope this helps.
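One way to picture the 128×128 split is as a grid of crop boxes. The sketch below is an assumption about how such a split could be done, not the repo's data loader; `patch_boxes` and its `stride` parameter are hypothetical names. The same boxes would be applied to a degraded image and to its ground truth so that each pair stays pixel-aligned.

```python
def patch_boxes(height, width, patch=128, stride=128):
    """Return (top, left, bottom, right) boxes for every full patch in the image.

    Partial patches at the bottom and right edges are skipped.
    """
    boxes = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            boxes.append((top, left, top + patch, left + patch))
    return boxes
```

A stride smaller than the patch size gives overlapping crops and therefore more training samples from the same images.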

@Royalvice
Owner

Moreover, during training, you can track the current performance of your model with the images located in the "./Training" folder. Usually, at around 10,000 steps, the model starts producing reasonable outputs. After 100,000 steps, the performance becomes more stable, and the model converges at around 1 million steps.

@rahulhanotDTU
Author

I am using 100 steps for inference. Could you please upload the inference notebook and the demo images as soon as possible?

@Royalvice
Owner

For sure

@rahulhanotDTU
Author

Do I need to change anything in the config file to get better results? I am using exactly the same code as in the Git repo:
TIMESTEPS : 100
NATIVE_RESOLUTION : 'False' # if True, test with native resolution
DPM_SOLVER : 'False' # if True, test with DPM_solver
DPM_STEP : 20 # DPM_solver step
BATCH_SIZE_VAL : 1 # test batch size
TEST_PATH_GT : '/home/mepluser1/rahul_hanot/new_project_rahul/data/gt_png' # path of ground truth
TEST_PATH_IMG : '/home/mepluser1/rahul_hanot/new_project_rahul/data/input_data' # path of input
TEST_INITIAL_PREDICTOR_WEIGHT_PATH : '/home/mepluser1/rahul_hanot/DocDiff/model_weight/init_predictor_document_deblurring.pth' # path of initial predictor
TEST_DENOISER_WEIGHT_PATH : '/home/mepluser1/rahul_hanot/DocDiff/model_weight/denoiser_document_deblurring.pth' # path of denoiser
TEST_IMG_SAVE_PATH : './results_4' # path to save results

@Royalvice
Owner

The inference notebook has been uploaded. If useful, please give it a star. Thank you.

@Royalvice
Owner

No changes to the config file are needed.

@rahulhanotDTU
Author

Testing on your demo image works fine, but testing on my own image gives very poor results. Could you please help me work out what is wrong with the process and how I should correct it?
(image attachment: res1_nonative)

@Royalvice
Owner

I am certain that the issue comes from the pre-trained weights I provided, which were trained on the deblurring dataset (http://www.fit.vutbr.cz/~ihradis/CNN-Deblur/). This dataset consists mostly of images with pure white backgrounds and black text, so its pixel distribution differs significantly from that of the test samples you provided (which have a grayish color). Furthermore, I did not perform any color data augmentation during training, which explains the results you are seeing.

I suggest two solutions:

  1. If you have a large number of similar samples (over 500), fine-tune the pre-trained weights on them.
  2. Perform color data augmentation on the deblurring dataset (http://www.fit.vutbr.cz/~ihradis/CNN-Deblur/) to change the grayscale distribution of the training samples.
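As a rough illustration of the second suggestion, the sketch below remaps the gray levels of a clean black-on-white image so that the background becomes grayish and the ink lighter. This is an assumed augmentation, not one used by the authors: the function names and the default ranges are hypothetical, and images are plain lists of 8-bit pixel rows to keep the example dependency-free.

```python
import random


def shift_gray_levels(rows, background, ink):
    """Linearly map pixel value 0 to `ink` and 255 to `background`."""
    scale = (background - ink) / 255.0
    return [[int(round(ink + scale * p)) for p in row] for row in rows]


def random_gray_augment(rows, rng, bg_range=(150, 255), ink_range=(0, 60)):
    """Pick a random background and ink level, then remap the image."""
    background = rng.randint(*bg_range)  # inclusive bounds
    ink = rng.randint(*ink_range)
    return shift_gray_levels(rows, background, ink)
```

Whether the ground-truth image should receive the same mapping as the degraded input depends on the output you want the model to produce; using the same `background` and `ink` for both keeps the pair consistent.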

@Royalvice
Owner

For testing on real photos, you can refer to the method described on page 9 of the paper "Convolutional Neural Networks for Direct Text Deblurring" (https://www.fit.vut.cz/research/publication-file/10922/hradis15CNNdeblurring.pdf).

@Royalvice added the "help wanted" (Extra attention is needed) label Jun 13, 2023
@rahulhanotDTU
Author

I want to resume training from the checkpoint you provided. Could you please tell me the learning rate and config details? I am currently using the config below.

model

IMAGE_SIZE : [304, 304] # load image size, if it's train mode, it will be randomly cropped to IMAGE_SIZE. If it's test mode, it will be resized to IMAGE_SIZE.

CHANNEL_X : 3 # input channel
CHANNEL_Y : 3 # output channel
TIMESTEPS : 100 # diffusion steps
SCHEDULE : 'linear' # linear or cosine
MODEL_CHANNELS : 32 # basic channels of Unet
NUM_RESBLOCKS : 1 # number of residual blocks
CHANNEL_MULT : [1,2,3,4] # channel multiplier of each layer
NUM_HEADS : 1

MODE : 1 # 1 Train, 0 Test
PRE_ORI : 'True' # if True, predict $x_0$, else predict $\epsilon$.

train

PATH_GT : '' # path of ground truth
PATH_IMG : '' # path of input
BATCH_SIZE : 4 # training batch size
NUM_WORKERS : 2 # number of workers
ITERATION_MAX : 1000000 # max training iteration
LR : 0.0001 # learning rate
LOSS : 'L2' # L1 or L2
EMA_EVERY : 100 # update EMA every EMA_EVERY iterations
START_EMA : 2000 # start EMA after START_EMA iterations
SAVE_MODEL_EVERY : 10000 # save model every SAVE_MODEL_EVERY iterations
EMA: 'True' # if True, use EMA
CONTINUE_TRAINING : 'True' # if True, continue training
CONTINUE_TRAINING_STEPS : 10000 # continue training from CONTINUE_TRAINING_STEPS
PRETRAINED_PATH_INITIAL_PREDICTOR : '/home/mepluser1/rahul_hanot/try_new_1/DocDiff/checksave/init.pth' # path of pretrained initial predictor
PRETRAINED_PATH_DENOISER : '/home/mepluser1/rahul_hanot/try_new_1/DocDiff/checksave/denoiser.pth' # path of pretrained denoiser
WEIGHT_SAVE_PATH : './checksave' # path to save model
TRAINING_PATH : './Training' # path of training data
BETA_LOSS : 50 # hyperparameter to balance the pixel loss and the diffusion loss
HIGH_LOW_FREQ : 'True' # if True, training with frequency separation

@Royalvice
Owner

The default config is suitable for the document scenario. No changes are needed, and training will be very stable.

@rahulhanotDTU
Author

Should I change the LR from 0.0001 to 1e-5 or something lower?

@Royalvice
Owner

1e-4 is fine. A lower learning rate means longer training time, and performance will not be better.

@rahulhanotDTU
Author

I resumed training with the above config on my custom data for 281218 steps, but I did not get the required results. How long do I need to train the model?

@Royalvice
Owner

Take a look at the output images of the training process (the ones saved in the "./Training" folder).

@rahulhanotDTU
Author

I trained the model for 7.5L, but the results I am getting are the ones uploaded below. The model weights are model_denoiser_450000.pth and model_init_450000.pth, and I am using your demo inference code. Could you please tell me why I am getting this type of result and how I can improve it?
(image attachment: 1277_resul)

@Royalvice reopened this Jun 16, 2023
@Royalvice
Owner

Judging from the image you provided, this is not a simple deblurring task. The degraded document image is not only blurry; the strokes of the characters are also very faint, so your task is more difficult. My suggestion is to enhance the contrast of the input image to increase the intensity of the faint strokes.
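One simple way to enhance contrast is a percentile-based linear stretch, sketched below. This is only one possible technique, chosen here for illustration; the thread does not say which method was intended. `stretch_contrast` and its percentile defaults are hypothetical, and the image is a list of 8-bit grayscale pixel rows.

```python
def stretch_contrast(rows, low_pct=2.0, high_pct=98.0):
    """Stretch gray levels so the low/high percentiles map to 0 and 255."""
    flat = sorted(p for row in rows for p in row)
    lo = flat[int((len(flat) - 1) * low_pct / 100.0)]
    hi = flat[int((len(flat) - 1) * high_pct / 100.0)]
    if hi <= lo:
        # Flat image: nothing to stretch, return a copy unchanged.
        return [list(row) for row in rows]
    scale = 255.0 / (hi - lo)
    return [
        [min(255, max(0, int(round((p - lo) * scale)))) for p in row]
        for row in rows
    ]
```

Values outside the chosen percentiles are clipped to 0 or 255, which darkens faint strokes relative to the background at the cost of losing some detail at the extremes.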

@Royalvice
Owner

It is best to train from scratch. If the results are still poor, you may need to design some additional modules to extract features.

@rahulhanotDTU
Author

I will try to train it from scratch. Could you please help me write a separate module, or suggest a list of augmentations, for this type of image so that the model predicts them well?

@Royalvice
Owner

I believe it would help to analyze the characteristics of your samples and consider designing additional modules to improve the overall outcome. This process typically involves a significant amount of trial and error before the desired results are achieved.

@rahulhanotDTU
Author

I want to remove the noise created by WhatsApp compression, but I am unable to synthesize that type of noise. Could you please help me create this type of noise for text images?
(image attachment: d1)

@Royalvice
Owner

This is salt-and-pepper noise. I might need to research how to synthesize it, but it's already on my to-do list.
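For reference, classic salt-and-pepper noise can be synthesized by setting a random fraction of pixels to pure black or pure white. The sketch below is a generic illustration, not code from the repo; `add_salt_and_pepper` and its `amount` parameter are hypothetical names, and the image is a list of 8-bit grayscale pixel rows. If the degradation is actually a compression artifact, this model of the noise may not match it well.

```python
import random


def add_salt_and_pepper(rows, amount=0.05, rng=None):
    """Set roughly `amount` of the pixels to 0 (pepper) or 255 (salt)."""
    if rng is None:
        rng = random.Random()
    noisy = []
    for row in rows:
        new_row = []
        for p in row:
            r = rng.random()  # uniform in [0, 1)
            if r < amount / 2:
                new_row.append(0)    # pepper
            elif r < amount:
                new_row.append(255)  # salt
            else:
                new_row.append(p)    # pixel left unchanged
        noisy.append(new_row)
    return noisy
```

Passing a seeded `random.Random` makes the noise reproducible, which is convenient when generating paired training data.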

@Royalvice reopened this Jul 11, 2023
@rahulhanotDTU
Author

I don't think so. This is some kind of compression that WhatsApp applies when sending the image. Could you please help urgently?

@Royalvice
Owner

Sure, I will do my best to assist you as soon as possible.

@rahulhanotDTU
Author

rahulhanotDTU commented Jul 11, 2023

Any update regarding this?

@22chenR

22chenR commented Jan 7, 2024

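May I ask why the inference notebook you uploaded does not include the denoising and watermark-removal processes? When I change the TEST_PATH_IMG path to images with watermarks, this is the result I get. What should I do?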
Uploading PMC1635421_00004.jpg.png…

3 participants