How to work with images in different sizes... #4

Closed

utopic-dev opened this issue Nov 14, 2024 · 4 comments

@utopic-dev

Guys, first of all I want to thank you for the incredible work; it is very elegant and admirable. I have a specific case where my training images are 150x40 px, and when I upscale them to a width of 512 px they become very distorted and the model does not seem to learn much, whereas other trainings with larger images were a success. I would humbly like to know whether it is possible to adjust the scripts to train with smaller images. When I change the `img-size` and patch-size arguments in options.py, and the corresponding values in HTR_VT, it triggers a series of errors that I have not been able to solve so far.

Can anyone help me with how to work with smaller images, or is the standard 512 px size mandatory? Thank you in advance for your attention and help.

@YutingLi0606
Owner

Hi, thank you for your interest! I’m more than happy to answer your question. 😊

To begin with, HTR-VT utilizes a slightly modified ResNet-18 instead of the standard Patch Embedding in ViT. The downsampling size depends on this ResNet configuration. If you wish to use a different image size, you will need to adjust the ResNet accordingly.

In options.py, make sure to align the patch size with the new downsampling size.
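
For instance (a sketch only; the exact flag names and defaults in the repo's options.py may differ):

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flags for illustration; check the actual argument names
# and format in options.py. The key point is that patch size must equal
# the ResNet's effective downsampling factor along each axis.
parser.add_argument('--img-size', type=int, nargs=2, default=[512, 64],
                    help='input image width and height')
parser.add_argument('--patch-size', type=int, nargs=2, default=[4, 64],
                    help='effective downsampling of the ResNet backbone')
```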

For example, we use an input size of 512x64 with a downsampling size of 4x64 in ResNet (corresponding to the patch size in options.py). This results in a 128x1 feature input to the transformer encoder. Based on our experiments, maintaining this feature shape as nx1 is optimal for this task.
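
To make that arithmetic concrete, here is a tiny self-contained check (the helper is illustrative, not part of the repo):

```python
# Illustrative helper: number of transformer tokens for a given
# input size and effective downsampling (patch) size per axis.
def token_grid(img_w, img_h, down_w, down_h):
    return img_w // down_w, img_h // down_h

# Default setup: 512x64 input, 4x64 downsampling -> 128x1 tokens.
print(token_grid(512, 64, 4, 64))   # (128, 1)

# A smaller setup, as suggested below: 128x32 input, 4x32 -> 32x1 tokens.
print(token_grid(128, 32, 4, 32))   # (32, 1)
```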

If you plan to use a size like 150x40, here are two suggestions:

  1. Consider resizing the input to 128x32 or using a higher resolution like 256x64.
  2. Modify the ResNet (adjust the stride, add layers, or remove layers) to ensure the downsampling size remains appropriate.
    For instance, with an input size of 128x32, after adjustments, you can obtain a feature size of 32x1. In this case, the patch size would be 4x32. Modify the ResNet configuration as follows:

```python
self.layer2 = self._make_layer(BasicBlock, nb_feat // 2, 2, stride=2)
```

Change to:

```python
self.layer2 = self._make_layer(BasicBlock, nb_feat // 2, 2, stride=(2, 1))
```
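
If it helps to see the effect of such a stride change, here is a minimal, self-contained shape check (plain Conv2d layers standing in for the real BasicBlocks, so this is a sketch of the mechanism, not the repo's code):

```python
import torch
import torch.nn as nn

# Stand-ins for ResNet stages: each conv halves whichever dimension
# its stride applies to (kernel 3, padding 1).
def stage(in_ch, out_ch, stride):
    return nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)

x = torch.randn(1, 1, 32, 128)  # dummy 128x32 image, (batch, C, H, W)

# All-stride-2 stack: both height and width shrink by 8x.
orig = nn.Sequential(stage(1, 8, 2), stage(8, 16, 2), stage(16, 32, 2))
print(orig(x).shape)  # torch.Size([1, 32, 4, 16])

# With stride=(2, 1) in the later stages, only the height keeps
# shrinking; the width (the reading direction) keeps its resolution.
mod = nn.Sequential(stage(1, 8, 2), stage(8, 16, (2, 1)), stage(16, 32, (2, 1)))
print(mod(x).shape)  # torch.Size([1, 32, 4, 64])
```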

Note: The hyperparameters provided above are examples. You should fine-tune them based on your experimental results.

I hope this helps!

Best regards,
Yuting

@utopic-dev
Author

utopic-dev commented Dec 4, 2024

Hello Yuting, first of all, sorry for the delay in responding. Thank you very much for your incredible explanation and help; it only reinforces how amazing you and your work are. Once again, thank you for your attention. I will look you up and add you on social media; I want to show you some projects. Incredible work, congratulations.

You helped a lot!

Thanks again!

Best regards! 🙏

@utopic-dev
Author

Is there somewhere I can connect with you? I really appreciate it.

@YutingLi0606
Owner

Hi~ My email is [email protected]
