How to work with images in different sizes... #4
Hi, thank you for your interest! I'm more than happy to answer your question. 😊

To begin with, HTR-VT utilizes a slightly modified ResNet-18 instead of the standard Patch Embedding in ViT, so the downsampling size depends on this ResNet configuration. If you wish to use a different image size, you will need to adjust the ResNet accordingly. In options.py, make sure to align the patch size with the new downsampling size.

For example, we use an input size of 512x64 with a downsampling size of 4x64 in the ResNet (corresponding to the patch size in options.py). This results in a 128x1 feature input to the transformer encoder. Based on our experiments, maintaining this feature shape as nx1 is optimal for this task.

If you plan to use a size like 150x40, here are two suggestions:
```python
self.layer2 = self._make_layer(BasicBlock, nb_feat // 2, 2, stride=2)
```

Note: the hyperparameters provided above are examples. You should fine-tune them based on your experimental results.

I hope this helps!

Best regards,
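To make the shape arithmetic above concrete, here is a minimal sketch (the function name and defaults are illustrative assumptions, not code from the repository) that checks whether an image size is compatible with the backbone's total downsampling factors and reports the resulting sequence length for the transformer encoder:

```python
def encoder_input_shape(img_w, img_h, down_w=4, down_h=64):
    """Compute the feature shape fed to the transformer encoder.

    down_w / down_h are the total downsampling factors of the
    modified ResNet-18 (4x64 for the default 512x64 input, which
    yields a 128x1 feature map, i.e. a sequence of 128 tokens).
    """
    if img_w % down_w or img_h % down_h:
        raise ValueError(
            f"{img_w}x{img_h} is not divisible by the downsampling "
            f"size {down_w}x{down_h}; adjust the ResNet strides or "
            f"pad/resize the image."
        )
    return img_w // down_w, img_h // down_h  # ideally (n, 1)

# Default configuration from this thread: 512x64 input -> 128x1 features.
print(encoder_input_shape(512, 64))    # (128, 1)

# 150x40 fails with the default strides, which is why the ResNet
# (and the patch size in options.py) must change for small images.
# print(encoder_input_shape(150, 40))  # raises ValueError
```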
Hello Yuting,

First of all, sorry for the delay in responding to you. Thank you very much for your incredible explanation and help; it only reinforces how incredible you and your work are. Once again, thank you for your attention. I will look for you and add you on social media, as I want to show you some projects. Incredible work, congratulations. You helped a lot! Thanks again!

Best regards! 🙏
Is there somewhere I can connect with you? I really appreciate it.
Hi~ My email is [email protected]
Guys, first of all I want to thank you for the incredible work; it is very elegant and admirable. I have a specific case where my training images are 150x40px, and when upscaling them to 512px wide they become very distorted and the model does not seem able to interpret much, whereas other trainings with larger images were a success. I would humbly like to know if it is possible to adjust the scripts to train with smaller images. When I change the img-size and patch-size arguments in options.py, and also in HTR_VT, it triggers a series of errors that so far I have not been able to solve.

Can anyone help me with how to work with smaller images, or is the standard size of 512px mandatory? Thank you in advance for your attention and help.
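For readers unfamiliar with the flags being discussed, here is a hedged sketch of what the relevant argparse options might look like (the exact flag names and defaults are assumptions based on this thread, not a copy of the repository's options.py). The key constraint from the answer above is that the patch size must equal the backbone's total downsampling size, and the image size must be divisible by it:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flags mirroring those mentioned in the thread; check
# the repository's options.py for the real names and defaults.
parser.add_argument("--img-size", nargs=2, type=int, default=[512, 64],
                    help="input image size as: width height")
parser.add_argument("--patch-size", nargs=2, type=int, default=[4, 64],
                    help="must equal the ResNet's total downsampling size")
args = parser.parse_args([])

w, h = args.img_size
pw, ph = args.patch_size
assert w % pw == 0 and h % ph == 0, (
    "image size must be divisible by patch size; this mismatch is a "
    "likely source of the shape errors described above"
)
print(f"encoder sequence: {w // pw} x {h // ph} tokens")  # 128 x 1
```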