about the input shape #3
Comments
The input size is set by you before training starts, and it is fixed. Once you train a model with one input shape, all other inputs must have the same size, including the training and test datasets. My method is to set an aspect ratio such as width:height = 5:1; only a few inputs exceed this ratio, and I resize those to 5:1. The neural network learns features from these resized images, and if an image is very long, it contains features that are distinctive and good for recognition. My explanation may not be clear; if you still have any questions, please tell me. My English is not very good, but I'd love to help you. |
@sbillburg thanks a lot! I have got your point. |
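After taking a look I realized you are Chinese, so I will explain it once more in Chinese. My approach is to resize as little as possible. For example, I set an aspect ratio of 5:1 (width:height), and when generating training batches from the dataset, every image whose aspect ratio is above 5:1 (meaning it is very wide) is squeezed directly to 5:1. This loses some image information or introduces distortion, but a very high aspect ratio means the word is long and its features are distinctive, so it is not hard for the network to recognize. Images with an aspect ratio below 5:1 are relatively narrow, so I add pure black blocks on both sides to produce a 5:1 image. The aspect ratio of the original content is unchanged; the extra padding brings the image to the required ratio. The network also learns to output nothing for the pure black blocks, so there is no need to worry about misrecognition. The implementation is in CRNN-with-STN/Batch_Generator.py, line38~line44. If anything is still unclear, feel free to ask me directly or send me an email. |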
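For readers who want to try the idea, here is a minimal sketch of the squeeze-or-pad rule described above. It is not the code from Batch_Generator.py: the function names `resize_nearest` and `fit_to_ratio`, the target height of 32, and the nearest-neighbour resize done in plain NumPy are all assumptions made to keep the example self-contained. In practice an image library's resize function would probably replace `resize_nearest`.

```python
import numpy as np


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize using plain NumPy indexing (hypothetical helper)."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    return img[rows[:, None], cols]


def fit_to_ratio(img, target_h=32, ratio=5):
    """Return an image of shape (target_h, target_h * ratio, ...).

    target_h=32 is an assumed value; only the 5:1 ratio comes from the thread.
    """
    target_w = target_h * ratio
    h, w = img.shape[:2]

    if w >= ratio * h:
        # Wider than 5:1: squeeze straight to the target size, accepting distortion.
        return resize_nearest(img, target_h, target_w)

    # Narrower than 5:1: scale to the target height, keeping the aspect ratio...
    new_w = min(target_w, max(1, round(w * target_h / h)))
    scaled = resize_nearest(img, target_h, new_w)

    # ...then centre it on a black canvas so the result is exactly 5:1.
    canvas = np.zeros((target_h, target_w) + img.shape[2:], dtype=img.dtype)
    left = (target_w - new_w) // 2
    canvas[:, left:left + new_w] = scaled
    return canvas
```

With these assumed defaults, a 64*500 image is squeezed to 32*160, while a 64*128 image is scaled to 32*64 and padded with 48 black columns on each side.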
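@sbillburg Haha, thank you. I think the reason adding the STN did not do better than leaving it out is that the STN is placed late in the network. If the text line itself is not rotated much, the deformation is small, and the later feature maps, especially those after max pooling, have already been distilled. Applying the STN's affine transformation there may work less well than applying the STN directly at the input. |
Line 38 of CRNN-with-STN/Batch_Generator.py |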
Can you tell me the difference? It seems the same in Python3 with or without the parentheses |
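There is no problem in Python3; in Python2 there is a difference, so it is good practice to add the parentheses. |
I'd like to ask: is there any paper or theoretical basis for placing the STN at batchnorm_7? |
No. The STN as a whole acts as a module, and I simply placed it between the CNN and the RNN. You can put this module anywhere in the network, and it might give better results. This project is just a Keras implementation of CRNN plus some experiments with STN. |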
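It looks like the tensor format is wrong; try to match the input and output formats used in the source code as closely as possible. Pay attention to how the loc_net function is called in the source code and what arguments it takes.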
|
I found that your model has a fixed input size, so how can you recognize images of varying size? Take a 64*500 image: resizing it may destroy its aspect ratio and affect the result, right?