OCR: clarification about input and output #20
Some info is described here, but it's still not very clear to me:

> Hi @mrgloom, you can use either width or height as your "time dimension". Using the width you will perform a row-wise scan, otherwise you will perform a column-wise scan. Also, you can apply conv layers before the LSTM network, followed by a Global Average Pooling, returning a tensor with shape …
I'm trying to solve OCR tasks based on this code.
So what shape should the input to the LSTM have? Suppose we have images of shape `[batch_size, height, width, channels]`: how should they be reshaped to be used as input? Like `[batch_size, width, height*channels]`, so that `width` acts as the time dimension?
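For concreteness, here is a sketch of the reshape I have in mind (plain NumPy, hypothetical sizes):

```python
import numpy as np

# Hypothetical sizes, just to make the shapes concrete.
batch_size, height, width, channels = 8, 32, 100, 1
images = np.zeros((batch_size, height, width, channels), dtype=np.float32)

# Put width on axis 1 so it can act as the time dimension, then fold
# height and channels into the per-step feature vector.
seq = images.transpose(0, 2, 1, 3).reshape(batch_size, width, height * channels)
print(seq.shape)  # (8, 100, 32) -> [batch_size, time_steps, num_features]
```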
What if I want to have variable width? As I understand it, all sequences in a batch must have the same length (so is the common trick just to pad each sequence with zeros at the end, or does `batch_size` have to be 1?).
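To make the padding question concrete, this is what I mean (a sketch, plain NumPy, hypothetical sizes):

```python
import numpy as np

num_features = 32
# Three sequences of different widths: [width_i, num_features] each.
seqs = [np.random.rand(w, num_features).astype(np.float32) for w in (50, 72, 100)]

max_w = max(s.shape[0] for s in seqs)
batch = np.zeros((len(seqs), max_w, num_features), dtype=np.float32)
seq_len = np.array([s.shape[0] for s in seqs], dtype=np.int32)
for i, s in enumerate(seqs):
    batch[i, :s.shape[0]] = s  # zero-pad at the end of the time axis
# batch: [3, 100, 32]; seq_len = [50, 72, 100] marks where each real
# sequence ends, so the RNN/CTC can ignore the padding.
```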
What if I want to have variable width and height? As I understand it, I need convolutional + global average pooling / spatial pyramid pooling layers before the LSTM input, so the output blob will be `[batch_size, feature_map_height, feature_map_width, feature_map_channels]`. How should that blob be reshaped to be used as LSTM input? Like `[batch_size, feature_map_width, feature_map_height*feature_map_channels]`? Can we reshape it to just a single row like `[batch_size, feature_map_width*feature_map_height*feature_map_channels]`? It would then be like a sequence of pixels and we lose some spatial information; will that work?
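Here are the two reshapes I'm comparing, as a sketch (plain NumPy, hypothetical feature-map sizes):

```python
import numpy as np

# Hypothetical conv-stack output: [batch, fm_height, fm_width, fm_channels].
b, fh, fw, fc = 8, 4, 25, 256
fmap = np.zeros((b, fh, fw, fc), dtype=np.float32)

# Option A: keep width as time, fold height and channels into features.
seq_a = fmap.transpose(0, 2, 1, 3).reshape(b, fw, fh * fc)  # [8, 25, 1024]

# Option B: flatten everything into one long row per image. Note this is
# 2-D, so to feed an LSTM it would still need a feature axis, e.g. as
# a "sequence of pixels" with one feature per step:
seq_b = fmap.reshape(b, fh * fw * fc, 1)  # [8, 25600, 1]
```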
Here is the definition of the input, but I'm not sure what `[batch_size, max_stepsize, num_features]` means in your case:
https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L90
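My current reading of that line, as a sketch (TF 1.x; the `num_features` value here is just a placeholder of mine):

```python
import tensorflow as tf

num_features = 32  # hypothetical per-step feature size (e.g. height*channels?)

# [batch_size, max_stepsize, num_features]: batch_size and max_stepsize
# are left as None so they can vary from batch to batch; only the
# per-step feature size is fixed.
inputs = tf.placeholder(tf.float32, [None, None, num_features])
# True (unpadded) length of each sequence in the batch.
seq_len = tf.placeholder(tf.int32, [None])
```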
And how does the output of the LSTM depend on the input size and the max sequence length?
https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L110
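If I understand `dynamic_rnn` correctly, the shapes work out like this (a sketch, TF 1.x, hypothetical sizes):

```python
import tensorflow as tf

num_features, num_hidden = 32, 128  # hypothetical sizes
inputs = tf.placeholder(tf.float32, [None, None, num_features])
seq_len = tf.placeholder(tf.int32, [None])

cell = tf.nn.rnn_cell.LSTMCell(num_hidden)
outputs, _ = tf.nn.dynamic_rnn(cell, inputs,
                               sequence_length=seq_len, dtype=tf.float32)
# outputs: [batch_size, max_time, num_hidden]. The last dimension depends
# only on num_hidden, not on num_features; max_time follows the (padded)
# input length, and seq_len just stops the recurrence early per example.
```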
BTW: here are some examples using 'standard' approaches in Keras+TensorFlow, which I want to complement with RNN examples:
https://github.com/mrgloom/Char-sequence-recognition