why decoding starts from 3rd position? #4

soldierofhell · 2018-12-26T21:57:30Z

Hi,

I wonder what the theoretical basis is for starting decoding from the 3rd position. I'm referring to this line:
ctc_decode = bknd.ctc_decode(y_pred[:, 2:, :], input_length=np.ones(shape[0])*shape[1])[0][0]

In the image_ocr.py example in the Keras GitHub repository there's a comment:

# the 2 is critical here since the first couple outputs of the RNN
# tend to be garbage:

But why? And why is everyone using 2 regardless of dataset, image width, and text length?

soldierofhell · 2018-12-26T21:59:39Z

Just noticed the same comment in your code :)

soldierofhell · 2018-12-26T22:04:23Z

If I start decoding from position zero, I do sometimes get "garbage" (usually a duplicate of the first character), but if the same slicing is applied in the cost function, then that's not surprising.
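
To make that observation concrete, here is a minimal sketch in plain Python. It is not the repository's code: `greedy_ctc_decode` is a hypothetical helper that does best-path CTC decoding (argmax per time step, collapse repeats, drop blanks), and the `frames` values are made-up toy probabilities, not real network output. The assumption being illustrated is that the loss was computed on `y_pred[:, 2:, :]`, so the first two time steps were never supervised and may repeat the first character.

```python
def greedy_ctc_decode(frames, blank):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frames]
    decoded = []
    prev = None
    for idx in best:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded


# Toy per-frame probabilities over the classes [a, b, blank] (assumed values).
BLANK = 2
frames = [
    [0.8, 0.1, 0.1],  # t=0: 'a' (not seen by the loss if training slices [2:])
    [0.1, 0.1, 0.8],  # t=1: blank (also not seen by the loss)
    [0.8, 0.1, 0.1],  # t=2: 'a'
    [0.1, 0.1, 0.8],  # t=3: blank
    [0.1, 0.8, 0.1],  # t=4: 'b'
    [0.1, 0.1, 0.8],  # t=5: blank
]

full = greedy_ctc_decode(frames, BLANK)        # decode from position 0
sliced = greedy_ctc_decode(frames[2:], BLANK)  # decode from position 2

print(full)    # [0, 0, 1], i.e. "aab": the first character is duplicated
print(sliced)  # [0, 1], i.e. "ab"
```

The sketch only shows that the decoding slice has to match whatever slice the loss used. It does not explain why the value 2 in particular was chosen.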

sbillburg · 2018-12-27T06:36:30Z

Hi,

I wonder what the theoretical basis is for starting decoding from the 3rd position. I'm referring to this line:
ctc_decode = bknd.ctc_decode(y_pred[:, 2:, :], input_length=np.ones(shape[0])*shape[1])[0][0]

In the image_ocr.py example in the Keras GitHub repository there's a comment:
# the 2 is critical here since the first couple outputs of the RNN
# tend to be garbage:
But why? And why is everyone using 2 regardless of dataset, image width, and text length?

To be honest, I don't know the specific reason for this, either.
