I'm wondering what the theoretical basis is for starting decoding from the 3rd position. I'm referring to this line:
ctc_decode = bknd.ctc_decode(y_pred[:, 2:, :], input_length=np.ones(shape[0])*shape[1])[0][0]
In the image_ocr.py example on the Keras GitHub there's a comment:
# the 2 is critical here since the first couple outputs of the RNN
# tend to be garbage:
But why? And why is everyone using 2 regardless of dataset, image width, and text length?
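If I start decoding from position zero, I do sometimes get "garbage" (usually a duplicate of the first character). But if the same slicing is applied in the cost function, that's not surprising: the first two outputs would then never contribute to the loss, so nothing trains them to be meaningful.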
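To make the "duplicate of the first character" effect concrete, here is a toy sketch of best-path (greedy) CTC decoding in plain Python. It is not the Keras `ctc_decode` implementation; the alphabet, the `-` blank symbol, the helper names, and the hand-made frames are all assumptions for illustration. The first two frames stand in for outputs that the loss never saw.

```python
# Toy illustration only: hypothetical alphabet, with "-" as the CTC blank.
ALPHABET = "-helo"
BLANK = "-"


def one_hot(symbol):
    """Return a fake per-frame probability vector that peaks on `symbol`."""
    return [1.0 if s == symbol else 0.0 for s in ALPHABET]


def greedy_ctc_decode(frames):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    best = [ALPHABET[max(range(len(p)), key=p.__getitem__)] for p in frames]
    decoded = []
    prev = None
    for sym in best:
        if sym != prev and sym != BLANK:
            decoded.append(sym)
        prev = sym
    return "".join(decoded)


# Frames 0 and 1 ("h", "-") play the role of untrained outputs;
# frames 2 onward spell "hello" the way a trained model might.
frames = [one_hot(s) for s in "h-hel-lo"]

from_zero = greedy_ctc_decode(frames)       # decodes all frames
from_two = greedy_ctc_decode(frames[2:])    # mirrors y_pred[:, 2:, :]
print(from_zero, from_two)
```

In this made-up example, decoding from frame 0 yields an extra leading character, while decoding from frame 2 does not. It only shows that decoding should use the same slice as the loss; it does not explain why 2 in particular was chosen.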
To be honest, I don't know the specific reason for this, either.