Use CTC beam search decoder with subword encoding. #3750

DomainFlag · 2022-10-12T23:31:53Z

I'm using the scorer generator provided by generate_scorer_package. I'm also using a subword tokenizer (e.g., SentencePiece) to build a unigram language model, and the acoustic model's output layer has the same size as that unigram vocabulary. How can I adapt the scorer so that it supports subword units? Will the scorer work if I fill the alphabet file with the subword units? Or should I rely on a trick such as mapping each unit of the unigram model to a character from an ASCII table, re-encoding the corpus with that mapping, and building the alphabet from it? Thank you.
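The re-encoding trick described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a confirmed way to make generate_scorer_package accept subwords. It assumes each subword piece can be replaced by one distinct character, uses the Unicode Private Use Area instead of ASCII (ASCII has only 128 code points, fewer than most subword vocabularies), and assumes a one-symbol-per-line alphabet file. All function names are hypothetical. Whether the scorer handles such symbols, and how word boundaries should be represented in the re-encoded corpus, remains open.

```python
from typing import Dict, List

# Assumption: symbols are drawn from the BMP Private Use Area (U+E000..U+F8FF),
# which gives 6400 single-character symbols.
PUA_START = 0xE000
PUA_END = 0xF8FF


def build_symbol_map(pieces: List[str], start: int = PUA_START) -> Dict[str, str]:
    """Hypothetical helper: assign each subword piece a unique single character."""
    if len(set(pieces)) != len(pieces):
        raise ValueError("subword vocabulary contains duplicate pieces")
    if start + len(pieces) - 1 > PUA_END:
        raise ValueError("vocabulary too large for the BMP Private Use Area")
    return {piece: chr(start + i) for i, piece in enumerate(pieces)}


def encode_pieces(tokens: List[str], symbol_map: Dict[str, str]) -> str:
    """Re-encode a tokenized sentence as one character per subword unit."""
    return "".join(symbol_map[t] for t in tokens)


def decode_symbols(text: str, symbol_map: Dict[str, str]) -> List[str]:
    """Map decoder output characters back to the original subword pieces."""
    inverse = {sym: piece for piece, sym in symbol_map.items()}
    return [inverse[ch] for ch in text]


def alphabet_text(symbol_map: Dict[str, str]) -> str:
    """Assumed alphabet layout: one symbol per line, in label order."""
    return "".join(sym + "\n" for sym in symbol_map.values())


if __name__ == "__main__":
    # Example pieces in SentencePiece style ("\u2581" marks a word start).
    vocab = ["\u2581the", "\u2581cat", "s", "\u2581sat"]
    mapping = build_symbol_map(vocab)
    encoded = encode_pieces(["\u2581the", "\u2581cat", "s"], mapping)
    print(len(encoded), decode_symbols(encoded, mapping))
```

The same mapping would have to be applied both to the language-model training corpus and to the acoustic model's output labels, so that label *i* and alphabet line *i* refer to the same subword piece.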


huks0 · 2024-07-10T09:29:03Z

Have you ever solved this? Is there a way to use subword encoding?