It would be nice to have a choice about which topology to use, passed in somehow when we do training, e.g. as a string, via a wrapper like
build_topo(tokens: List[int], topo_type: str = 'ctc', num_states: int = 1)
where you can specify, for instance, 'left_to_right' for the traditional left-to-right HMM topology without a blank, with a specifiable num_states (we expect this will normally be 1).
Caution: the tokens list should not contain 0; also, I believe we should probably make build_ctc_topo add the 0 itself internally, which IMO would be a better interface.
build_left_to_right_topo(tokens: List[int], num_states: int = 1) -> Fsa
This left-to-right topology will be useful for training alignment models, for instance.
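To make the idea concrete, here is a rough sketch (not a final API) of what the wrapper and the left-to-right builder could look like. It assumes the snowfall-era text format accepted by k2.Fsa.from_str ("src dst label aux_label score", final state on the last line; newer k2 versions may need an explicit aux-label/acceptor argument), assumes the existing build_ctc_topo helper is in scope, and the arc layout is just one possible encoding of a blank-free topology:

```python
from typing import List

import k2


def build_left_to_right_topo(tokens: List[int], num_states: int = 1) -> k2.Fsa:
    """Blank-free left-to-right HMM topology.

    Each token gets `num_states` emitting states chained left to right, each
    with a self-loop; the token is emitted (as aux_label) only on the arc
    entering its first state.  `tokens` must not contain 0 (epsilon).
    Note: with num_states > 1, all states of a token share the same input
    label in this sketch, i.e. the nnet still has one output per token.
    """
    assert 0 not in tokens
    arcs = []          # (src, dst, label, aux_label)
    next_state = 1     # state 0 is the start/loop state
    for tok in tokens:
        prev = 0
        for i in range(num_states):
            cur = next_state
            next_state += 1
            arcs.append((prev, cur, tok, tok if i == 0 else 0))
            arcs.append((cur, cur, tok, 0))  # self-loop
            prev = cur
        arcs.append((prev, 0, 0, 0))  # epsilon arc back to the loop state
    final_state = next_state
    arcs.append((0, final_state, -1, -1))
    # Text format: "src dst label aux_label score", arcs listed by source
    # state, final state alone on the last line.
    lines = [f'{s} {d} {l} {a} 0.0' for s, d, l, a in sorted(arcs)]
    lines.append(f'{final_state}')
    return k2.arc_sort(k2.Fsa.from_str('\n'.join(lines)))


def build_topo(tokens: List[int], topo_type: str = 'ctc',
               num_states: int = 1) -> k2.Fsa:
    """Dispatch on topo_type.  For 'ctc' the blank (0) is added internally,
    assuming the existing build_ctc_topo still expects it in the list."""
    if topo_type == 'ctc':
        return build_ctc_topo([0] + list(tokens))
    elif topo_type == 'left_to_right':
        return build_left_to_right_topo(tokens, num_states)
    raise ValueError(f'Unknown topo_type: {topo_type}')
```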
@pzelasko something else that would be useful for word alignments is adding an auxiliary label word_start to the lexicon FST. This would be a label on the first arc of the first phone of each word, indicating the word-id. For many purposes, e.g. building a traditional decoding graph, we can remove it before use; but it will be useful for getting word alignments. We'd have to write a function that processes a 1-best lattice path into word alignments, by first segmenting using the word-id and then stripping out any optional silence and/or blank (if relevant) from the end of each word. Of course this will be more accurate with a xent or MMI model than with CTC.
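A rough sketch of that post-processing step (the names are hypothetical, and it assumes the 1-best path has already been flattened into per-frame token ids and per-frame word_start marks rather than working on the k2 lattice object directly):

```python
from typing import List, Tuple

# Hypothetical ids for optional silence and blank; adjust to the lexicon.
SIL = 1
BLANK = 0


def frames_to_word_alignments(frame_tokens: List[int],
                              frame_word_starts: List[int]
                              ) -> List[Tuple[int, int, int]]:
    """Turn a 1-best path into (word_id, start_frame, end_frame) segments.

    frame_tokens[t] is the token (phone/blank) on the best-path arc for
    frame t; frame_word_starts[t] is the word_start auxiliary label on that
    arc (the word-id on the first arc of a word's first phone, else 0).
    Trailing optional silence and blank frames are stripped from each word.
    """
    starts = [t for t, w in enumerate(frame_word_starts) if w != 0]
    segments = []
    for i, t0 in enumerate(starts):
        t1 = starts[i + 1] if i + 1 < len(starts) else len(frame_tokens)
        word_id = frame_word_starts[t0]
        # Strip optional silence / blank from the end of this word.
        while t1 > t0 + 1 and frame_tokens[t1 - 1] in (SIL, BLANK):
            t1 -= 1
        segments.append((word_id, t0, t1))
    return segments
```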
I also want to have example scripts for training a xent model where we subtract the prior from the nnet output; even if this is not better than regular CTC WER-wise, it will be useful for alignment purposes. We can initialize the phone prior to all-equal, and update it using a forgetting factor from the nnet output.
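A minimal sketch of such a prior update in PyTorch (class and method names are made up, and the forgetting-factor value and where exactly it hooks into the training loop are up for discussion):

```python
import torch


class PhonePrior:
    """Running phone-prior estimate, initialized to all-equal and updated
    from the nnet's output posteriors with a forgetting factor."""

    def __init__(self, num_phones: int, forgetting_factor: float = 0.999):
        self.prior = torch.full((num_phones,), 1.0 / num_phones)
        self.ff = forgetting_factor

    def update(self, log_probs: torch.Tensor) -> None:
        # log_probs: (N, T, num_phones) log-posteriors from the nnet.
        posteriors = log_probs.detach().exp().reshape(-1, log_probs.size(-1))
        self.prior = self.ff * self.prior + (1.0 - self.ff) * posteriors.mean(dim=0)

    def subtract(self, log_probs: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
        # Turn posteriors into (scaled) pseudo-likelihoods for alignment/decoding.
        return log_probs - scale * self.prior.clamp(min=1e-10).log()
```

At alignment/decoding time, subtract() converts the posteriors into pseudo-likelihoods, in the usual hybrid-system style.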