It would be nice to have a choice about which topology to use, passed in somehow when we do training, e.g. as a string, via a wrapper like
build_topo(tokens: List[int], topo_type: str = 'ctc', num_states: int = 1)
where you can specify, for instance, 'left_to_right' for the traditional left-to-right HMM topology without a blank, with a specifiable num_states (we expect this will normally be 1).
Caution: the tokens list should not contain 0; also, I believe we should probably make build_ctc_topo add the 0 itself internally, which IMO would be a better interface.
build_left_to_right_topo(tokens: List[int], num_states: int = 1) -> Fsa
This left-to-right topology will be useful for training alignment models, for instance.
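To make the idea concrete, here is a rough sketch (not a final API) of what the wrapper and the left-to-right builder could look like. It assumes the snowfall-era text format accepted by k2.Fsa.from_str ("src dst label aux_label score", final state on the last line; newer k2 versions may need an explicit aux-label/acceptor argument), assumes the existing build_ctc_topo helper is in scope, and the arc layout is just one possible encoding of a blank-free topology:

```python
from typing import List

import k2


def build_left_to_right_topo(tokens: List[int], num_states: int = 1) -> k2.Fsa:
    """Blank-free left-to-right HMM topology.

    Each token gets `num_states` emitting states chained left to right, each
    with a self-loop; the token is emitted (as aux_label) only on the arc
    entering its first state.  `tokens` must not contain 0 (epsilon).
    Note: with num_states > 1, all states of a token share the same input
    label in this sketch, i.e. the nnet still has one output per token.
    """
    assert 0 not in tokens
    arcs = []          # (src, dst, label, aux_label)
    next_state = 1     # state 0 is the start/loop state
    for tok in tokens:
        prev = 0
        for i in range(num_states):
            cur = next_state
            next_state += 1
            arcs.append((prev, cur, tok, tok if i == 0 else 0))
            arcs.append((cur, cur, tok, 0))  # self-loop
            prev = cur
        arcs.append((prev, 0, 0, 0))  # epsilon arc back to the loop state
    final_state = next_state
    arcs.append((0, final_state, -1, -1))
    # Text format: "src dst label aux_label score", arcs listed by source
    # state, final state alone on the last line.
    lines = [f'{s} {d} {l} {a} 0.0' for s, d, l, a in sorted(arcs)]
    lines.append(f'{final_state}')
    return k2.arc_sort(k2.Fsa.from_str('\n'.join(lines)))


def build_topo(tokens: List[int], topo_type: str = 'ctc',
               num_states: int = 1) -> k2.Fsa:
    """Dispatch on topo_type.  For 'ctc' the blank (0) is added internally,
    assuming the existing build_ctc_topo still expects it in the list."""
    if topo_type == 'ctc':
        return build_ctc_topo([0] + list(tokens))
    elif topo_type == 'left_to_right':
        return build_left_to_right_topo(tokens, num_states)
    raise ValueError(f'Unknown topo_type: {topo_type}')
```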
@pzelasko something else that would be useful for word alignments is adding an auxiliary label word_start to the lexicon FST. This would be a label on the first arc of the first phone of each word, indicating the word-id. For many purposes, e.g. building a traditional decoding graph, we can remove it before use; but it will be useful for getting word alignments. We'd have to write a function that processes a 1-best lattice path into word alignments, by first segmenting using the word-id and then stripping out any optional silence and/or blank (if relevant) from the end of each word. Of course this will be more accurate with a xent or MMI model than with CTC.
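A rough sketch of that post-processing step (the names are hypothetical, and it assumes the 1-best path has already been flattened into per-frame token ids and per-frame word_start marks rather than working on the k2 lattice object directly):

```python
from typing import List, Tuple

# Hypothetical ids for optional silence and blank; adjust to the lexicon.
SIL = 1
BLANK = 0


def frames_to_word_alignments(frame_tokens: List[int],
                              frame_word_starts: List[int]
                              ) -> List[Tuple[int, int, int]]:
    """Turn a 1-best path into (word_id, start_frame, end_frame) segments.

    frame_tokens[t] is the token (phone/blank) on the best-path arc for
    frame t; frame_word_starts[t] is the word_start auxiliary label on that
    arc (the word-id on the first arc of a word's first phone, else 0).
    Trailing optional silence and blank frames are stripped from each word.
    """
    starts = [t for t, w in enumerate(frame_word_starts) if w != 0]
    segments = []
    for i, t0 in enumerate(starts):
        t1 = starts[i + 1] if i + 1 < len(starts) else len(frame_tokens)
        word_id = frame_word_starts[t0]
        # Strip optional silence / blank from the end of this word.
        while t1 > t0 + 1 and frame_tokens[t1 - 1] in (SIL, BLANK):
            t1 -= 1
        segments.append((word_id, t0, t1))
    return segments
```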
I also want to have example scripts for training a xent model where we subtract the prior from the nnet output; even if this is not better than regular CTC WER-wise, it will be useful for alignment purposes. We can initialize the phone prior to all-equal, and update it using a forgetting factor from the nnet output.
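A minimal sketch of such a prior update in PyTorch (class and method names are made up, and the forgetting-factor value and where exactly it hooks into the training loop are up for discussion):

```python
import torch


class PhonePrior:
    """Running phone-prior estimate, initialized to all-equal and updated
    from the nnet's output posteriors with a forgetting factor."""

    def __init__(self, num_phones: int, forgetting_factor: float = 0.999):
        self.prior = torch.full((num_phones,), 1.0 / num_phones)
        self.ff = forgetting_factor

    def update(self, log_probs: torch.Tensor) -> None:
        # log_probs: (N, T, num_phones) log-posteriors from the nnet.
        posteriors = log_probs.detach().exp().reshape(-1, log_probs.size(-1))
        self.prior = self.ff * self.prior + (1.0 - self.ff) * posteriors.mean(dim=0)

    def subtract(self, log_probs: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
        # Turn posteriors into (scaled) pseudo-likelihoods for alignment/decoding.
        return log_probs - scale * self.prior.clamp(min=1e-10).log()
```

At alignment/decoding time, subtract() converts the posteriors into pseudo-likelihoods, in the usual hybrid-system style.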