
Topology choices #195

Open
danpovey opened this issue May 13, 2021 · 0 comments

danpovey commented May 13, 2021

It would be nice to have a choice of topology that could be passed in somehow when we do training, e.g. as a string. For example, a wrapper

```python
build_topo(tokens: List[int], topo_type: str = 'ctc', num_states: int = 1)
```

where you can specify, for instance, 'left_to_right' for the traditional left-to-right HMM topology without a blank, with a specifiable num_states (we expect this will normally be 1). Caution: the tokens list should not contain 0, and I believe we should probably make build_ctc_topo add the 0 itself internally, which IMO would be a better interface.
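A minimal sketch of how that wrapper could dispatch, assuming build_ctc_topo keeps its current interface (where the caller passes the 0) and build_left_to_right_topo is the function proposed below; the dispatch itself is just an illustration, not existing code:

```python
from typing import List
import k2

def build_topo(tokens: List[int], topo_type: str = 'ctc',
               num_states: int = 1) -> k2.Fsa:
    # Callers pass real tokens only; the blank (0) is handled internally,
    # per the "better interface" suggested above.
    assert 0 not in tokens, 'tokens must not contain 0'
    if topo_type == 'ctc':
        return build_ctc_topo([0] + tokens)  # add the blank here
    elif topo_type == 'left_to_right':
        return build_left_to_right_topo(tokens, num_states)
    raise ValueError(f'unknown topo_type: {topo_type}')
```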

```python
build_left_to_right_topo(tokens: List[int], num_states: int = 1) -> Fsa
```

This left-to-right topology will be useful for training alignment models, for instance.
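A sketch of how the num_states == 1 case could be built, as a transducer with one emitting state per token: entering a token's state emits the token once, and the self-loop absorbs repeated frames with epsilon (0) output. The construction via a text FSA is an assumption; the exact k2.Fsa.from_str signature has varied across k2 versions, so adjust to your installed version:

```python
import k2

def build_left_to_right_topo(tokens, num_states: int = 1) -> k2.Fsa:
    # Sketch: 1-state-per-token HMM topology, no blank.
    assert num_states == 1, 'multi-state chains left as an extension'
    assert 0 not in tokens, '0 is reserved for epsilon'
    state_of = {t: i + 1 for i, t in enumerate(tokens)}  # state 0 = start
    final = len(tokens) + 1
    arcs = []
    for t, s in state_of.items():
        arcs.append((0, s, t, t, 0))           # enter token t from start
        arcs.append((s, s, t, 0, 0))           # repeated frames, no output
        for u, s2 in state_of.items():
            if s2 != s:
                arcs.append((s, s2, u, u, 0))  # move on to the next token
        arcs.append((s, final, -1, -1, 0))     # end of utterance
    lines = ['%d %d %d %d %d' % a for a in sorted(arcs)]
    lines.append(str(final))
    # acceptor=False: transducer text format "src dst label aux_label score".
    return k2.Fsa.from_str('\n'.join(lines), acceptor=False)
```

For example, build_left_to_right_topo([1, 2, 3]) would give a 5-state topology that accepts any sequence over tokens 1..3, collapsing frame repetitions.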

@pzelasko something else that will be useful for word alignments is if we can add an auxiliary label, word_start, to the lexicon FST. This would be a label on the first arc of the first phone of each word, carrying the word-id. For many purposes, e.g. building a traditional decoding graph, we can remove it before use; but it will be useful for getting word alignments. We'd have to write a function that processes a 1-best lattice path into word alignments, first segmenting on the word-ids and then stripping any optional silence and/or blank (if relevant) from the end of each word. Of course this will be more accurate with a xent or MMI model than with CTC.
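One possible shape for that post-processing step, sketched in plain Python over frame-level (token, word_start) pairs taken from the best path; the function name and the convention that 0 means "no word starts here" are assumptions:

```python
from typing import List, Tuple

def path_to_word_alignments(
    tokens: List[int],       # frame-level token ids from the 1-best path
    word_starts: List[int],  # aux word_start labels; 0 = no word begins here
    trailing: frozenset = frozenset({0}),  # silence/blank ids to strip
) -> List[Tuple[int, int, int]]:
    """Return (word_id, start_frame, end_frame) triples, end exclusive."""
    words = []
    cur_id, cur_start = None, 0
    for i, w in enumerate(word_starts):
        if w != 0:                       # a new word begins at frame i
            if cur_id is not None:
                words.append((cur_id, cur_start, i))
            cur_id, cur_start = w, i
    if cur_id is not None:
        words.append((cur_id, cur_start, len(tokens)))
    # Strip optional silence/blank frames from the end of each word.
    out = []
    for wid, s, e in words:
        while e > s + 1 and tokens[e - 1] in trailing:
            e -= 1
        out.append((wid, s, e))
    return out
```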

I also want to have example scripts for training a xent model where we subtract the prior from the nnet output; even if this is not better than regular CTC WER-wise, it will be useful for alignment purposes. We can initialize the phone prior to all-equal, and update it using a forgetting factor from the nnet output.
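A minimal sketch of the prior bookkeeping this describes, in PyTorch; the all-equal initialization and forgetting-factor update follow the paragraph above, while the class name, default forgetting factor, and tensor shapes are assumptions:

```python
import torch

class PhonePrior:
    def __init__(self, num_phones: int, forget: float = 0.99):
        # Initialize the phone prior to all-equal, as described above.
        self.prior = torch.full((num_phones,), 1.0 / num_phones)
        self.forget = forget

    def update(self, log_probs: torch.Tensor) -> None:
        # log_probs: (N, T, num_phones) log-posteriors from the nnet.
        # Forgetting-factor update: old mass decays, batch mean mixes in.
        batch_mean = log_probs.detach().exp().mean(dim=(0, 1))
        self.prior = self.forget * self.prior + (1 - self.forget) * batch_mean

    def subtract(self, log_probs: torch.Tensor) -> torch.Tensor:
        # log p(x|phone) is proportional to log p(phone|x) - log p(phone),
        # giving pseudo-likelihoods that are better suited to alignment.
        return log_probs - self.prior.clamp_min(1e-10).log()
```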
