CTC training speed question #220
Thanks for doing the comparison, and sure, that's a good idea. Yes, we should introduce a special-purpose function that constructs a batch of CTC graphs from a ragged tensor consisting of the linear symbol sequences for each one. Perhaps @pkufool could work on that?
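For reference, the graph such a function would build for each symbol sequence has the standard CTC structure: the labels interleaved with optional blanks, self-loops to absorb repeated frames, and blank-skip arcs between consecutive distinct labels. The following is only an illustrative pure-Python sketch of that structure, not k2's actual API (arcs are shown as `(src_state, dst_state, symbol)` triples):

```python
def ctc_training_graph(labels, blank=0):
    """Build the arcs of a CTC training graph for one label sequence.

    States follow the blank-interleaved sequence: blank l1 blank l2 ... blank.
    Returns (arcs, final_states), with arcs as (src, dst, symbol) triples.
    """
    ext = [blank]
    for l in labels:
        ext += [l, blank]                         # interleave optional blanks
    arcs = []
    for s, sym in enumerate(ext):
        arcs.append((s, s, sym))                  # self-loop: repeated frames
        if s + 1 < len(ext):
            arcs.append((s, s + 1, ext[s + 1]))   # advance to the next state
        # skip the intervening blank, allowed only between distinct labels
        if s + 2 < len(ext) and ext[s + 2] != blank and ext[s + 2] != sym:
            arcs.append((s, s + 2, ext[s + 2]))
    finals = {len(ext) - 1, len(ext) - 2} if labels else {0}
    return arcs, finals
```

A batched version would simply run this per row of the ragged tensor; the point of doing it as one special-purpose function (rather than composing a topology FST with each label FST) is that the arcs can be written out directly in one pass.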
Shall we also consider the transition probability contained in the bigram P while constructing the graph for LF-MMI training?
Perhaps for LF-MMI it would be best to use our current code, so it takes care of that. But last time I checked, graph compilation does not actually take much time, since we batch things up.
Max, perhaps you could show us what code you are using for graph compilation, e.g. are you compiling these things individually or as a batch?
…On Mon, Jun 28, 2021 at 5:20 PM Fangjun Kuang ***@***.***> wrote:
Shall we also consider the transition probability contained in the bigram
P while constructing the graph for LF-MMI training?
(It's not an issue for CTC training.)
Hi Dan, my comparison is based on this code: `snowfall/snowfall/training/ctc_graph.py`, lines 108 to 127 at commit `2dda31e`.
That code does composition on the CPU. Could you try `snowfall/snowfall/training/mmi_graph.py`, lines 150 to 181 at commit `2dda31e`, which runs on the GPU?
Thanks, I could try composing that way. Actually, my code follows this documented requirement: when `treat_epsilons_specially` is `True`, this function works only on the CPU; when `treat_epsilons_specially` is `False` and both `a_fsa` and `b_fsa` are on the GPU, then it works on the GPU.
Sure, I will.
I am afraid that has to be done in C++.
Yes, it is run on the GPU. It would be more efficient if you …
@danpovey Do you mean constructing the `decoding_graphs` for …
Yes, I'm talking about constructing it for a batch at a time; in general, all our FSA functions work on a batch (of course, people can use a batch of one if needed). This function will be very fast, so there is no problem re-doing the work on each minibatch.
Hi, in my experiment, the built-in CTC (cudnn CTC) is about 2.5 times faster than k2-CTC. I was wondering if this is normal, and I would like to make sure my program is correct.
I found that
`decoding_graph = k2.compose(self.ctc_topo, label_graph, treat_epsilons_specially=False)`
is the bottleneck, even with `build_ctc_topo2` (#209). How about constructing the CTC loss directly from the text, rather than composing the topo FST with the label FST? In my experiment, that gives a speed very similar to cudnn CTC, as in the picture below:
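Computing the CTC loss "directly from the text" corresponds to the classic CTC forward (alpha) recursion over the blank-interleaved label sequence, with no FST composition at all. A minimal pure-Python sketch for a single utterance (illustrative only; this is neither the cudnn nor the k2 implementation, and a real version would be vectorized on the GPU):

```python
import math

def ctc_forward_loss(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under CTC via the forward recursion.

    log_probs: list of per-frame lists of log-probabilities over the vocabulary.
    """
    ext = [blank]
    for l in labels:
        ext += [l, blank]                     # blank l1 blank l2 ... blank
    S, T = len(ext), len(log_probs)
    NEG_INF = float("-inf")

    def logadd(*xs):
        xs = [x for x in xs if x != NEG_INF]
        if not xs:
            return NEG_INF
        m = max(xs)
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # Initialization: start in the first blank or the first label.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]                # stay on the same state
            if s > 0:
                terms.append(alpha[s - 1])    # advance one state
            # skip the blank, allowed only between distinct labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logadd(*terms) + log_probs[t][ext[s]]
        alpha = new

    # Accept paths ending in the last label or the trailing blank.
    return -logadd(alpha[S - 1], alpha[S - 2] if S > 1 else NEG_INF)
```

This recursion is exactly the score the composed topo-and-label graph computes, which is why the direct construction can match cudnn CTC's speed while skipping the composition step.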