Improve training speed by pre-computing compose(ctc_topo, P, L)
#172
Conversation
I wonder if it makes sense to retain a copy of the un-optimized version of the LFMMI loss, maybe something like "SimpleLFMMI", as a reference for people who just want to understand how it works.
snowfall/training/mmi_graph.py
Outdated
# TODO(fangjun): k2.connect supports only CPU.
# Add CUDA support.
num_graphs = k2.connect(num_graphs.to('cpu')).to(P.device)
I think we probably need a CUDA version of k2.connect(),
though I have not profiled this pull-request. It is currently
slower than before. Not sure if it is the problem of TaskRedirect or is caused by this statement.
is .connect() really necessary?
is .connect() really necessary?
If I don't invoke k2.connect, then the resulting get_tot_scores for the num_lats returns all -inf.
If k2.connect is used, then get_tot_scores returns no -infs.
Something is not right here.
I think it may be a mistake to compose ctc_topo unless it's right at the end. I believe ctc_topo expects to be composed with something that was epsilon-free and which then had epsilon self-loops added. Because we are interpreting the epsilons on one side as "blank", which is in a sense a real symbol, things are a little subtle there.
... so I think it may be OK to compose L and P, and to compose that with the transcripts, but I'd leave ctc_topo until the end.
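The "epsilon-free, then add epsilon self-loops" requirement can be sketched in pure Python (a hypothetical arc-list analogue of k2.add_epsilon_self_loops, not the real API; whether the final state also gets a loop is glossed over here):

```python
# Hedged sketch (NOT the real k2 API): mimic add_epsilon_self_loops on a
# plain arc-list FSA.  `arcs` is a list of (src, dst, label); label 0 is
# epsilon/blank.  The input is assumed epsilon-free; we add a 0-labelled
# self-loop at every non-final state so that a later composition with
# ctc_topo can match blank at any position.
def add_epsilon_self_loops(arcs, num_states, final):
    assert all(label != 0 for _, _, label in arcs), "input must be epsilon-free"
    loops = [(s, s, 0) for s in range(num_states) if s != final]
    return sorted(loops + arcs)

# Linear FSA for the transcript [5, 7]: 0 -5-> 1 -7-> 2
linear = [(0, 1, 5), (1, 2, 7)]
print(add_epsilon_self_loops(linear, num_states=3, final=2))
# [(0, 0, 0), (0, 1, 5), (1, 1, 0), (1, 2, 7)]
```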
I think it may be a mistake to compose ctc_topo unless it's right at the end. I believe ctc_topo expects to be composed with something that was epsilon-free and which then had epsilon self-loops added.
I am using intersect_device(ctc_topo_inv, P_with_self_loops).invert() (equivalent to compose(ctc_topo, P, treat_epsilons_specially=True)).
There is no 0 (neither blank nor epsilon) in P, so I think it is correct.
can you find out how the num-states changes?
It could be a state-sorting issue, although we should be detecting that
from the property flags.
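The "state-sorted" property mentioned here can be checked with a trivial sketch (a hypothetical helper over a plain arc list, not k2's actual property-flag machinery): a top-sorted FSA has no arc going to a lower-numbered state, with self-loops permitted.

```python
# Sketch of a "state-sorted" (topologically sorted) check on an arc-list
# FSA (hypothetical helper, NOT the real k2 property flags): no arc may
# go backwards; self-loops are allowed.
def is_top_sorted(arcs):
    return all(src <= dst for src, dst, _ in arcs)

print(is_top_sorted([(0, 1, 'a'), (1, 1, 'eps'), (1, 2, 'b')]))  # True
print(is_top_sorted([(0, 2, 'a'), (2, 1, 'b')]))                 # False
```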
In snowfall/training/mmi_graph.py:
> + linear_fsas = self.build_linear_fsas(texts)
+ linear_fsas_with_self_loops = k2.add_epsilon_self_loops(linear_fsas)
+
+ b_to_a_map = torch.zeros(len(texts),
+ dtype=torch.int32,
+ device=self.device)
+
+ num_graphs = k2.intersect_device(self.HPL_inv_sorted,
+ linear_fsas_with_self_loops,
+ b_to_a_map,
+ sorted_match_a=True)
+ num_graphs = k2.invert(num_graphs)
+
+ # TODO(fangjun): k2.connect supports only CPU.
+ # Add CUDA support.
+ num_graphs = k2.connect(num_graphs.to('cpu')).to(P.device)
... also, while non-connected input could cause search errors, I wouldn't expect the result to be *all* -infinity, unless there was a problem like it was not state-sorted.
Thanks, will check that.
I confirm that the tot_scores of …

Some information about the FsaVec before and after calling k2.connect:

Before: num_fsas: 8
After: num_fsas: 8

(NOTE: k2.arc_sort is called later for both cases)
Here are the profiling results of this pull-request and the master branch. Even though this pull-request requires one less call to k2.intersect_device, it is actually slower. I compared the size of the resulting num_graphs, listed below: the resulting num_graphs of this pull-request is actually larger, so it takes more time.

[profiling tables for "this pull-request" and "master branch" omitted]
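A comparison like the one above can be reproduced with a simple wall-clock timer; this is only a generic sketch (the builder function here is a stand-in, not snowfall's real graph-construction API):

```python
# Minimal timing sketch for comparing two graph-building variants.
# `build_graphs_dummy` is a placeholder for the real construction code.
import time

def time_it(fn, *args, repeats=10):
    """Average wall-clock seconds per call over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats

def build_graphs_dummy(n):
    # stand-in workload for the real num-graph construction
    return [i * i for i in range(n)]

print(f"avg: {time_it(build_graphs_dummy, 10_000):.6f} s per call")
```

For GPU code paths one would additionally need to synchronize the device before reading the clock, otherwise the measurement only covers kernel launch time.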
Closing.
Seems to be working, but it needs more tests.
Will continue with it after fixing #169
Relates to #165 and depends on k2-fsa/k2#726