Skip to content
This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

Improve training speed by pre-computing compose(ctc_topo, P, L) #172

Closed
wants to merge 4 commits into from

Conversation

csukuangfj
Copy link
Collaborator

@csukuangfj csukuangfj commented Apr 22, 2021

Seems to be working, but it needs more tests.

Will continue with it after fixing #169


Relates to #165 and depends on k2-fsa/k2#726

@pzelasko
Copy link
Collaborator

I wonder if it makes sense to retain a copy of the un-optimized version of the LFMMI loss, maybe sth like “SimpleLFMMI”, as a reference for people who just want to understand how it works.


# TODO(fangjun): k2.connect supports only CPU.
# Add CUDA support.
num_graphs = k2.connect(num_graphs.to('cpu')).to(P.device)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danpovey

I think we probably need a CUDA version of k2.connect(),
though I have not profiled this pull-request. It is currently
slower than before. Not sure if it is the problem of TaskRedirect or is caused by this statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is .connect() really necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is .connect() really necessary?

If I don't invoke k2.connect, then the resulting get_tot_scores for the num_lats returns all -inf.
If k2.connect is used, then get_tot_scores returns no -infs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is not right here.
I think it may be a mistake to compose ctc_topo unless it's right at the end. I believe ctc_topo expects to be composed with something that was epsilon-free and which then had epsilon self-loops added. Because we are interpreting the epsilons on one side as "blank", which is in a sense a real symbol, things are a little subtle there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... so I think it may be OK to compose L and P, and to compose that with the transcripts, but I'd leave ctc_topo until the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be a mistake to compose ctc_topo unless it's right at the end. I believe ctc_topo expects to be composed with something that was epsilon-free and which then had epsilon self-loops added.

I am using intersect_device(ctc_topo_inv, P_with_self_loops).invert() (equivalent to compose(ctc_topo, P, treat_epslion_speciall=True).

There is no 0 (neither blank nor epsilons) in P, so I think it is correct.

@danpovey
Copy link
Contributor

danpovey commented Apr 22, 2021 via email

@danpovey
Copy link
Contributor

danpovey commented Apr 22, 2021 via email

@csukuangfj
Copy link
Collaborator Author

.. also, while non-connected input could cause search errors, I wouldn't
expect the result to be all infinity, unless there was a problem like
it was not state-sorted.

Thanks, will check that.

@csukuangfj
Copy link
Collaborator Author

csukuangfj commented Apr 23, 2021

@danpovey

. also, while non-connected input could cause search errors, I wouldn't
expect the result to be all infinity, unless there was a problem like
it was not state-sorted.

I confirm that the tot_scores of num_lats are all -inf even if it is top sorted.
(The number of Fsas in the FsaVec is 8).


Some information about the FsaVec before and after calling k2.connect:

Before

num_fsas: 8
num_states: 3309424
num_arcs: 3315379
properties: "Valid|Nonempty|MaybeAccessible"

after

num_fsas: 8
num_states: 3228
num_arcs: 9183
properties: "Valid|Nonempty|TopSorted|MaybeAccessible|MaybeCoaccessible"

(NOTE: k2.arc_sort is called later for both cases)


The GetCounts() issue is fixed in k2-fsa/k2@d43f77e (from k2-fsa/k2#726)
and it is this num_graphs that results in errors in GetCounts(). After fixing it, k2.top_sort is applied to
this num_graphs to make it top sorted (but not connected).

@csukuangfj
Copy link
Collaborator Author

Here are the profiling results of this pull-request.

this pull-request

Screen Shot 2021-04-25 at 7 01 11 PM

master branch

Screen Shot 2021-04-25 at 7 01 18 PM


Even though this pull-request requires one less call to k2.intersect_device, it is actually slower. I compared the size of the resulting num_graphs, listed in below. You can see that the resulting num_graphs of this pull-request is actually larger, so it takes more time.

this pull-request

  • num_fsas: 42
  • num_states: 19626
  • num_arcs: 51152

master branch

  • num_fsas: 42
  • num_states: 8051
  • num_arcs: 11051

Closing.

@csukuangfj csukuangfj closed this Apr 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants