k2::TopSorter::TopSort assertion, but only when using GPU #1204
Are you using the latest icefall and have you made any changes to the code?

Icefall: latest (k2-fsa/icefall@1aeffa7). Code changes: minimal, just to make it download and run on my limited hardware. Diff: rouseabout/icefall@9e23b38

Can you make sure you ran any tests that are available in k2? Sorry, I don't recall the details of how.
Guys, especially @pkufool: I noticed an issue in top_sort.cu. [...] but the actual code does not do this; it actually just gets the states of in-degree 0, as in the original. Notice that if the start state has in-degree > 0 (this is after removing self-loops), the start state [...]
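For illustration, here is a toy construction (my own sketch, not from the codebase) of an acyclic FSA whose start state has in-degree > 0; whether this alone trips the GPU assertion is untested:

```python
import k2

# Hypothetical FSA: acyclic, but state 2 is unreachable from the start
# state, and its arc 2 -> 0 gives the start state an in-degree of 1.
s = """
0 1 1 0.1
1 3 -1 0.0
2 0 2 0.2
3
"""
fsa = k2.Fsa.from_str(s)

# An "initial batch = states of in-degree 0" rule would pick state 2
# here and miss the start state entirely.
sorted_cpu = k2.top_sort(fsa)
# sorted_gpu = k2.top_sort(fsa.to("cuda"))  # the GPU path in question
```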
OK, I will have a look.
Also, @rouseabout, can you try running it in pdb and getting a Python stack trace when it fails? It would be nice to know for sure exactly when TopSort is being called.
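Something along these lines, as a sketch (`main` stands in for whatever the failing decode.py entry point is):

```python
import pdb
import traceback

try:
    main()  # hypothetical: the decode.py entry point that crashes
except Exception:
    traceback.print_exc()  # print the Python stack trace
    pdb.post_mortem()      # then inspect the frame where TopSort was called
```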
I suspect the top_sort is in https://github.com/k2-fsa/icefall/blob/7b0afbdc16066701759e088f7edbb648a0b879f0/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L213. I paste the code here (the top_sort is on the fifth line from the end). Can you dump the problematic lattice? (A saving sketch follows the snippet.)

```python
lattice = fast_beam_search(
    model=model,
    decoding_graph=decoding_graph,
    encoder_out=encoder_out,
    encoder_out_lens=encoder_out_lens,
    beam=beam,
    max_states=max_states,
    max_contexts=max_contexts,
    temperature=temperature,
)

nbest = Nbest.from_lattice(
    lattice=lattice,
    num_paths=num_paths,
    use_double_scores=use_double_scores,
    nbest_scale=nbest_scale,
)

# The following code is modified from nbest.intersect()
word_fsa = k2.invert(nbest.fsa)
if hasattr(lattice, "aux_labels"):
    # delete token IDs as it is not needed
    del word_fsa.aux_labels
word_fsa.scores.zero_()
word_fsa_with_epsilon_loops = k2.linear_fsa_with_self_loops(word_fsa)
path_to_utt_map = nbest.shape.row_ids(1)

if hasattr(lattice, "aux_labels"):
    # lattice has token IDs as labels and word IDs as aux_labels.
    # inv_lattice has word IDs as labels and token IDs as aux_labels
    inv_lattice = k2.invert(lattice)
    inv_lattice = k2.arc_sort(inv_lattice)
else:
    inv_lattice = k2.arc_sort(lattice)

if inv_lattice.shape[0] == 1:
    path_lattice = k2.intersect_device(
        inv_lattice,
        word_fsa_with_epsilon_loops,
        b_to_a_map=torch.zeros_like(path_to_utt_map),
        sorted_match_a=True,
    )
else:
    path_lattice = k2.intersect_device(
        inv_lattice,
        word_fsa_with_epsilon_loops,
        b_to_a_map=path_to_utt_map,
        sorted_match_a=True,
    )

# path_lattice has word IDs as labels and token IDs as aux_labels
path_lattice = k2.top_sort(k2.connect(path_lattice))
tot_scores = path_lattice.get_tot_scores(
    use_double_scores=use_double_scores,
    log_semiring=True,  # Note: we always use True
)
```
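Dumping can be done by round-tripping the Fsa through a dict; a sketch using k2's as_dict/from_dict helpers:

```python
import torch

# Save the suspect lattice just before the top_sort call above;
# move it to CPU first so it can be reloaded on any machine.
torch.save(path_lattice.to("cpu").as_dict(), "path_lattice.pt")

# Reloading later:
# import k2
# path_lattice = k2.Fsa.from_dict(torch.load("path_lattice.pt"))
```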
Thanks for looking into this. Quick note: setup.py disables building the C++ tests; I suggest changing this.

After rebuilding, I can see 2 C++ tests are failing. All Python tests are passing.

Hardware: NVIDIA Corporation GP104GL [Tesla P4] (rev a1). Stack trace from the failing tests and Python error message: [...]
Thanks! Can you rerun the tests with the --rerun-failed --output-on-failure options as it mentions? It might be CTest; not sure which directory it would have been in.
https://pross.sdf.org/sandpit/log.txt (467 KiB)
https://pross.sdf.org/sandpit/path_lattice.pt (355 MiB)
I will delete these files in a few days. Cheers.
Thanks! I am debugging it; I will post the results here once available.
@rouseabout Could you also dump the lattice from:

```python
lattice = fast_beam_search(
    model=model,
    decoding_graph=decoding_graph,
    encoder_out=encoder_out,
    encoder_out_lens=encoder_out_lens,
    beam=beam,
    max_states=max_states,
    max_contexts=max_contexts,
    temperature=temperature,
)
```
https://pross.sdf.org/sandpit/lattice.pt (6.7 MiB)

Observation: the contents of path_lattice.pt change each time I run decode.py (its md5sum changes), whereas lattice.pt is always the same. I expected both to be deterministic.
Thank you!

Edit: Sorry, I am wrong; the paths are not randomly sampled, see https://k2-fsa.github.io/k2/python_api/api.html#random-paths. So this might be another issue.
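For context, Nbest.from_lattice samples paths with k2.random_paths, so a quick determinism check on the dumped lattice might look like this (a sketch, assuming the lattice.pt linked above):

```python
import k2
import torch

lattice = k2.Fsa.from_dict(torch.load("lattice.pt"))

# Sample the same number of paths twice and compare the raw arc indexes.
p1 = k2.random_paths(lattice, use_double_scores=True, num_paths=200)
p2 = k2.random_paths(lattice, use_double_scores=True, num_paths=200)
print(torch.equal(p1.values, p2.values))  # True => sampling is deterministic
```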
@rouseabout Sorry for the slow reply. I cannot reproduce the error with the lattices you provided. I also tried creating the Nbest from the dumped lattice:

```python
from icefall.decode import Nbest

lattice = k2.Fsa.from_dict(torch.load("/star-kw/kangwei/issues/k2_1204/lattice.pt"))
lattice = lattice.to("cuda:4")

nbest = Nbest.from_lattice(
    lattice=lattice,
    num_paths=200,
    use_double_scores=True,
    nbest_scale=0.5,
)

# The following code is modified from nbest.intersect()
word_fsa = k2.invert(nbest.fsa)
if hasattr(lattice, "aux_labels"):
    # delete token IDs as it is not needed
    del word_fsa.aux_labels
word_fsa.scores.zero_()
word_fsa_with_epsilon_loops = k2.linear_fsa_with_self_loops(word_fsa)
path_to_utt_map = nbest.shape.row_ids(1)

if hasattr(lattice, "aux_labels"):
    # lattice has token IDs as labels and word IDs as aux_labels.
    # inv_lattice has word IDs as labels and token IDs as aux_labels
    inv_lattice = k2.invert(lattice)
    inv_lattice = k2.arc_sort(inv_lattice)
else:
    inv_lattice = k2.arc_sort(lattice)

if inv_lattice.shape[0] == 1:
    path_lattice = k2.intersect_device(
        inv_lattice,
        word_fsa_with_epsilon_loops,
        b_to_a_map=torch.zeros_like(path_to_utt_map),
        sorted_match_a=True,
    )
else:
    path_lattice = k2.intersect_device(
        inv_lattice,
        word_fsa_with_epsilon_loops,
        b_to_a_map=path_to_utt_map,
        sorted_match_a=True,
    )
```

Could you check that the lattice you have dumped is the problematic one? Thank you very much!
@pkufool Really appreciate you looking into this. It is not urgent. I can confirm the lattice.pt and path_lattice.pt were output from [...]

When I run your notebook lines, I observe the same shape and properties_str output. When I run your code, changing cuda:4 to cuda:0, it runs normally, no crash... HOWEVER, your code is missing the line

path_lattice = k2.top_sort(k2.connect(path_lattice))

After adding this line to your code, it crashes at [...]

What GPU are you testing on?
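In other words, the direct reproduction is just the dumped intersection result plus that one line (a sketch, assuming path_lattice.pt holds the lattice right before top_sort):

```python
import k2
import torch

path_lattice = k2.Fsa.from_dict(torch.load("path_lattice.pt"))
path_lattice = path_lattice.to("cuda:0")

# The line missing from the snippet above; this is where the
# k2::TopSorter::TopSort assertion fires on my GPU.
path_lattice = k2.top_sort(k2.connect(path_lattice))
```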
See cell 14.
I tested it on an NVIDIA V100 (PyTorch 1.8.1, CUDA 10.2).
Before invoking [...]
Oops, I missed cell 14 :(

I am using this Docker image (https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-23-04.html). I will try an older image.
Results:

8GB Tesla P4: [...]

16GB Tesla T4: [...]

Software/hardware configurations were otherwise identical. While it's only a few data points, one might conclude k2 + CUDA 12.x has problems.
Thanks! We will debug it on CUDA 12.x.
Using the icefall/egs/librispeech/ASR/pruned_transducer_stateless7 recipe, training a model on only train-clean-5 and dev-clean-2, and running pruned_transducer_stateless7/decode.py on GPU with --decoding-method fast_beam_search_nbest_LG produces the following error. [...]

However, when pruned_transducer_stateless7/decode.py is forced to use the CPU, fast_beam_search_nbest_LG runs successfully.
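(Forcing the CPU was a one-line change of roughly this form; the exact edit to decode.py is illustrative, not verbatim:)

```python
import torch

# Hypothetical change near the top of decode.py's main():
# always decode on CPU instead of selecting CUDA when available.
device = torch.device("cpu")
# device = torch.device("cuda", 0)  # the GPU path that hits the assertion
```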
Any suggestions what I might be doing wrong?
Image: nvcr.io/nvidia/pytorch:23.04-py3

k2.version: [...]