
y = k2.RaggedTensor(y).to(device) RuntimeError: Some bad things happened. Please read the above error messages and stack #1214

Open
TszSimLaw opened this issue Jun 21, 2023 · 11 comments

@TszSimLaw

Running ./pruned_transducer_stateless7_bbpe/train.py fails with the following exception:

-- Process 3 terminated with the following error:

y = k2.RaggedTensor(y).to(device)
RuntimeError: Some bad things happened. Please read the above error messages and stack

@csukuangfj
Collaborator

Could you post more error logs?

@TszSimLaw
Author

File "train.py", line 1249, in
main()
File "train.py", line 1240, in main
mp.spawn(run, args=(world_size, args), nprocs=world_size, join=True)
File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/icefall/egs/tal_csasr/ASR/pruned_transducer_stateless7_bbpe/train.py", line 1097, in run
scan_pessimistic_batches_for_oom(
File "/home/icefall/egs/tal_csasr/ASR/pruned_transducer_stateless7_bbpe/train.py", line 1206, in scan_pessimistic_batches_for_oom
loss, _ = compute_loss(
File "/home/icefall/egs/tal_csasr/ASR/pruned_transducer_stateless7_bbpe/train.py", line 672, in compute_loss
y = k2.RaggedTensor(y).to(device)
RuntimeError:
Some bad things happened. Please read the above error messages and stack
trace. If you are using Python, the following command may be helpful:

  gdb --args python /path/to/your/code.py

(You can use `gdb` to debug the code. Please consider compiling
a debug version of k2.)

@TszSimLaw
Author

Could you post more error logs?

I used gdb:
run train.py --world-size 4 --num-epochs 30 --start-epoch 1 --exp-dir pruned_transducer_stateless7_bbpe/exp --max-duration 400
The error is the same as above.
My torch version is 1.7.1 and my k2 version is 1.23.4.

@csukuangfj
Collaborator

File "train.py", line 1249, in main() File "train.py", line 1240, in main mp.spawn(run, args=(world_size, args), nprocs=world_size, join=True) File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn return start_processes(fn, args, nprocs, join, daemon, start_method='spawn') File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes while not context.join(): File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join raise Exception(msg) Exception:

-- Process 1 terminated with the following error: Traceback (most recent call last): File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap fn(i, *args) File "/home/icefall/egs/tal_csasr/ASR/pruned_transducer_stateless7_bbpe/train.py", line 1097, in run scan_pessimistic_batches_for_oom( File "/home/icefall/egs/tal_csasr/ASR/pruned_transducer_stateless7_bbpe/train.py", line 1206, in scan_pessimistic_batches_for_oom loss, _ = compute_loss( File "/home/icefall/egs/tal_csasr/ASR/pruned_transducer_stateless7_bbpe/train.py", line 672, in compute_loss y = k2.RaggedTensor(y).to(device) RuntimeError: Some bad things happened. Please read the above error messages and stack trace. If you are using Python, the following command may be helpful:

  gdb --args python /path/to/your/code.py

(You can use `gdb` to debug the code. Please consider compiling
a debug version of k2.)

Could you give even more error logs?

@TszSimLaw
Author

File "train.py", line 1249, in main() File "train.py", line 1240, in main mp.spawn(run, args=(world_size, args), nprocs=world_size, join=True) File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn return start_processes(fn, args, nprocs, join, daemon, start_method='spawn') File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes while not context.join(): File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join raise Exception(msg) Exception:
-- Process 1 terminated with the following error: Traceback (most recent call last): File "/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap fn(i, *args) File "/home/icefall/egs/tal_csasr/ASR/pruned_transducer_stateless7_bbpe/train.py", line 1097, in run scan_pessimistic_batches_for_oom( File "/home/icefall/egs/tal_csasr/ASR/pruned_transducer_stateless7_bbpe/train.py", line 1206, in scan_pessimistic_batches_for_oom loss, _ = compute_loss( File "/home/icefall/egs/tal_csasr/ASR/pruned_transducer_stateless7_bbpe/train.py", line 672, in compute_loss y = k2.RaggedTensor(y).to(device) RuntimeError: Some bad things happened. Please read the above error messages and stack trace. If you are using Python, the following command may be helpful:

  gdb --args python /path/to/your/code.py

(You can use `gdb` to debug the code. Please consider compiling
a debug version of k2.)

Could you give even more error logs?

The training log is as follows:

2023-06-21 10:17:20,038 INFO [train.py:951] (3/4) Training started
2023-06-21 10:17:20,038 INFO [train.py:961] (3/4) Device: cuda:3
2023-06-21 10:17:20,039 INFO [train.py:970] (3/4) {'frame_shift_ms': 10.0, 'allowed_excess_duration_ratio': 0.1, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': False, 'k2-git-sha1': '1031d2c7a6d64fe733283a498495030b6005fb97', 'k2-git-date': 'Thu Feb 23 14:26:54 2023', 'lhotse-version': '1.15.0', 'torch-version': '1.7.1+cu110', 'torch-cuda-available': True, 'torch-cuda-version': '11.0', 'python-version': '3.8', 'icefall-git-branch': None, 'icefall-git-sha1': None, 'icefall-git-date': None, 'icefall-path': '/home/res1/yx_data/nlp/asr_pro', 'k2-path': '/home/h3c/anaconda3/envs/kaldi/lib/python3.8/site-packages/k2/init.py', 'lhotse-path': '/home/h3c/anaconda3/envs/kaldi/lib/python3.8/site-packages/lhotse/init.py', 'hostname': 'h3c-R5300-G5', 'IP address': '127.0.1.1'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 30, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('pruned_transducer_stateless7_bbpe/exp'), 'bpe_model': 'data/lang_bbpe_500/bbpe.model', 'base_lr': 0.05, 'lr_batches': 5000, 'lr_epochs': 3.5, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 2000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': False, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 400, 'bucketing_sampler': True, 'num_buckets': 300, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'vocab_size': 500}
2023-06-21 10:17:20,039 INFO [train.py:972] (3/4) About to create model
2023-06-21 10:17:20,550 INFO [zipformer.py:178] (3/4) At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
2023-06-21 10:17:20,562 INFO [train.py:976] (3/4) Number of model parameters: 70369391
2023-06-21 10:17:24,876 INFO [train.py:991] (3/4) Using DDP
2023-06-21 10:17:24,976 INFO [asr_datamodule.py:407] (3/4) About to get train cuts
2023-06-21 10:17:24,977 INFO [train.py:1072] (3/4) Filtering short and long utterances.
2023-06-21 10:17:24,978 INFO [train.py:1075] (3/4) Tokenizing and encoding texts in train cuts.
2023-06-21 10:17:24,978 INFO [asr_datamodule.py:224] (3/4) About to get Musan cuts
2023-06-21 10:17:24,978 INFO [asr_datamodule.py:229] (3/4) Enable MUSAN
2023-06-21 10:17:24,978 INFO [asr_datamodule.py:252] (3/4) Enable SpecAugment
2023-06-21 10:17:24,978 INFO [asr_datamodule.py:253] (3/4) Time warp factor: 80
2023-06-21 10:17:24,978 INFO [asr_datamodule.py:263] (3/4) Num frame mask: 10
2023-06-21 10:17:24,978 INFO [asr_datamodule.py:276] (3/4) About to create train dataset
2023-06-21 10:17:24,978 INFO [asr_datamodule.py:303] (3/4) Using DynamicBucketingSampler.
2023-06-21 10:17:28,180 INFO [asr_datamodule.py:320] (3/4) About to create train dataloader
2023-06-21 10:17:28,181 INFO [asr_datamodule.py:414] (3/4) About to get dev cuts
2023-06-21 10:17:28,181 INFO [train.py:1091] (3/4) Tokenizing and encoding texts in valid cuts.
2023-06-21 10:17:28,181 INFO [asr_datamodule.py:351] (3/4) About to create dev dataset
2023-06-21 10:17:28,401 INFO [asr_datamodule.py:370] (3/4) About to create dev dataloader
2023-06-21 10:17:28,401 INFO [train.py:1198] (3/4) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
2023-06-21 10:29:52,105 INFO [train.py:1176] (3/4) Saving batch to pruned_transducer_stateless7_bbpe/exp/batch-24933b83-7577-50a9-a491-f0b2ea1fca65.pt
2023-06-21 10:29:52,127 INFO [train.py:1182] (3/4) features shape: torch.Size([20, 2000, 80])
2023-06-21 10:29:52,128 INFO [train.py:1186] (3/4) num tokens: 1981

All error logs have been posted.

@csukuangfj
Collaborator

Could you change

File "/home/icefall/egs/tal_csasr/ASR/pruned_transducer_stateless7_bbpe/train.py", line 672, in compute_loss
y = k2.RaggedTensor(y).to(device)

to

print(y)
print(device)
y = k2.RaggedTensor(y).to(device)

and post the output?
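
A minimal sketch of what that line does and what the two suggested prints should show; the token IDs and device index below are illustrative placeholders, not values from the actual batch:

import torch
import k2

# y is a plain Python list of lists: one inner list of token IDs per utterance
# (illustrative values; the real y comes from the BPE tokenizer in compute_loss).
y = [[17, 14, 11, 276], [391, 266, 229]]
device = torch.device("cuda", 0)

print(y)       # first suggested print: the nested list of token IDs
print(device)  # second suggested print: the device the tensor will be moved to

# Build a ragged (variable-length) tensor from the nested list and move it to
# the training device -- the call that raises the RuntimeError in this issue.
y = k2.RaggedTensor(y).to(device)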

@TszSimLaw
Author

Could you change

File "/home/icefall/egs/tal_csasr/ASR/pruned_transducer_stateless7_bbpe/train.py", line 672, in compute_loss
y = k2.RaggedTensor(y).to(device)

to

print(y)
print(device)
y = k2.RaggedTensor(y).to(device)

and post the output?

OK, the output is as follows:

2023-06-21 12:15:43,212 INFO [train.py:1201] (3/4) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
2023-06-21 12:15:43,232 INFO [asr_datamodule.py:370] (1/4) About to create dev dataloader
2023-06-21 12:15:43,232 INFO [train.py:1201] (1/4) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
2023-06-21 12:15:43,268 INFO [asr_datamodule.py:370] (2/4) About to create dev dataloader
2023-06-21 12:15:43,268 INFO [train.py:1201] (2/4) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
[[17, 14, 11, 276, 233, 24, 31, 19, 11, 39, 76, 9, 9, 12, 179, 216, 299, 24, 106, 187, 290, 20, 24, 23, 7, 127, 290, 20, 11, 95, 21, 115, 92, 192, 37, 29, 14, 51, 58, 24, 30, 19, 323, 4, 44, 140, 390, 51, 129, 21, 158, 80, 44, 327, 134, 293, 190, 129, 21, 158, 80, 44, 327, 134, 293, 190, 123, 6, 332, 35, 318, 37, 27, 95, 12, 143, 115, 12, 91, 109, 24, 217, 213, 321, 111, 84, 189, 189, 89, 61, 4, 44, 140, 405, 213, 72, 192, 117, 83, 134, 293, 190, 66, 23, 390, 51, 27], [391, 266, 229, 228, 33, 17, 26, 15, 4, 46, 120, 46, 238, 3, 29, 14, 93, 9, 3, 208, 81, 20, 99, 84, 7, 5, 87, 75, 25, 27, 208, 81, 20, 99, 7, 5, 17, 5, 17, 5, 129, 21, 158, 80, 342, 20, 30, 19, 174, 99, 6, 5, 46, 120, 30, 19, 29, 14, 3, 331, 12, 69, 279, 445, 150, 46, 331, 12, 69, 279, 445, 150, 46, 9, 3, 21, 144, 125, 331, 12, 69, 279, 445, 150, 20, 37, 8, 141, 176, 8, 90, 141, 4, 7, 5, 46, 120], [6, 9, 3, 25, 114, 113, 178, 4, 8, 121, 225, 416, 3, 484, 114, 113, 178, 6, 3, 7, 5, 381, 12, 205, 132, 16, 47, 94, 31, 10, 15, 35, 6, 5, 329, 170, 9, 13, 76, 186, 24, 11, 6, 5, 114, 113, 178, 4, 8, 121, 225, 416, 3, 29, 14, 26, 395, 302, 25, 24, 114, 113, 178, 9, 3, 7, 5, 381, 12, 205, 132, 9, 3, 25, 129, 410, 4, 23, 187, 428, 22, 226, 244, 22, 244, 308, 32, 47, 123, 26, 217, 3, 26, 250, 12, 145, 244, 285, 3, 7, 5, 408, 80, 22, 69, 109, 8, 218, 141, 94, 31, 10, 15, 6, 169, 129, 378, 25, 4, 3, 39, 76, 185, 28, 37, 184, 29, 14, 10, 15, 6, 5, 9, 13, 186, 24], [17, 10, 15, 9, 250, 12, 145, 244, 39, 31, 191, 73, 34, 212, 424, 129, 398, 249, 128, 8, 116, 141, 4, 44, 140, 21, 64, 125, 25, 12, 244, 160, 8, 121, 168, 24, 67, 12, 244, 160, 265, 22, 70, 116, 32, 67, 6, 429, 22, 90, 179, 150, 4, 46, 120, 382, 6, 154, 22, 90, 179, 150, 4, 46, 120, 26, 9, 3, 445, 150, 46, 10, 15, 6, 335, 236, 9, 37, 195, 337, 7, 28, 377, 54, 212, 445, 150, 46, 3, 105, 14, 112], [217, 3, 6, 5, 54, 268, 405, 213, 3, 322, 8, 90, 118, 4, 7, 5, 348, 104, 232, 15, 117, 34, 39, 76, 273, 167, 46, 120, 66, 29, 14, 32, 32, 124, 4, 30, 19, 54, 268, 72, 66, 29, 14, 32, 165, 12, 261, 121, 4, 217, 221, 213, 9, 3, 10, 15, 34, 19, 52, 4, 44, 140, 122, 348, 104, 422, 280, 7, 28, 87, 75, 25, 189, 339, 10, 15, 354, 73, 17, 5, 392, 4, 3, 7, 5, 199, 132, 45, 132, 116, 12, 272, 362, 29, 14, 240, 209, 95, 39, 31, 49, 17, 221, 213, 221, 213, 13, 7, 98, 232, 15, 392, 148, 156, 86, 50, 346, 3, 23, 422, 280, 4, 87, 75, 25], [26, 204, 230, 95, 13, 122, 13, 122, 17, 63, 217, 207, 85, 6, 3, 7, 5, 454, 98, 46, 10, 15, 37, 35, 7, 28, 6, 5, 328, 57, 328, 57, 3, 49, 34, 396, 98, 46, 96, 3, 25, 148, 4, 51, 58, 30, 19, 6, 5, 53, 254, 3, 49, 34, 454, 98, 46, 148, 445, 150, 46, 207, 85, 6, 3, 7, 5, 454, 98, 46, 94, 31, 34, 360, 93, 4, 289, 461, 28, 10, 15, 112, 203, 360, 93, 4, 51, 58, 9, 3, 10, 13, 122, 8, 144, 125, 12, 126, 115, 148, 22, 103, 216, 22, 64, 261], [17, 5, 12, 143, 115, 12, 91, 109, 95, 12, 115, 77, 32, 4, 307, 205, 300, 9, 3, 7, 127, 314, 214, 136, 335, 66, 108, 12, 91, 109, 6, 7, 459, 61, 4, 65, 26, 322, 8, 90, 118, 4, 46, 268, 87, 281, 187, 94, 31, 25, 4, 65, 6, 5, 4, 65, 42, 405, 213, 87, 281, 427, 282, 306, 76, 299, 292, 13, 83, 6, 159, 159, 12, 92, 118, 6, 326, 5, 89, 4, 300, 201, 12, 220, 153, 95, 12, 115, 77, 32, 4, 12, 115, 133, 21, 111, 158, 17, 5, 209, 268, 37, 4, 221, 213, 395, 302, 12, 144, 194, 73, 7, 104, 149, 12, 176, 64, 24], [235, 110, 42, 11, 35, 6, 14, 482, 95, 35, 73, 19, 52, 24, 10, 10, 15, 159, 52, 60, 66, 124, 233, 48, 6, 5, 20, 189, 339, 13, 3, 25, 73, 25, 6, 154, 311, 6, 127, 210, 11, 199, 101, 4, 6, 127, 95, 3, 3, 9, 3, 3, 108, 131, 168, 336, 4, 163, 238, 9, 
3, 131, 168, 336, 412, 12, 90, 287, 163, 238, 247, 149, 247, 149, 364, 147, 163, 238, 355, 16, 13, 16], [16, 94, 31, 13, 98, 238, 6, 5, 46, 120, 26, 89, 24, 344, 54, 27, 17, 22, 138, 151, 30, 6, 5, 59, 33, 197, 26, 3, 81, 20, 217, 3, 26, 13, 7, 98, 3, 404, 54, 26, 66, 23, 34, 404, 54, 4, 53, 497, 151, 287, 21, 151, 69, 84, 26, 31, 221, 63, 163, 238, 192, 117, 24, 10, 15, 108, 6, 93, 4, 81, 20, 333, 300, 404, 54, 81, 20, 106, 201, 385, 4, 60, 23, 280, 4, 27, 87, 75, 25, 425, 20, 19, 52, 4, 81, 20, 7, 258, 95, 3, 300, 404, 54, 81, 20, 27], [124, 7, 28, 358, 81, 20, 4, 329, 250, 429, 16, 13, 16, 358, 81, 20, 23, 329, 250, 429, 357, 8, 70, 136, 7, 28, 17, 329, 250, 429, 56, 7, 429, 3, 29, 14, 193, 16, 13, 16, 56, 7, 429, 3, 208, 81, 20, 17, 14, 56, 211, 429, 9, 3, 42, 56, 211, 429, 3, 13, 3, 9, 3, 368, 8, 69, 243, 81, 20, 8, 176, 145, 302, 201, 192, 117, 4, 368, 8, 69, 243, 81, 20], [104, 104, 104, 17, 10, 15, 60, 3, 249, 70, 388, 120, 37, 25, 366, 26, 25, 6, 5, 8, 262, 152, 12, 152, 90, 22, 115, 194, 105, 14, 93, 469, 43, 303, 33, 55, 36, 43, 74, 55, 107, 270, 171, 18, 41, 43, 74, 55, 107, 270, 171, 18, 41, 4, 51, 58, 9, 3, 7, 12, 152, 90, 22, 115, 194, 17, 469, 43, 303, 33, 55, 36, 9, 3, 29, 14, 29, 14, 105, 14, 93, 9, 150, 11, 25, 398, 8, 125, 218, 12, 200, 200, 12, 64, 91, 105, 14, 93, 27, 9, 3], [16, 16, 106, 32, 8, 115, 111, 249, 248, 245, 12, 218, 176, 11, 95, 39, 31, 260, 6, 150, 150, 6, 5, 137, 88, 366, 87, 75, 25, 463, 4, 8, 145, 64, 8, 145, 64, 78, 24, 131, 248, 8, 128, 126, 169, 72, 39, 31, 366, 32, 10, 273, 167, 11, 12, 143, 115, 12, 91, 109, 395, 302, 106, 32, 24, 10, 15, 60, 3, 7, 318, 37, 35, 7, 28, 47, 42, 365, 177, 3, 6, 5, 7, 258, 445, 150, 46, 29, 14, 3, 7, 258, 445, 150, 46, 9, 3, 39, 31, 49, 391, 266, 229, 228, 33, 357, 386, 4, 150, 46], [26, 260, 298, 26, 430, 5, 20, 9, 25, 11, 252, 259, 166, 22, 168, 115, 30, 19, 27, 320, 21, 138, 151, 29, 29, 14, 17, 10, 15, 11, 35, 282, 306, 8, 200, 168, 21, 151, 69, 252, 199, 275, 3, 13, 3, 25, 27, 34, 8, 116, 160, 131, 126, 242, 159, 11, 38, 108, 26, 233, 149, 347, 34, 326, 8, 91, 103, 242, 159, 11, 38, 108, 26, 233, 149, 347, 34, 29, 14, 29, 14, 242, 159, 245, 5, 307, 97, 12, 92, 118, 9, 49, 245, 5, 425, 20, 34, 29, 14, 242, 159], [26, 3, 7, 5, 29, 14, 20, 425, 20, 16, 67, 34, 29, 14, 242, 19, 10, 15, 25, 24, 35, 7, 5, 8, 64, 130, 12, 262, 126, 7, 5, 20, 4, 20, 371, 26, 3, 7, 5, 29, 14, 20, 3, 7, 5, 175, 20, 163, 253, 20, 229, 230, 3, 433, 20, 229, 230, 3, 425, 20, 26, 4, 20, 371, 10, 15, 3, 35, 6, 5, 20, 26, 19, 52, 122, 210, 11, 7, 5, 87, 75, 25, 6, 96, 4, 155, 310, 183, 155, 239, 36], [53, 110, 18, 18, 107, 27, 6, 5, 379, 54, 27, 357, 78, 38, 187, 35, 204, 21, 92, 141, 353, 193, 6, 5, 161, 182, 57, 208, 36, 36, 68, 72, 3, 161, 182, 57, 6, 5, 232, 4, 331, 12, 69, 279, 289, 461, 96, 72, 124, 24, 27, 6, 5, 72, 393, 38, 35, 7, 28, 161, 182, 57, 148, 208, 36, 36, 68, 474, 4, 66, 23, 367, 119, 53, 497, 151, 176, 309, 358, 4, 27], [75, 206, 3, 12, 152, 143, 177, 290, 4, 65, 3, 13, 3, 11, 196, 164, 129, 378, 8, 158, 256, 12, 152, 143, 30, 19, 72, 196, 164, 129, 378, 4, 175, 412, 193, 16, 13, 16, 11, 3, 442, 409, 4, 217, 3, 75, 206, 99, 84, 43, 4, 65, 10, 15, 9, 3, 285, 3, 196, 164, 129, 378, 8, 158, 256, 12, 152, 143, 217, 3, 221, 26, 4, 95, 13, 196, 164, 207, 85, 13, 442, 409, 27, 16, 47, 94, 31, 76, 99, 84, 43, 17, 14, 63, 9, 3, 7, 5, 112, 203, 13, 442, 409, 4, 129, 75, 206, 99, 7, 5, 44, 246, 4, 65, 9, 3, 7, 5, 13, 405, 98, 4, 44, 246], [208, 9, 3, 284, 43, 162, 43, 88, 4, 81, 20, 387, 163, 17, 10, 15, 34, 6, 5, 46, 120, 212, 113, 
55, 274, 19, 52, 26, 9, 13, 76, 99, 284, 43, 162, 43, 88, 26, 9, 38, 99, 208, 6, 5, 20, 334, 22, 77, 225, 9, 3, 81, 20, 387, 163, 32, 67, 94, 31, 25, 10, 15, 6, 5, 240, 209, 147, 166, 3], [42, 16, 7, 358, 8, 64, 172, 4, 425, 20, 26, 95, 23, 7, 7, 127, 334, 22, 77, 225, 4, 8, 121, 225, 416, 27, 9, 3, 10, 15, 242, 19, 122, 35, 73, 106, 187, 4, 379, 54, 9, 3, 106, 187, 4, 81, 20, 26, 260, 13, 7, 93, 425, 20, 12, 121, 126, 45, 111, 287, 192, 37, 10, 15, 296, 317, 3, 13, 7, 93, 4, 11, 476, 10, 51, 58, 67, 9, 3, 10, 15, 302, 201, 34, 233, 411, 323, 402, 96, 3, 13, 3, 399, 11, 186, 379, 54, 27], [307, 179, 39, 31, 381, 44, 246, 104, 72, 39, 31, 381, 7, 432, 44, 246, 382, 156, 86, 50, 271, 285, 381, 7, 432, 44, 246, 72, 9, 3, 63, 15, 22, 90, 179, 22, 90, 179, 150, 44, 246, 22, 115, 168, 8, 128, 135, 4, 13, 360, 156, 86, 146, 39, 31, 450, 21, 272, 101, 405, 381, 7, 5, 44, 246, 104, 72, 39, 31, 450, 72, 39, 31, 87, 281, 8, 362, 70, 12, 216, 128, 381, 7, 432, 44, 246, 382, 156, 86, 50, 271, 285, 76, 381, 7, 432, 44, 246], [6, 5, 299, 292, 24, 67, 42, 11, 405, 98, 37, 17, 6, 5, 282, 306, 395, 302, 448, 473, 11, 386, 486, 9, 3, 178, 170, 50, 57, 237, 313, 36, 171, 86, 255, 110, 17, 10, 150, 11, 366, 75, 206, 11, 38, 108, 376, 170, 197, 313, 36, 43, 171, 86, 255, 110]]
cuda:3

@csukuangfj
Collaborator

Does it crash after printing?

@TszSimLaw
Author

Does it crash after printing?

No, it does not crash.

It then prints the following log:

[F] /home/runner/work/k2/k2/k2/csrc/device_guard.h:66:static int32_t k2::DeviceGuard::GetDevice() k2 compiled without CUDA support

[ Stack-Trace: ]
/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/k2/lib/libk2_log.so(k2::internal::GetStackTrace()+0x47) [0x7ffeb6ab7077]
/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0x8439a) [0x7ffeb759e39a]
/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0x116daf) [0x7ffeb7630daf]
/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0x9418d) [0x7ffeb75ae18d]
/home/anaconda3/envs/kaldi/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xecf15) [0x7ffeb7606f15]
/home/anaconda3/envs/kaldi/bin/python(PyCFunction_Call+0x52) [0x4e1072]
/home/anaconda3/envs/kaldi/bin/python(_PyObject_MakeTpCall+0x3eb) [0x4d1f7b]
/home/anaconda3/envs/kaldi/bin/python() [0x4e965b]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalFrameDefault+0x4d48) [0x4ccdd8]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalCodeWithName+0x1f5) [0x4c6f45]
/home/anaconda3/envs/kaldi/bin/python(_PyFunction_Vectorcall+0x19c) [0x4db1ac]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalFrameDefault+0x172c) [0x4c97bc]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalCodeWithName+0x1f5) [0x4c6f45]
/home/anaconda3/envs/kaldi/bin/python(_PyFunction_Vectorcall+0x19c) [0x4db1ac]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalFrameDefault+0x172c) [0x4c97bc]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalCodeWithName+0x1f5) [0x4c6f45]
/home/anaconda3/envs/kaldi/bin/python(_PyFunction_Vectorcall+0x19c) [0x4db1ac]
/home/anaconda3/envs/kaldi/bin/python(PyObject_Call+0x5e) [0x4ed53e]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalFrameDefault+0x1f03) [0x4c9f93]
/home/anaconda3/envs/kaldi/bin/python(_PyFunction_Vectorcall+0x106) [0x4db116]
/home/anaconda3/envs/kaldi/bin/python(PyObject_Call+0x5e) [0x4ed53e]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalFrameDefault+0x1f03) [0x4c9f93]
/home/anaconda3/envs/kaldi/bin/python(_PyFunction_Vectorcall+0x106) [0x4db116]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalFrameDefault+0xa3e) [0x4c8ace]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalCodeWithName+0x1f5) [0x4c6f45]
/home/anaconda3/envs/kaldi/bin/python(_PyFunction_Vectorcall+0x19c) [0x4db1ac]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalFrameDefault+0xa3e) [0x4c8ace]
/home/anaconda3/envs/kaldi/bin/python(_PyFunction_Vectorcall+0x106) [0x4db116]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalFrameDefault+0x907) [0x4c8997]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalCodeWithName+0x1f5) [0x4c6f45]
/home/anaconda3/envs/kaldi/bin/python(_PyFunction_Vectorcall+0x19c) [0x4db1ac]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalFrameDefault+0x172c) [0x4c97bc]
/home/anaconda3/envs/kaldi/bin/python(_PyEval_EvalCodeWithName+0x1f5) [0x4c6f45]
/home/anaconda3/envs/kaldi/bin/python(PyEval_EvalCodeEx+0x39) [0x4c6d49]
/home/anaconda3/envs/kaldi/bin/python(PyEval_EvalCode+0x1b) [0x56d7eb]
/home/anaconda3/envs/kaldi/bin/python() [0x58cb21]
/home/anaconda3/envs/kaldi/bin/python() [0x5868df]
/home/anaconda3/envs/kaldi/bin/python(PyRun_StringFlags+0x7b) [0x584eab]
/home/anaconda3/envs/kaldi/bin/python(PyRun_SimpleStringFlags+0x3b) [0x584d8b]
/home/anaconda3/envs/kaldi/bin/python(Py_RunMain+0x15b) [0x583f7b]
/home/anaconda3/envs/kaldi/bin/python(Py_BytesMain+0x39) [0x5618a9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7ffff703fc87]
/home//anaconda3/envs/kaldi/bin/python() [0x56175e]

2023-06-21 12:21:57,449 INFO [train.py:1179] (3/4) Saving batch to pruned_transducer_stateless7_bbpe/exp/batch-24933b83-7577-50a9-a491-f0b2ea1fca65.pt
2023-06-21 12:21:57,469 INFO [train.py:1185] (3/4) features shape: torch.Size([20, 2000, 80])
2023-06-21 12:21:57,470 INFO [train.py:1189] (3/4) num tokens: 1981

@csukuangfj
Collaborator

[F] /home/runner/work/k2/k2/k2/csrc/device_guard.h:66:static int32_t k2::DeviceGuard::GetDevice() k2 compiled without CUDA support

You are using a CPU version of k2. Please install a CUDA version.

Please follow the documentation to check that you have indeed installed a CUDA version by running

python3 -m k2.version
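
A quick sketch of checking the same thing from Python, assuming your k2 exposes the fields that icefall's env_info above reads (the posted log shows 'k2-with-cuda': False, which matches the fatal error):

import torch
import k2

print("torch CUDA available:", torch.cuda.is_available())
# icefall's env_info reads this flag from k2; it should be True for a
# CUDA-enabled build. A False value here matches the
# "k2 compiled without CUDA support" error above.
print("k2 built with CUDA:", k2.with_cuda)

If the flag is False, installing a CUDA-enabled k2 wheel that matches your torch and CUDA versions should resolve the crash.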

@TszSimLaw
Author

[F] /home/runner/work/k2/k2/k2/csrc/device_guard.h:66:static int32_t k2::DeviceGuard::GetDevice() k2 compiled without CUDA support

You are using a CPU version of k2. Please install a CUDA version.

Please follow the documentation to check that you have indeed installed a CUDA version by running

python3 -m k2.version

Many thanks, I'll try.
