
Streaming zipformer tensorrt support #681

Open
francois-vz opened this issue Dec 11, 2024 · 2 comments

francois-vz commented Dec 11, 2024

Is the zipformer model supported for streaming with tensorrt? I have not been able to get it up and running on the latest branch of sherpa and icefall.

I could get the streaming zipformer up and running with onnx with

  • triton-k2:24.07
  • build_librispeech_pruned_transducer_stateless7_streaming.sh

with a small edit to export.py, where the BPE model is passed but the script expects the tokens file. I could perform inference through that path, but whenever I attempt to export the zipformer to TensorRT I get the following error:

[12/11/2024-15:02:18] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [api_compile.cpp:validate_copy_operation:4258] Slice operation "/Pad_13_slice" has incorrect fill value type, slice op requires fill value type to be same as its input.
[12/11/2024-15:02:19] [E] Error[10]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/Concat_270.../Transpose_185]}.)
[12/11/2024-15:02:19] [E] Engine could not be created from network
[12/11/2024-15:02:19] [E] Building engine failed
[12/11/2024-15:02:19] [E] Failed to create engine from model or file.
[12/11/2024-15:02:19] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100200] # /usr/src/tensorrt/bin/trtexec --onnx=/workspace/sherpa/triton/model_repo_streaming_zipformer/encoder/1/encoder.poly.onnx --minShapes=x:1x16x80,x_lens:1 --optShapes=x:4x512x80,x_lens:4 --maxShapes=x:16x2000x80,x_lens:16 --fp16 --loadInputs=x:scripts/test_features/input_tensor_fp32.dat,x_lens:scripts/test_features/shape.bin --shapes=x:1x663x80,x_lens:1 --saveEngine=/workspace/sherpa/triton/model_repo_streaming_zipformer/encoder/1/encoder.trt
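For anyone hitting the same export.py mismatch: icefall's export scripts generally expect a tokens.txt mapping each symbol to an integer id ("symbol id" per line) rather than bpe.model. A minimal sketch of producing such a file is below; the vocab used here is a made-up stand-in so the snippet is self-contained, and the commented-out `spm_export_vocab` step is only an assumption about how you would dump a real BPE vocab:

```shell
# Hypothetical sketch: derive a tokens.txt ("symbol id" per line) when
# only bpe.model is available. With sentencepiece installed you would
# dump the vocab first (illustrative, not verified against this repo):
#   spm_export_vocab --model=bpe.model --output=vocab.txt
# Here we fake a tiny vocab so the rest of the pipeline is runnable:
printf '%s\n' '<blk>' '<sos/eos>' '▁THE' '▁A' 'S' > vocab.txt

# Number the symbols starting from 0, one "symbol id" pair per line.
awk '{print $1, NR-1}' vocab.txt > tokens.txt
cat tokens.txt
```

The resulting tokens.txt can then be passed where export.py expects `--tokens`, avoiding the edit that works around the BPE-model argument.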

So I just wanted to confirm: is TensorRT actually supported for streaming zipformers?

yuekaizhang (Collaborator) commented:

@francois-vz Sorry, we have only verified that the offline zipformer works with TensorRT: https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/build_wenetspeech_zipformer_offline_trt.sh.

For the streaming zipformer, TensorRT support should be possible, although there are some issues, as you mentioned. However, I currently don't have the time to address this. If anyone has the bandwidth, I'm happy to offer assistance.
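For reference, the linked offline script first cleans up the exported ONNX graph with Polygraphy before invoking trtexec; a streaming attempt would presumably follow the same shape. The sketch below combines that sanitize step with the dynamic-shape flags from the trtexec command in this issue. The paths are placeholders, and whether constant folding avoids the Pad/Slice fill-type error for the streaming model is an assumption, not something that has been verified:

```shell
# Hypothetical sketch mirroring the offline zipformer TRT build.
# 1) Fold constants / sanitize the exported ONNX graph with Polygraphy.
polygraphy surgeon sanitize encoder.onnx \
    --fold-constants \
    -o encoder.poly.onnx

# 2) Build a TensorRT engine with dynamic shapes, as in the
#    trtexec invocation from the error log above.
/usr/src/tensorrt/bin/trtexec \
    --onnx=encoder.poly.onnx \
    --minShapes=x:1x16x80,x_lens:1 \
    --optShapes=x:4x512x80,x_lens:4 \
    --maxShapes=x:16x2000x80,x_lens:16 \
    --fp16 \
    --saveEngine=encoder.trt
```

If the ForeignNode failure persists after constant folding, dropping `--fp16` to isolate precision-related tactics is a common first debugging step.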

yuekaizhang (Collaborator) commented:

> with a small edit to export.py

Would you mind making a PR to fix it if you have some time? Thanks.
