
Streaming zipformer tensorrt support #681

Open
francois-vz opened this issue Dec 11, 2024 · 2 comments

francois-vz commented Dec 11, 2024

Is the zipformer model supported for streaming with tensorrt? I have not been able to get it up and running on the latest branch of sherpa and icefall.

I could get the streaming zipformer up and running with onnx with

  • triton-k2:24.07
  • build_librispeech_pruned_transducer_stateless7_streaming.sh

with a small edit to export.py, where the BPE model is passed but the script expects the tokens file. I could perform inference through that path, but whenever I attempt to export the zipformer to TensorRT I get the following error:

[12/11/2024-15:02:18] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [api_compile.cpp:validate_copy_operation:4258] Slice operation "/Pad_13_slice" has incorrect fill value type, slice op requires fill value type to be same as its input.
[12/11/2024-15:02:19] [E] Error[10]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/Concat_270.../Transpose_185]}.)
[12/11/2024-15:02:19] [E] Engine could not be created from network
[12/11/2024-15:02:19] [E] Building engine failed
[12/11/2024-15:02:19] [E] Failed to create engine from model or file.
[12/11/2024-15:02:19] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100200] # /usr/src/tensorrt/bin/trtexec --onnx=/workspace/sherpa/triton/model_repo_streaming_zipformer/encoder/1/encoder.poly.onnx --minShapes=x:1x16x80,x_lens:1 --optShapes=x:4x512x80,x_lens:4 --maxShapes=x:16x2000x80,x_lens:16 --fp16 --loadInputs=x:scripts/test_features/input_tensor_fp32.dat,x_lens:scripts/test_features/shape.bin --shapes=x:1x663x80,x_lens:1 --saveEngine=/workspace/sherpa/triton/model_repo_streaming_zipformer/encoder/1/encoder.trt
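For anyone hitting the same export.py mismatch: icefall's export scripts generally expect a tokens.txt mapping each symbol to an integer id ("symbol id" per line) rather than bpe.model. A minimal sketch of producing such a file is below; the vocab used here is a made-up stand-in so the snippet is self-contained, and the commented-out `spm_export_vocab` step is only an assumption about how you would dump a real BPE vocab:

```shell
# Hypothetical sketch: derive a tokens.txt ("symbol id" per line) when
# only bpe.model is available. With sentencepiece installed you would
# dump the vocab first (illustrative, not verified against this repo):
#   spm_export_vocab --model=bpe.model --output=vocab.txt
# Here we fake a tiny vocab so the rest of the pipeline is runnable:
printf '%s\n' '<blk>' '<sos/eos>' '▁THE' '▁A' 'S' > vocab.txt

# Number the symbols starting from 0, one "symbol id" pair per line.
awk '{print $1, NR-1}' vocab.txt > tokens.txt
cat tokens.txt
```

The resulting tokens.txt can then be passed where export.py expects `--tokens`, avoiding the edit that works around the BPE-model argument.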

So I just wanted to confirm: is TensorRT actually supported for streaming zipformers?

yuekaizhang (Collaborator) commented:

@francois-vz Sorry, we have only verified that the offline zipformer works with TensorRT: https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/build_wenetspeech_zipformer_offline_trt.sh.

For the streaming zipformer, TensorRT support should be possible, although there are some issues, as you mentioned. However, I currently don't have the time to address this. If anyone has the bandwidth, I'm happy to offer assistance.
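For reference, the linked offline script first cleans up the exported ONNX graph with Polygraphy before invoking trtexec; a streaming attempt would presumably follow the same shape. The sketch below combines that sanitize step with the dynamic-shape flags from the trtexec command in this issue. The paths are placeholders, and whether constant folding avoids the Pad/Slice fill-type error for the streaming model is an assumption, not something that has been verified:

```shell
# Hypothetical sketch mirroring the offline zipformer TRT build.
# 1) Fold constants / sanitize the exported ONNX graph with Polygraphy.
polygraphy surgeon sanitize encoder.onnx \
    --fold-constants \
    -o encoder.poly.onnx

# 2) Build a TensorRT engine with dynamic shapes, as in the
#    trtexec invocation from the error log above.
/usr/src/tensorrt/bin/trtexec \
    --onnx=encoder.poly.onnx \
    --minShapes=x:1x16x80,x_lens:1 \
    --optShapes=x:4x512x80,x_lens:4 \
    --maxShapes=x:16x2000x80,x_lens:16 \
    --fp16 \
    --saveEngine=encoder.trt
```

If the ForeignNode failure persists after constant folding, dropping `--fp16` to isolate precision-related tactics is a common first debugging step.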

yuekaizhang (Collaborator) commented:

> with a small edit to export.py

Would you mind making a PR to fix it if you have some time? Thanks.
