
Excessive input values in streaming Zipformer encoder after conversion to ONNX #679

Open
renadnasser1 opened this issue Dec 10, 2024 · 4 comments

Comments

@renadnasser1

Hello @csukuangfj,

First, thank you for all your hard work on icefall and sherpa; they've been incredible resources!
We encountered an issue after converting a trained checkpoint for a streaming Zipformer-based ASR model to ONNX format using the conversion script export-onnx-streaming.py. The script successfully generated three ONNX files (encoder, decoder, and joiner); however, the generated encoder has 99 inputs, including (x, x_lens).
During deployment to Triton, we faced the following challenge: we needed to write the config.pbtxt file. To streamline this, we referred to the scripts available in sherpa/triton/scripts for building configs. Unfortunately, there doesn't appear to be a script specifically for a streaming Zipformer-based model.

To proceed, we used the sherpa/triton/model_repo_streaming_zipformer directory as a reference for all components (feature_extractor, encoder, decoder, joiner, scorer). However, when running Triton, the model configuration expects 2 inputs, while the ONNX encoder provides 99.
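For reference, the mismatch is easy to confirm by listing the exported encoder's inputs. A minimal sketch using onnxruntime ("encoder.onnx" is just a placeholder for the exported encoder file):

```python
# List every input of the exported encoder to confirm the count.
# "encoder.onnx" is a placeholder for the file produced by
# export-onnx-streaming.py.
import onnxruntime as ort

session = ort.InferenceSession(
    "encoder.onnx", providers=["CPUExecutionProvider"]
)
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)
print("total inputs:", len(session.get_inputs()))
```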

Could you clarify the following:

  1. Is there an existing script to generate the config.pbtxt for a streaming Zipformer-based model?
  2. If not, could you provide guidance or share a sample configuration that matches the expected input/output structure for this model?

Your insights would be immensely helpful, and I'd be happy to provide additional details if needed.
Thanks in advance for your support!

@csukuangfj
Collaborator

@yuekaizhang could you have a look at this issue?


the model configuration expects 2 inputs, while the ONNX encoder provides 99.

I think it would be easy to use a script that updates the config.pbtxt when exporting the model to ONNX, so that it includes the inputs for the model states.
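A minimal sketch of that idea, assuming the onnx Python package (the encoder path and the deliberately partial Triton type map below are placeholders, not the official script):

```python
# Sketch: enumerate the exported encoder's graph inputs and print
# config.pbtxt-style "input" entries, one per input (including the
# model-state tensors). The type map is deliberately partial, and
# the leading batch dimension is dropped on the assumption that
# max_batch_size covers it.
import onnx
from onnx import TensorProto

TYPE_MAP = {
    TensorProto.FLOAT: "TYPE_FP32",
    TensorProto.INT64: "TYPE_INT64",
    TensorProto.INT32: "TYPE_INT32",
}

model = onnx.load("encoder.onnx")  # placeholder path
entries = []
for inp in model.graph.input:
    ttype = inp.type.tensor_type
    dims = [
        d.dim_value if d.dim_value > 0 else -1  # -1 = dynamic dim
        for d in ttype.shape.dim
    ][1:]
    entries.append(
        "  {\n"
        f'    name: "{inp.name}"\n'
        f"    data_type: {TYPE_MAP.get(ttype.elem_type, 'TYPE_FP32')}\n"
        f"    dims: {dims}\n"
        "  }"
    )
print("input [\n" + ",\n".join(entries) + "\n]")
```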

@yuekaizhang
Collaborator

@renadnasser1
I'm sorry for the confusion about the ONNX models.
The ONNX models used by sherpa-onnx and sherpa/triton may not always be compatible, primarily due to minor differences in input and output shapes.
For the streaming Zipformer, we currently do not have a one-click Triton deployment example, but you can refer to the deployment scripts for streaming conformer or pruned_transducer_stateless7 streaming. Please check the scripts under sherpa/triton/scripts that have _streaming.sh in their names. Basically, we manually wrapped the state tensors, as seen in this line: https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/export_onnx.py#L274.
Another approach, as @csukuangfj suggested, is to manually modify the config.pbtxt using a script. If you understand how to use the Triton ONNX backend, either of these methods will work.
If you have any questions, please feel free to ask me.
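For illustration, a minimal sketch of the wrapping pattern (not the actual sherpa code: the encoder's calling convention and a uniform per-layer state shape are assumptions here; the real Zipformer states have heterogeneous shapes, so the linked script does extra reshaping):

```python
# Sketch of wrapping per-layer encoder states into a single tensor
# so that the exported ONNX graph exposes a few inputs instead of
# ~99. The encoder's (x, x_lens, states) -> (y, y_lens, new_states)
# interface and the uniform state shape are assumptions.
import torch


class EncoderWrapper(torch.nn.Module):
    def __init__(self, encoder: torch.nn.Module):
        super().__init__()
        self.encoder = encoder

    def forward(self, x, x_lens, states):
        # states: one stacked (num_layers, ...) tensor -> list of
        # per-layer tensors, assuming every layer state has the
        # same shape.
        state_list = list(torch.unbind(states, dim=0))
        y, y_lens, new_states = self.encoder(x, x_lens, state_list)
        # Re-stack the updated states for the next chunk.
        return y, y_lens, torch.stack(new_states, dim=0)
```

Exporting this wrapper with torch.onnx.export would then leave x, x_lens, and the single states tensor as the only graph inputs.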

@vasistalodagala

Hi @yuekaizhang and @csukuangfj ,

The model I've built is a streaming Zipformer model. The goal is to deploy it using Triton.

Following are the methods I've used to export it to the .onnx format:

  1. Using: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/export-onnx-streaming.py
  2. Using: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/export-onnx.py
  3. Using: https://github.com/k2-fsa/sherpa/blob/master/triton/scripts/export_onnx.py

The outcomes of these trials are:

  1. The exported .onnx models could be used successfully for inference using https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/onnx_pretrained-streaming.py
  2. The export to .onnx failed when using --causal True. When --causal was set to False, the export itself worked fine, but inference gave empty output when using https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/zipformer/onnx_pretrained.py. This is likely because the streaming model expects cached inputs that aren't available; from my understanding, this is expected behaviour anyway.
  3. The export to .onnx fails due to a mismatch between the classes in the scaling.py file used by zipformer and by pruned_transducer_stateless3.
  • Could you please provide the exact way to export a streaming Zipformer to .onnx so that it can then be used in the Triton deployment?
  • Also, I request you to provide the config.pbtxt file for the Triton deployment of a streaming Zipformer model. Generating the configuration for the streaming Zipformer from the scripts under sherpa/triton/scripts that have _streaming.sh in their names wasn't quite intuitive/direct. Thanks.

@yuekaizhang
Collaborator

  • Could you please provide the exact way to export a streaming Zipformer to .onnx so that it can then be used in the Triton deployment?

You may try build_librispeech_pruned_transducer_stateless7_streaming.sh first, since it is a model similar to the streaming Zipformer. In #681, it could be made to work.

  • Also, I request you to provide the config.pbtxt file for the Triton deployment of a streaming Zipformer model. Generating the configuration for the streaming Zipformer from the scripts under sherpa/triton/scripts that have _streaming.sh in their names wasn't quite intuitive/direct. Thanks.

Sorry, I don't have the bandwidth to support this recently; I would be very grateful if someone could contribute build_librispeech_zipformer_streaming.sh.
My suggestion is to run build_librispeech_pruned_transducer_stateless7_streaming.sh, and then modify it accordingly.
