Generalising the CalibrationDataReader implementation for static quantization across different architectures #19538
manickavela29
started this conversation in
Ideas / Feature Requests
I am currently working with Zipformer2 models (Conformer-based) from icefall. The model takes its input states from the output of the previous inference (except on the first inference, where all states are zeros).
Zipformer is a Transformer-based ASR model whose major compute ops are MatMul and Conv (depthwise and pointwise).
Model: Zipformer2
Inputs: X (feature vector) and a list of 96 states (initialised to 0).
Outputs: Y and a list of 96 states (these output states are maintained separately and fed back as inputs for the next inference).
The current static quantization function, quantize_static(), takes an fp32 model and a CalibrationDataReader and performs the quantization. However, this implementation does not support models like Zipformer, whose inputs at each step depend on the states output by the previous step.
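To make the request concrete, here is a minimal sketch of what a state-aware reader could look like. It assumes the calibration flow only needs an object with a get_next() method that returns one input feed (a dict) per call and None when the data is exhausted, which is how I understand ONNX Runtime's CalibrationDataReader interface. The class name StatefulCalibrationReader, the step_fn callback, and the input names "x" and "state_0" are all hypothetical and only for illustration. The sketch is written in plain Python so it runs without any model.

```python
from typing import Any, Callable, Dict, Iterable, Optional

Feed = Dict[str, Any]


class StatefulCalibrationReader:
    """Hypothetical reader that carries model states between get_next() calls.

    In real use this would presumably subclass
    onnxruntime.quantization.CalibrationDataReader (assumption).
    """

    def __init__(
        self,
        features: Iterable[Any],
        initial_states: Feed,
        step_fn: Callable[[Feed], Feed],
        feature_name: str = "x",  # hypothetical input name
    ) -> None:
        self._features = iter(features)
        self._states = dict(initial_states)  # e.g. 96 zero-initialised states
        self._step_fn = step_fn
        self._feature_name = feature_name

    def get_next(self) -> Optional[Feed]:
        chunk = next(self._features, None)
        if chunk is None:
            return None  # signals the end of the calibration data
        # Feed for this step: the current feature chunk plus the current states.
        feed = {self._feature_name: chunk, **self._states}
        # Advance the states by running one step (e.g. the fp32 model) on this
        # feed, so the next call returns the states this chunk produced.
        self._states = dict(self._step_fn(feed))
        return feed


# Toy stand-in for a model step: the new state is the old state plus the input.
def toy_step(feed: Feed) -> Feed:
    return {"state_0": feed["state_0"] + feed["x"]}


reader = StatefulCalibrationReader([1.0, 2.0, 3.0], {"state_0": 0.0}, toy_step)
feeds = []
while (feed := reader.get_next()) is not None:
    feeds.append(feed)
```

With a real model, step_fn would presumably wrap a call to the fp32 model (for example through an inference session) and map each of the 96 output states back to the matching state input name. That mapping is model-specific, so it is left out here. The downside of this approach is that every calibration sample costs one extra fp32 inference, which is part of why built-in support would be useful.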