What about int8 weights? #608
Comments
@yuekaizhang Could you have a look? Thanks!
@AntonThai2022 Would you mind pasting the error logs here?
I apologize for the topic I created: I simply mixed up the folder with the intermediate weights and the folder with the engine weights. I took the correct folder, renamed it, and everything worked. Strangely, the regular version takes 8 GB while the 8-bit one takes 7 GB, yet inference was almost twice as fast.
Hello!
I built the int8 weights:
INFERENCE_PRECISION=float16
WEIGHT_ONLY_PRECISION=int8
MAX_BEAM_WIDTH=4
MAX_BATCH_SIZE=8
checkpoint_dir=whisper_large_v3_weights_${WEIGHT_ONLY_PRECISION}
output_dir=whisper_large_v3_${WEIGHT_ONLY_PRECISION}
Convert the large-v3 model weights into TensorRT-LLM format.
python3 convert_checkpoint.py \
    --use_weight_only \
    --weight_only_precision $WEIGHT_ONLY_PRECISION \
    --output_dir $checkpoint_dir
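The resolution in the comments above hinges on the difference between the converted checkpoint (intermediate weights) and the built engine: convert_checkpoint.py only produces a TensorRT-LLM checkpoint, which still has to be compiled into an engine before Triton can serve it. A minimal sketch of that build step, assuming the trtllm-build CLI from TensorRT-LLM (exact flags vary between versions, and the Whisper example builds separate encoder and decoder engines):

```shell
# Compile the converted int8 checkpoint into a servable TensorRT engine.
# Sketch only: flag names depend on your TensorRT-LLM version.
trtllm-build \
    --checkpoint_dir $checkpoint_dir \
    --output_dir $output_dir \
    --max_beam_width $MAX_BEAM_WIDTH \
    --max_batch_size $MAX_BATCH_SIZE
```

It is the contents of $output_dir (the engine directory), not $checkpoint_dir, that belong under model_repo_whisper_trtllm/whisper/1.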
So I got whisper_large_v3_weights_int8 and put it into sherpa/triton/whisper/model_repo_whisper_trtllm/whisper/1, but it does not work.
I also tried renaming it to whisper_large_v3, but that did not help.
Is it possible to run int8 Whisper with your repo and Docker image?
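One quick way to tell the two directories apart is to list their contents. This is a hypothetical sanity check, assuming typical TensorRT-LLM output file names (rank0.safetensors for a converted checkpoint, rank0.engine for a built engine):

```shell
# A converted checkpoint holds safetensors weights plus a config:
ls whisper_large_v3_weights_int8    # e.g. config.json  rank0.safetensors
# A built engine directory holds compiled .engine files instead:
ls whisper_large_v3_int8            # e.g. config.json  rank0.engine
```

Only the second kind of directory is deployable under the Triton model repository.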