
Some domain-related terms are not transcribed correctly in whisper-triton #649

Open
krishnardt opened this issue Sep 19, 2024 · 6 comments

Comments


krishnardt commented Sep 19, 2024

Hi,

Is there any way to correct mis-transcribed domain terms like the one mentioned above while transcribing through whisper-triton?

The model is not able to transcribe some words properly even though they are spelled normally.

For example: Atomberg is transcribed as "Atombuck".

I tried to add custom tokens to the tokenizer (tiktoken) by modifying its tokenizer.py code as in the image below, without disturbing the flow, but I am getting worse output than without the custom tokens.

[image: modified tokenizer.py showing the custom tokens added to the tokenizer]
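(For reference, a rough, illustrative sketch of this kind of tokenizer change is below; the token name and id handling are hypothetical and may not match my actual edit, which is only shown in the screenshot.)

# Hypothetical sketch: extending the special-token table that whisper's
# tokenizer.py builds before constructing the tiktoken Encoding.
# "Atomberg" and the id assignment below are illustrative only.
import tiktoken

def build_encoding_with_custom_tokens(name, pat_str, mergeable_ranks, special_tokens):
    # special_tokens already holds <|startoftranscript|>, <|en|>, etc.
    next_id = len(mergeable_ranks) + len(special_tokens)
    for word in ["Atomberg"]:           # custom domain terms
        special_tokens[word] = next_id  # give each term the next free token id
        next_id += 1
    return tiktoken.Encoding(
        name=name,
        pat_str=pat_str,
        mergeable_ranks=mergeable_ranks,
        special_tokens=special_tokens,
    )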

I followed K2 Sherpa's approach to generate the model and ran the triton server.

Can someone guide me on how to resolve this issue?

@csukuangfj (Collaborator)

@yuekaizhang Could you have a look?

@yuekaizhang (Collaborator)

Hi @krishnardt, whisper-triton is an accelerated solution; it can't improve whisper's accuracy. If you can't get correct results using the PyTorch whisper implementation, whisper-triton can't help either.

@yuekaizhang (Collaborator)

Try <|startofprev|>Hotwords: Atomberg<|startoftranscript|><|en|><|transcribe|><|notimestamps|> as text prefix to see if it could work.
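A minimal client-side sketch of sending such a prefix with one request is below, assuming the server exposes the audio and the prefix as separate inputs; the tensor and model names (WAV, WAV_LENS, TEXT_PREFIX, TRANSCRIPTS, whisper) follow the sherpa Triton whisper client and may differ in your deployment.

# Hedged sketch: pass the hotword prompt as the text prefix of a request.
# Tensor and model names below are assumptions taken from the sherpa
# Triton whisper example; adjust them to your model_repo config.
import numpy as np
import soundfile as sf
import tritonclient.http as httpclient

prefix = ("<|startofprev|>Hotwords: Atomberg"
          "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>")

audio, sr = sf.read("sample.wav", dtype="float32")   # 16 kHz mono expected
client = httpclient.InferenceServerClient(url="localhost:8000")

wav = httpclient.InferInput("WAV", [1, len(audio)], "FP32")
wav.set_data_from_numpy(audio[np.newaxis, :])
wav_lens = httpclient.InferInput("WAV_LENS", [1, 1], "INT32")
wav_lens.set_data_from_numpy(np.array([[len(audio)]], dtype=np.int32))
text = httpclient.InferInput("TEXT_PREFIX", [1, 1], "BYTES")
text.set_data_from_numpy(np.array([[prefix.encode()]], dtype=object))

result = client.infer("whisper", inputs=[wav, wav_lens, text])
print(result.as_numpy("TRANSCRIPTS"))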

@krishnardt (Author)

krishnardt commented Sep 19, 2024

@yuekaizhang I had edited in the wrong place.

I got the output correctly...

I have a few other hotwords; I added them as comma-separated values and it is working fine.

But won't it increase the latency?
Is there any other way to add hotwords when starting the server instead of at inference time? Currently the model accepts only 30 seconds of data as input per request.

I tried with 2 minutes of data, which means the prefix would be added for each of the 4 requests, and if the hotwords list is bigger it may increase the latency.

This is what I am thinking. Please correct me if I am wrong.

@krishnardt (Author)

Hi,
I tried to give more hotwords, and the requests are failing with the issue below. Can anyone provide a solution?

During conversion I set the input words/tokens to 340,
and the hotword count was 80.

server error:
[12/17/2024-10:29:37] [TRT] [E] IExecutionContext::enqueueV3: Error Code 3: API Usage Error (Parameter check failed, condition: inputDimensionSpecified && inputShapesSpecified. Not all shapes are specified. Following input tensors' dimensions are not specified: input_ids, position_ids, cache_indirection, past_key_value_0, past_key_value_1, past_key_value_2, past_key_value_3, past_key_value_4, past_key_value_5, past_key_value_6, past_key_value_7, past_key_value_8, past_key_value_9, past_key_value_10, past_key_value_11, past_k

inference error:
tritonclient.utils.InferenceServerException: [500] Failed to process the request(s) for model instance 'whisper_0_0', message: RuntimeError: Executing TRT engine failed step=0!

At:
/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py(2748): handle_per_step
/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py(3089): decode_regular
/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py(3481): decode
/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py(969): wrapper
/workspace/./model_repo_whisper_trtllm/whisper/1/whisper_trtllm.py(166): generate
/workspace/./model_repo_whisper_trtllm/whisper/1/whisper_trtllm.py(203): process_batch
/workspace/./model_repo_whisper_trtllm/whisper/1/model.py(108): execute

^CTraceback (most recent call last):
File "/data/opt/miniconda/envs/venv-whisper-stt/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/data/opt/miniconda/envs/venv-whisper-stt/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data/krishnas/repos/speech_stt_original/speech-to-text-service/src/serve.py", line 28, in
asyncio.get_event_loop().run_until_complete(serve())
File "/data/opt/miniconda/envs/venv-whisper-stt/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
self.run_forever()
File "/data/opt/miniconda/envs/venv-whisper-stt/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
self._run_once()
File "/data/opt/miniconda/envs/venv-whisper-stt/lib/python3.9/asyncio/base_events.py", line 1869, in _run_once
event_list = self._selector.select(timeout)
File "/data/opt/miniconda/envs/venv-whisper-stt/lib/python3.9/selectors.py", line 469, in select
fd_event_list = self._selector.poll(timeout, max_ev)

@yuekaizhang (Collaborator)

@krishnardt
See https://huggingface.co/openai/whisper-large-v3/blob/main/config.json#L36. You may need to change your build command for the decoder, for example:

trtllm-build  --checkpoint_dir ${checkpoint_dir}/decoder \
              --output_dir ${output_dir}/decoder \
              --moe_plugin disable \
              --max_beam_width ${MAX_BEAM_WIDTH} \
              --max_batch_size ${MAX_BATCH_SIZE} \
              --max_seq_len 448 \
              --max_input_len 400 \
              --max_encoder_input_len 3000 \
              --gemm_plugin ${INFERENCE_PRECISION} \
              --bert_attention_plugin ${INFERENCE_PRECISION} \
              --gpt_attention_plugin ${INFERENCE_PRECISION}
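
As a rough sanity check, you can count how many tokens your hotword prefix occupies and make sure it stays below --max_input_len; the sketch below assumes the openai-whisper package, and the engine's own tokenization of the prompt may differ slightly.

# Rough sketch: estimate the prompt length before building the engine.
# Assumes the openai-whisper package is installed; special-token ids can
# differ slightly for large-v3, but the count is a good guide.
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=True)

hotwords = ["Atomberg", "Triton"]                 # your full hotword list here
prefix = "<|startofprev|>Hotwords: " + ", ".join(hotwords)

n_prompt_tokens = len(tokenizer.encode(prefix, allowed_special="all"))
print(f"prompt tokens: {n_prompt_tokens}")        # keep this below --max_input_len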
