Omit space prefix in initial_prompt for spaceless languages.
ryanheise committed Mar 31, 2024
1 parent ba3f3cd commit 21999e1
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion whisper/transcribe.py
@@ -228,7 +228,8 @@ def decode_with_fallback(segment: torch.Tensor) -> DecodingResult:
     prompt_reset_since = 0
 
     if initial_prompt is not None:
-        initial_prompt_tokens = tokenizer.encode(" " + initial_prompt.strip())
+        space = "" if language in {"zh", "ja", "th", "lo", "my", "yue"} else " "
+        initial_prompt_tokens = tokenizer.encode(space + initial_prompt.strip())
         all_tokens.extend(initial_prompt_tokens)
     else:
         initial_prompt_tokens = []
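
To see the effect of this change without loading a model, the prefix logic can be isolated in a small function. This is only a sketch: `prompt_text` and `SPACELESS_LANGUAGES` are hypothetical names introduced for illustration and are not part of whisper. The function returns the string that would be handed to `tokenizer.encode`.

```python
# Hypothetical helper, not part of whisper: mirrors the prefix logic in this commit.
# The language codes are copied from the diff.
SPACELESS_LANGUAGES = {"zh", "ja", "th", "lo", "my", "yue"}


def prompt_text(initial_prompt: str, language: str) -> str:
    """Return the text that would be passed to tokenizer.encode()."""
    # Languages written without spaces between words get no leading space.
    space = "" if language in SPACELESS_LANGUAGES else " "
    return space + initial_prompt.strip()


print(repr(prompt_text("  Hello world ", "en")))  # ' Hello world'
print(repr(prompt_text(" こんにちは ", "ja")))      # 'こんにちは'
```

In both cases the surrounding whitespace of the prompt is stripped first, so the language check alone decides whether a single leading space is added.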