[EDGE CASE] Cartoon voices and voice-line musical instruments perform worse on v5 than on the older version #563

@George0828Zhang

🐛 Bug

V5 ignores cartoon voices.

To Reproduce

Steps to reproduce the behavior:

  1. Open the Colab example.
  2. Download this example and run the notebook up to this cell (changing 'en_example.wav' to 'ja_example.wav'):
wav = read_audio('ja_example.wav', sampling_rate=SAMPLING_RATE)
# get speech timestamps from full audio file
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)
pprint(speech_timestamps)
  3. The result is:
[{'end': 30464, 'start': 12032}]

With the older version (see SYSTRAN/faster-whisper#934 (comment)), the result is:

[{'end': 40192, 'start': 12032},
 {'end': 179456, 'start': 76544},
 {'end': 379136, 'start': 273152},
 {'end': 457984, 'start': 422656},
 {'end': 630016, 'start': 576256},
 {'end': 669952, 'start': 653056},
 {'end': 863488, 'start': 695040},
 {'end': 950528, 'start': 896768}]
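To put the gap in numbers, here is a rough sketch that converts both outputs from sample indices to seconds of detected speech. It assumes SAMPLING_RATE is 16000, as in the Colab example; the helper name total_speech_seconds is only illustrative and is not part of the library.

```python
# Assumption: 16 kHz audio, as in the Colab example. Adjust if SAMPLING_RATE differs.
SAMPLING_RATE = 16000

# Timestamps copied from the two outputs in this report (units: samples).
v5_segments = [{'end': 30464, 'start': 12032}]
old_segments = [
    {'end': 40192, 'start': 12032},
    {'end': 179456, 'start': 76544},
    {'end': 379136, 'start': 273152},
    {'end': 457984, 'start': 422656},
    {'end': 630016, 'start': 576256},
    {'end': 669952, 'start': 653056},
    {'end': 863488, 'start': 695040},
    {'end': 950528, 'start': 896768},
]


def total_speech_seconds(segments, sampling_rate=SAMPLING_RATE):
    """Sum the duration of all detected segments, in seconds (hypothetical helper)."""
    return sum(seg['end'] - seg['start'] for seg in segments) / sampling_rate


for name, segments in (('v5', v5_segments), ('old', old_segments)):
    print(f"{name}: {len(segments)} segment(s), "
          f"{total_speech_seconds(segments):.3f} s of speech")
```

Under that assumption, V5 reports about 1.15 s of speech in a single segment, while the older version reports about 35.3 s across eight segments.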

Expected behavior

V5 should perform at least as well as the older version and detect the speech segments that the older version finds.

Environment

Please copy and paste the output from this
environment collection script
(or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
  • PyTorch Version (e.g., 1.0):
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

Labels

bug (Something isn't working)
