[EDGE CASE] Cartoon voices and voice-line musical instruments are detected worse by v5 than by the older version

## 🐛 Bug

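V5 ignores most cartoon voices: on the example below it detects a single speech segment, while the older version detects eight.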

## To Reproduce

Steps to reproduce the behavior:

1. Open the [Colab example](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb).
2. Download this [example](https://drive.google.com/file/d/1NPvEybP0VU1dFmd6neH6JJRW_Qm2MXdk/view?usp=sharing) file and run the notebook up to the following cell, changing `'en_example.wav'` to `'ja_example.wav'`:
```python
wav = read_audio('ja_example.wav', sampling_rate=SAMPLING_RATE)
# get speech timestamps from full audio file
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)
pprint(speech_timestamps)
```
3. With v5, the result is:
```
[{'end': 30464, 'start': 12032}]
```
With the older version (see https://github.com/SYSTRAN/faster-whisper/issues/934#issuecomment-2439340290), the result is:
```
[{'end': 40192, 'start': 12032},
 {'end': 179456, 'start': 76544},
 {'end': 379136, 'start': 273152},
 {'end': 457984, 'start': 422656},
 {'end': 630016, 'start': 576256},
 {'end': 669952, 'start': 653056},
 {'end': 863488, 'start': 695040},
 {'end': 950528, 'start': 896768}]
```
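
To make the gap between the two outputs easier to see, here is a minimal sketch that converts both timestamp lists into total detected speech duration. It assumes `SAMPLING_RATE` is 16000, as I believe it is set in the Colab notebook; `total_speech_seconds` is a hypothetical helper written for this comparison, not part of silero-vad.

```python
# Assumption: the notebook uses a 16 kHz sampling rate.
SAMPLING_RATE = 16000


def total_speech_seconds(timestamps, sampling_rate=SAMPLING_RATE):
    """Sum the length of all detected segments and convert samples to seconds.

    Hypothetical helper for this comparison only (not a silero-vad function).
    """
    total_samples = sum(ts['end'] - ts['start'] for ts in timestamps)
    return total_samples / sampling_rate


# Output reported above for v5.
v5_timestamps = [{'end': 30464, 'start': 12032}]

# Output reported above for the older version.
old_timestamps = [
    {'end': 40192, 'start': 12032},
    {'end': 179456, 'start': 76544},
    {'end': 379136, 'start': 273152},
    {'end': 457984, 'start': 422656},
    {'end': 630016, 'start': 576256},
    {'end': 669952, 'start': 653056},
    {'end': 863488, 'start': 695040},
    {'end': 950528, 'start': 896768},
]

print(f"v5:  {len(v5_timestamps)} segment(s), "
      f"{total_speech_seconds(v5_timestamps):.3f} s of speech")
print(f"old: {len(old_timestamps)} segment(s), "
      f"{total_speech_seconds(old_timestamps):.3f} s of speech")
```

Under that sampling-rate assumption, v5 reports about 1.15 s of speech in one segment, while the older version reports about 35.3 s across eight segments.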

## Expected behavior

V5 should perform better than the older version, or at least detect the speech segments that the older version finds.

## Environment

Please copy and paste the output from this 
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).

You can get the script and run it with:
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```

 - PyTorch Version (e.g., 1.0):
 - OS (e.g., Linux):
 - How you installed PyTorch (`conda`, `pip`, source):
 - Build command you used (if compiling from source):
 - Python version:
 - CUDA/cuDNN version:
 - GPU models and configuration:
 - Any other relevant information:

## Additional context
