Bug report - [Baya ru speaker (v4) sounds weird] #281

@JaanDev

Description

🐛 Bug

When I generate the same speech sample using the following code, baya sounds as if two people are speaking at the same time, while other speakers don't (tested with xenia). However, generating the same text with the Telegram bot sounds fine.

# V4
import torch
import soundfile as sf

language = 'ru'
model_id = 'v4_ru'
sample_rate = 48000
speaker = 'baya'
device = torch.device('cpu')

# Load the v4 Russian Silero TTS model from torch.hub
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

# Test sentence (Russian): "Welcome to the computerized experimental center
# at the laboratory for research into the nature of portals."
audio = model.apply_tts(text="Добро пожаловать в компьютизированный экспериментальный центр при лаборатории исследования природы порталов.",
                        speaker=speaker,
                        sample_rate=sample_rate)

# apply_tts returns a 1-D tensor; save it as a 48 kHz WAV file
sf.write('test_baya.wav', audio.numpy(), sample_rate)

Here are the samples from baya and xenia:
samples.zip
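
For reference, a minimal sketch of how both comparison samples could be produced in one run, assuming the same v4_ru model and parameters as above (the loop and the test_<speaker>.wav output names are illustrative, not part of the original script):

import torch
import soundfile as sf

sample_rate = 48000
device = torch.device('cpu')

# Load the v4 Russian model once and reuse it for both speakers
model, _ = torch.hub.load(repo_or_dir='snakers4/silero-models',
                          model='silero_tts',
                          language='ru',
                          speaker='v4_ru')
model.to(device)

text = ("Добро пожаловать в компьютизированный экспериментальный центр "
        "при лаборатории исследования природы порталов.")

# Generate the same sentence with both speakers for a side-by-side comparison
for spk in ('baya', 'xenia'):
    audio = model.apply_tts(text=text, speaker=spk, sample_rate=sample_rate)
    sf.write(f'test_{spk}.wav', audio.numpy(), sample_rate)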

To Reproduce

Steps to reproduce the behavior:

  1. Run the code above.
  2. Listen to the generated test_baya.wav.

Expected behavior

The baya output should sound like a single voice, the same as xenia and the Telegram bot output.

Environment

Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home Single Language
GCC version: (x86_64-win32-seh-rev1, Built by MinGW-Builds project) 13.2.0
Clang version: 18.1.4
CMake version: version 3.29.2
Libc version: N/A

Python version: 3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 556.12
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2500
DeviceID=CPU0
Family=205
L2CacheSize=9216
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2500
Name=12th Gen Intel(R) Core(TM) i5-12500H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] flake8==7.1.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.25.2
[pip3] onnxruntime==1.18.1
[pip3] torch==2.3.1+cu121
[pip3] torchaudio==2.3.1
[pip3] torchvision==0.18.1+cu121
[conda] Could not collect

Additional context

None.
