Open
Labels
bug (Something isn't working)
Description
🐛 Bug
When I generate a speech sample with the following code, the baya speaker sounds as if two people are speaking at the same time. Other speakers do not have this problem (tested with xenia). Generating the same text with the Telegram bot sounds fine.
# V4
import torch
import torchaudio
import soundfile as sf
import numpy as np

language = 'ru'
model_id = 'v4_ru'
sample_rate = 48000
speaker = 'baya'
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

audio = model.apply_tts(text="Добро пожаловать в компьютизированный экспериментальный центр при лаборатории исследования природы порталов.",
                        speaker=speaker,
                        sample_rate=sample_rate)
sf.write('test_baya.wav', audio.numpy(), sample_rate)
Here are the samples from baya and xenia:
samples.zip
To Reproduce
Steps to reproduce the behavior:
- run code from above
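To produce the baya and xenia samples from one script for a side-by-side comparison, the code above could be wrapped roughly as follows. This is only a sketch: `render_speakers` and `main` are hypothetical helper names, not part of silero-models, and the only library calls used are the ones already shown in the report.

```python
def render_speakers(model, text, speakers, sample_rate):
    """Synthesize `text` once per speaker and return {speaker_name: audio}."""
    return {
        name: model.apply_tts(text=text, speaker=name, sample_rate=sample_rate)
        for name in speakers
    }


def main():
    # Imports are kept local so the helper above can be used without torch installed.
    import soundfile as sf
    import torch

    sample_rate = 48000
    model, _ = torch.hub.load(repo_or_dir='snakers4/silero-models',
                              model='silero_tts',
                              language='ru',
                              speaker='v4_ru')
    model.to(torch.device('cpu'))

    text = ("Добро пожаловать в компьютизированный экспериментальный центр "
            "при лаборатории исследования природы порталов.")

    # Write one file per speaker: test_baya.wav and test_xenia.wav.
    outputs = render_speakers(model, text, ['baya', 'xenia'], sample_rate)
    for name, audio in outputs.items():
        sf.write(f'test_{name}.wav', audio.numpy(), sample_rate)
```

Calling `main()` should write `test_baya.wav` and `test_xenia.wav` to the working directory so the two voices can be compared on the same text.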
Expected behavior
The baya output should sound like a single clear voice, as the xenia output does.
Environment
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Home Single Language
GCC version: (x86_64-win32-seh-rev1, Built by MinGW-Builds project) 13.2.0
Clang version: 18.1.4
CMake version: version 3.29.2
Libc version: N/A
Python version: 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 556.12
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=2500
DeviceID=CPU0
Family=205
L2CacheSize=9216
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2500
Name=12th Gen Intel(R) Core(TM) i5-12500H
ProcessorType=3
Revision=
Versions of relevant libraries:
[pip3] flake8==7.1.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.25.2
[pip3] onnxruntime==1.18.1
[pip3] torch==2.3.1+cu121
[pip3] torchaudio==2.3.1
[pip3] torchvision==0.18.1+cu121
[conda] Could not collect
Additional context
nope