Bug report - [Baya ru speaker (v4) sounds weird] #281

Open
JaanDev opened this issue Jul 7, 2024 · 0 comments
Labels: bug (Something isn't working)

JaanDev commented Jul 7, 2024

🐛 Bug

When I generate a speech sample with the code below, the baya speaker sounds as if two people are speaking at the same time. Other speakers do not have this problem (tested with xenia). Generating the same text with the Telegram bot sounds fine.

# V4
import torch
import torchaudio
import soundfile as sf
import numpy as np

language = 'ru'
model_id = 'v4_ru'
sample_rate = 48000
speaker = 'baya'
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
model.to(device)  # gpu or cpu

audio = model.apply_tts(text="Добро пожаловать в компьютизированный экспериментальный центр при лаборатории исследования природы порталов.",
                        speaker=speaker,
                        sample_rate=sample_rate)

sf.write('test_baya.wav', audio.numpy(), sample_rate)

Here are the samples from baya and xenia:
samples.zip

To Reproduce

Steps to reproduce the behavior:

  1. Run the code above.
  2. Listen to the generated test_baya.wav and compare it with the same text rendered by xenia.
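
To help narrow the problem down, the reproduction could be repeated for each speaker at several sample rates, so the files can be compared side by side. The sketch below is only a suggestion and is not part of the original report: `render_matrix` and `write_wav_mono16` are hypothetical helper names, the synthesis step is passed in as a callable so the helper itself does not depend on the model, and the WAV files are written with the standard-library `wave` module instead of `soundfile`.

```python
import array
import sys
import wave
from pathlib import Path


def write_wav_mono16(path, samples, sample_rate):
    """Write float samples (expected range [-1, 1]) as a 16-bit mono PCM WAV file."""
    # Clamp each sample so out-of-range values cannot overflow int16.
    pcm = array.array(
        'h',
        (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples),
    )
    if sys.byteorder == 'big':
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(str(path), 'wb') as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())


def render_matrix(synthesize, text, speakers, sample_rates, out_dir='.'):
    """Render `text` once per (speaker, sample_rate) pair.

    `synthesize(text, speaker, sample_rate)` is a caller-supplied function
    (hypothetical here) that returns an iterable of float samples.
    Returns the list of written file paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for speaker in speakers:
        for rate in sample_rates:
            samples = synthesize(text, speaker, rate)
            path = out_dir / f'test_{speaker}_{rate}.wav'
            write_wav_mono16(path, samples, rate)
            paths.append(path)
    return paths
```

With the `model` loaded as in the report, the callable could be something like `lambda t, s, r: model.apply_tts(text=t, speaker=s, sample_rate=r).tolist()`, called with `speakers=['baya', 'xenia']`. The set of sample rates to try (for example 24000 and 48000) is an assumption and should be checked against what the model actually accepts.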

Expected behavior

The baya output should sound like a single clear voice, as it does for xenia and in the Telegram bot.

Environment

Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home Single Language
GCC version: (x86_64-win32-seh-rev1, Built by MinGW-Builds project) 13.2.0
Clang version: 18.1.4
CMake version: version 3.29.2
Libc version: N/A

Python version: 3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 556.12
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2500
DeviceID=CPU0
Family=205
L2CacheSize=9216
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2500
Name=12th Gen Intel(R) Core(TM) i5-12500H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] flake8==7.1.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.25.2
[pip3] onnxruntime==1.18.1
[pip3] torch==2.3.1+cu121
[pip3] torchaudio==2.3.1
[pip3] torchvision==0.18.1+cu121
[conda] Could not collect

Additional context

None.

JaanDev added the bug label on Jul 7, 2024