Zero-shot inference: drawn-out speech and stretched audio #1550

@wangyizhi1

Description

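When inference_zero_shot is run repeatedly, the generated audio is occasionally stretched. This prompt normally produces about 3 s of audio, but sometimes the output is 6 s long and sounds strange.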

I am using the voice bundled with the official release, and inference runs on an A800.

正常.wav (normal output)
异常.wav (abnormal output)

import sys

# The Matcha-TTS code under third_party must be on the import path.
sys.path.append('third_party/Matcha-TTS')

import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

cosyvoice = CosyVoice2('./CosyVoice2-0.5B', load_jit=False, load_trt=False, load_vllm=False, fp16=False)

# Prompt audio from the repository's asset directory, loaded at 16 kHz.
prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)

# The same text is synthesized 11 times to expose the run-to-run variation.
prompts = ["Hello! My name is your name."] * 11

for n, text in enumerate(prompts):
    for i, j in enumerate(cosyvoice.inference_zero_shot(text, '希望你以后能够做的比我还好哟', prompt_speech_16k, stream=False)):
        torchaudio.save('model_zero_shot_{}{}.wav'.format(n, i), j['tts_speech'], cosyvoice.sample_rate)
        print(f"save model_zero_shot_{n}{i}.wav")
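To put a number on how often the stretched output appears, the duration of each generated clip could be recorded and compared against the median. The following is only a minimal sketch: the helper names `clip_duration` and `flag_long_clips` and the 1.5x threshold are assumptions made for illustration, not part of CosyVoice. It also assumes that `j['tts_speech']` is a (channels, samples) tensor, so that `j['tts_speech'].shape[-1]` gives the sample count.

```python
from statistics import median


def clip_duration(num_samples: int, sample_rate: int) -> float:
    """Duration in seconds of a clip with num_samples samples."""
    return num_samples / sample_rate


def flag_long_clips(durations, ratio=1.5):
    """Return indices of clips longer than ratio * median duration.

    The ratio of 1.5 is an arbitrary threshold chosen for illustration.
    """
    if not durations:
        return []
    typical = median(durations)
    return [i for i, d in enumerate(durations) if d > ratio * typical]


# Hypothetical durations in seconds, not measured results.
# In the reproduction script, each value could be collected as
# clip_duration(j['tts_speech'].shape[-1], cosyvoice.sample_rate).
durations = [3.0, 3.1, 2.9, 6.0, 3.0]
print(flag_long_clips(durations))  # [3]
```

Running the reproduction loop many times and counting the flagged indices would give a rough failure rate to report alongside the audio samples.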
