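When inference_zero_shot is run repeatedly, the generated audio is occasionally stretched. This prompt usually produces about 3 s of audio, but sometimes the output is 6 s long and sounds odd.
I am using the official bundled voice and running inference on an A800.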
import sys
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2
from cosyvoice.utils.file_utils import load_wav
import torchaudio
cosyvoice = CosyVoice2('./CosyVoice2-0.5B', load_jit=False, load_trt=False, load_vllm=False, fp16=False)
prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)
prompts = [
"Hello! My name is your name.",
"Hello! My name is your name.",
"Hello! My name is your name.",
"Hello! My name is your name.",
"Hello! My name is your name.",
"Hello! My name is your name.",
"Hello! My name is your name.",
"Hello! My name is your name.",
"Hello! My name is your name.",
"Hello! My name is your name.",
"Hello! My name is your name.",
]
# Synthesize the same text repeatedly and save every result for comparison.
for n in range(len(prompts)):
    for i, j in enumerate(cosyvoice.inference_zero_shot(prompts[n], '希望你以后能够做的比我还好哟', prompt_speech_16k, stream=False)):
        torchaudio.save('model_zero_shot_{}{}.wav'.format(n,i), j['tts_speech'], cosyvoice.sample_rate)
        print(f"save model_zero_shot_{n}{i}.wav")
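To put a number on how often this happens, the duration of each output could be recorded and compared against the typical value. The following is only a rough sketch: `duration_seconds` and `flag_stretched` are hypothetical helpers that are not part of CosyVoice, the 1.5x threshold is an arbitrary assumption, and the commented usage assumes `j['tts_speech']` is a (channels, samples) tensor, as `torchaudio.save` expects.

```python
from statistics import median


def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Length of a clip in seconds, given its sample count and sample rate."""
    return num_samples / sample_rate


def flag_stretched(durations, ratio=1.5):
    """Return indices of clips longer than `ratio` times the median duration.

    The ratio is an assumed threshold, not something defined by CosyVoice.
    """
    if not durations:
        return []
    typical = median(durations)
    return [idx for idx, d in enumerate(durations) if d > ratio * typical]


# Hypothetical usage inside the loop above:
#     durations.append(duration_seconds(j['tts_speech'].shape[-1], cosyvoice.sample_rate))
# and after the loop:
#     print(flag_stretched(durations))
```

With durations like 3.0, 3.1, 2.9, 6.0, 3.0 this would flag only the 6.0 s clip, which makes it easier to report how many of the runs are affected.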