[debug] support flow cache, for sharper tts_mel output #412

boji123 · 2024-09-20T04:38:35Z

I'm 柏基.
This is a fix for problem 2 in #379.

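In flow matching, z and mu are not fixed for a given frame index across chunks. This is one of the causes of the blurry spectrogram at chunk boundaries. (The root cause is the flow's attention context, for which there is no fix.)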
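A minimal sketch of the caching idea, not the code in this PR. It assumes each new chunk re-covers all frames seen so far, and it uses one scalar per frame in place of a real mel-sized tensor. The class name `FlowInputCache` and the method `prepare` are hypothetical.

```python
import random


class FlowInputCache:
    """Hypothetical helper: keeps per-frame z (noise) and mu from earlier chunks."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.z = []   # cached noise, one scalar per frame (simplification)
        self.mu = []  # cached condition, one scalar per frame (simplification)

    def prepare(self, mu_new):
        """Return (z, mu) for a chunk whose frames start at index 0."""
        n_cached = min(len(self.z), len(mu_new))
        n_fresh = len(mu_new) - n_cached
        # Frames already processed keep the z and mu they had last time,
        # so the flow sees identical inputs at every cached index.
        z = self.z[:n_cached] + [self._rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        mu = self.mu[:n_cached] + list(mu_new[n_cached:])
        # Store the merged inputs for the next chunk.
        self.z, self.mu = z, mu
        return z, mu


cache = FlowInputCache(seed=1)
z1, mu1 = cache.prepare([0.1, 0.2, 0.3])
# The second chunk recomputes mu for the first three frames with different
# values; the cache overrides them with the values used the first time.
z2, mu2 = cache.prepare([9.0, 9.0, 9.0, 0.4, 0.5])
```

Without the cache, every chunk would draw fresh noise for all frames, so the same frame index would start from a different z each time.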

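The figure shows the flow's tts_mel output, comparing the context and the spectrogram blur.
Large figure, column 1: without cache; column 2: with cache.
Small panels, left: the last 34 frames of the previous chunk; middle: (previous + next) / 2; right: the first 34 frames of the next chunk.
With the cache, the tts_mel spectrogram is visibly sharper.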
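A rough sketch of how the three small panels could be computed, assuming a mel is a list of frames and each frame is a list of bin values. The function names `overlap_panels` and `boundary_mismatch` are hypothetical, and the mismatch number is an illustration, not a metric from this PR.

```python
def overlap_panels(prev_mel, next_mel, overlap=34):
    """Return (left, middle, right) panels for the overlap region."""
    tail = prev_mel[-overlap:]   # last `overlap` frames of the previous chunk
    head = next_mel[:overlap]    # first `overlap` frames of the next chunk
    # Middle panel: element-wise (previous + next) / 2.
    mean = [[(a + b) / 2 for a, b in zip(ft, fh)] for ft, fh in zip(tail, head)]
    return tail, mean, head


def boundary_mismatch(tail, head):
    """Mean absolute difference between the two overlapping regions."""
    diffs = [abs(a - b) for ft, fh in zip(tail, head) for a, b in zip(ft, fh)]
    return sum(diffs) / len(diffs)


prev_mel = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
next_mel = [[3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]
tail, mean, head = overlap_panels(prev_mel, next_mel, overlap=2)
mismatch = boundary_mismatch(tail, head)
```

If the two chunks agree in the overlap, the middle panel looks like either side and the mismatch is near zero. When they disagree, averaging smears the detail, which is the blur visible in the column without cache.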

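*Because the later mel fade, hifigan cache, and speech fade stages compensate, this change, although more fundamental, is unlikely to produce an audible improvement in most cases. With enough testing, though, some bad cases do improve.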

boji123 · 2024-09-24T09:44:14Z

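Addendum: this can also mitigate abrupt volume changes across context boundaries in streaming inference (the cache provides a volume reference).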

aluminumbox · 2024-09-29T06:40:04Z

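True streaming with a transformer requires causal inference. Although the overlap result is preserved, the context seen by the flow matching decoder during each chunk's diffusion has already changed, so the final generated mel still does not join up with the mel from the previous overlap. We are already training a true streaming model, so we are closing this PR for now.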

boji123 · 2024-09-29T07:35:10Z

Using cached results for the flow matching inputs Z and MU is meaningful. Your causal approach will need this cache as well (because of the randomness in flow matching).

[debug] support flow cache, for sharper tts_mel output

283e612

boji123 mentioned this pull request Sep 20, 2024

[bug] Randomness from temperature makes the mel spectrograms of adjacent windows discontinuous in streaming inference #406

Closed

aluminumbox closed this Sep 29, 2024
