There is a delay between mouth shape and audio #55

Open
hu1148509978 opened this issue Dec 7, 2024 · 4 comments

@hu1148509978

Thank you for publicly releasing such outstanding work!
When running inference on Chinese audio, I ran into a problem where the mouth shapes and the audio drift apart:
in the first few seconds the mouth shapes are synchronized, but over time the delay between the mouth shapes and the audio grows, so they fall out of sync.
I can re-align them by dragging the audio forward or backward by a few seconds in video editing software, but I would rather not have to fix the synchronization through editing.
How can this problem be solved? I would greatly appreciate a reply!

macron2story.mp4
@Fictionarry
Owner

Hi, this looks like a mismatch in the video fps or the audio sampling rate. The input audio should have a sampling rate of 16,000 Hz before being processed by DeepSpeech, wav2vec, or HuBERT (recommended for your cross-lingual application), and the generated video is 25 fps. That is my best guess at the cause. If not, first check whether the generated video and audio have the same length. By the way, the video you provided is 30 fps, not 25 fps as in the original output. I recommend using ffmpeg to combine the video and audio as follows, which ensures no misalignment is introduced during this step.

ffmpeg -i <video_path> -i <audio_path> -q 2 <output_path>
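
A minimal sketch of the preprocessing suggested above, assuming ffmpeg and ffprobe are on the PATH; the file names are placeholders, not paths from this thread. Resample the audio to 16 kHz mono before feature extraction, then verify that the generated video is 25 fps and that the audio and video durations match:

ffmpeg -i input_audio.wav -ar 16000 -ac 1 audio_16k.wav
ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of default=noprint_wrappers=1:nokey=1 generated_video.mp4
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 generated_video.mp4
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 audio_16k.wav

The first command downmixes to mono and resamples to 16,000 Hz; the ffprobe calls print the video stream's frame rate (expect 25/1) and the container durations, so any length mismatch shows up before muxing.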

@hu1148509978
Author

Thank you for your prompt response. I will try again. Thank you again for such excellent work!

@hu1148509978
Author

Excellent work! Following your explanation, I have achieved good results!

@Fictionarry
Owner

Haha, thanks for your feedback :)
