fix: can use other size #26
Comments
You can try the following script. I just provided a simple example; you'd better find a clearer picture and video yourself.

python scripts/extract_kps_sequence_and_audio.py \
    --video_path "./test_samples/short_case/AOC/gt.mp4" \
    --kps_sequence_save_path "./test_samples/short_case/AOC/kps_768.pth" \
    --audio_save_path "./test_samples/short_case/AOC/aud.mp3" \
    --height 768 \
    --width 768

python inference.py \
    --reference_image_path "./test_samples/short_case/AOC/ref_768.png" \
    --audio_path "./test_samples/short_case/AOC/aud.mp3" \
    --kps_path "./test_samples/short_case/AOC/kps_768.pth" \
    --output_path "./output/short_case/talk_AOC_no_retarget_768.mp4" \
    --retarget_strategy "no_retarget" \
    --num_inference_steps 20 \
    --image_width 768 \
    --image_height 768

Attached image: ref_768.png
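To keep the resolution consistent across both steps, the two commands above can be assembled from a single size value. This is only a sketch: `build_commands` is a hypothetical helper, not part of the repository. The flags and file-naming pattern are copied from the commands above, and nothing is executed.

```python
import shlex


def build_commands(case_dir, size, output_path, steps=20):
    """Return (extract_cmd, inference_cmd) as argument lists for one square size.

    Hypothetical helper: the flags and file names mirror the commands quoted
    in this thread. It only builds the lists and does not run anything.
    """
    kps_path = f"{case_dir}/kps_{size}.pth"
    audio_path = f"{case_dir}/aud.mp3"

    extract_cmd = [
        "python", "scripts/extract_kps_sequence_and_audio.py",
        "--video_path", f"{case_dir}/gt.mp4",
        "--kps_sequence_save_path", kps_path,
        "--audio_save_path", audio_path,
        "--height", str(size),
        "--width", str(size),
    ]
    inference_cmd = [
        "python", "inference.py",
        "--reference_image_path", f"{case_dir}/ref_{size}.png",
        "--audio_path", audio_path,
        "--kps_path", kps_path,
        "--output_path", output_path,
        "--retarget_strategy", "no_retarget",
        "--num_inference_steps", str(steps),
        "--image_width", str(size),
        "--image_height", str(size),
    ]
    return extract_cmd, inference_cmd


extract_cmd, inference_cmd = build_commands(
    "./test_samples/short_case/AOC",
    768,
    "./output/short_case/talk_AOC_no_retarget_768.mp4",
)
# Print shell-ready command lines for inspection.
print(shlex.join(extract_cmd))
print(shlex.join(inference_cmd))
```

Building both commands from one `size` avoids accidentally extracting keypoints at one resolution and running inference at another.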
That is what I ran, and that is what gives me the error if I go past 640 on the resolution.
I generated a video at 768x768 and it worked for me, but my video has only one face.
I have tried hundreds of combinations now. 640 is the max it will do; anything above that and it won't detect faces anymore. I'm using the included video, short_case/TYS, which has a single face. I made some upscaled versions of the video (1024, 1000, 768, 640, etc.).

python scripts/extract_kps_sequence_and_audio.py --video_path "./test_samples/short_case/tys/gt768.mp4" --kps_sequence_save_path "./test_samples/short_case/tys/kps768.pth" --audio_save_path "./test_samples/short_case/tys/gt.mp3" --height 768 --width 768

I also tried selecting some of the other video sizes, like 512, while specifying 768 to see what would happen. Same error. If I select the 1024 video but don't request a higher resolution, it works. So the problem is explicitly tied to requesting a specific resolution.
Here is a 768x768 result I made, not upscaled: Biden_Photo_Big_result_0003.mp4
The same issue happened to me. I just debugged this for hours. :) Why? I haven't had time to figure it out yet. But it looks like the problem is with incorrectly using the model_ckpts\insightface_models\models\buffalo_l\det_10g.onnx model, which is 640x640. And if we pass an image size larger than that, some incorrect resizing happens.
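To make the 640x640 constraint concrete: a fixed-size detector generally needs the frame scaled to fit its input, and the boxes it returns scaled back to frame coordinates. The sketch below shows only that arithmetic. `fit_to_detector` and `box_to_frame` are hypothetical helpers, not insightface functions, and whether V-Express fails at exactly this step is the guess made above, not something verified here.

```python
def fit_to_detector(frame_w, frame_h, det_size=640):
    """Scale a frame to fit inside a det_size x det_size input, keeping aspect ratio.

    Returns (scale, new_w, new_h). The remainder of the square input would be
    padding. det_size=640 is taken from the comment above.
    """
    scale = min(det_size / frame_w, det_size / frame_h)
    new_w = max(1, round(frame_w * scale))
    new_h = max(1, round(frame_h * scale))
    return scale, new_w, new_h


def box_to_frame(box, scale):
    """Map an (x1, y1, x2, y2) box from detector coordinates back to frame coordinates."""
    return tuple(v / scale for v in box)


# A 768x768 frame has to shrink to fit a 640x640 detector input.
scale, new_w, new_h = fit_to_detector(768, 768)
print(scale, new_w, new_h)

# A box found in the shrunken image maps back to the original frame.
print(box_to_frame((100, 100, 200, 200), fit_to_detector(1280, 720)[0]))
```

If the frame is passed at full size while the detector still assumes 640x640, or if boxes are not divided by `scale` afterwards, detections can be lost or misplaced, which would be consistent with the "0 faces" assertion.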
Ah, that makes sense. My input video was also 512x512.
So... if we don't want to face "zero face" problems, there's an easy fix: Change

However, there's not much use for it. Image sizes larger than 768 just produce garbage results. As I can see from the V-Express paper at https://arxiv.org/pdf/2406.02511, the model was trained on 512x512 resolution. And, as far as I can understand, it uses the Stable Diffusion 1.5 model, which is also 512x512 (or 768?).

BUT! We have an extremely time-consuming way to increase quality by processing Video to Video with SUPIR-V0Q. It looks interesting. And there might be another way to test: use SUPIR for every Nth frame (dropping some frames) and perform frame interpolation. This way, the animation could be more accurate. But it's just an idea.
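The "every Nth frame" idea could be scheduled roughly as follows. This is a sketch of the planning step only: `plan_keyframes` is a hypothetical helper, the linear blend weight stands in for whatever frame-interpolation method would actually be used, and SUPIR itself is not called.

```python
def plan_keyframes(num_frames, step):
    """Plan which frames to upscale and how to interpolate the rest.

    Returns (keys, plan). `keys` lists the frame indices that would be
    upscaled: every `step`-th frame plus the last frame. `plan[i]` is
    (prev_key, next_key, weight) for frame i, where weight is 0.0 at
    prev_key and approaches 1.0 toward next_key. Keyframes map to
    (i, i, 0.0).
    """
    if num_frames <= 0 or step <= 0:
        raise ValueError("num_frames and step must be positive")

    keys = list(range(0, num_frames, step))
    if keys[-1] != num_frames - 1:
        keys.append(num_frames - 1)  # always anchor the final frame

    plan = []
    k = 0  # index of the latest keyframe at or before the current frame
    for i in range(num_frames):
        while k + 1 < len(keys) and keys[k + 1] <= i:
            k += 1
        prev_key = keys[k]
        if prev_key == i:
            plan.append((i, i, 0.0))
        else:
            next_key = keys[k + 1]
            weight = (i - prev_key) / (next_key - prev_key)
            plan.append((prev_key, next_key, weight))
    return keys, plan


keys, plan = plan_keyframes(num_frames=10, step=4)
print(keys)
print(plan[2])
```

With 10 frames and a step of 4, only 4 frames would go through the slow upscaler, and the other 6 would be synthesized between their neighboring keyframes.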
You added this fix to be able to use other sizes. What sizes will the system accept?
I can't seem to go above 640; anything above 640 gives this error.
768 gives this:
AssertionError: There are 0 faces in the 0-th frame. Only one face is supported.