
fix: can use other size #26

Open
steven850 opened this issue Jun 5, 2024 · 8 comments

Comments

@steven850

You added this fix to be able to use other sizes. What sizes will the system accept?
I can't seem to go above 640; anything above 640 gives an error.
With 768 I get this:
AssertionError: There are 0 faces in the 0-th frame. Only one face is supported.

@tiankuan93
Collaborator

tiankuan93 commented Jun 5, 2024

You can try the following script. It is just a simple example; you'd better find clearer pictures and videos yourself.

python scripts/extract_kps_sequence_and_audio.py \
    --video_path "./test_samples/short_case/AOC/gt.mp4" \
    --kps_sequence_save_path "./test_samples/short_case/AOC/kps_768.pth" \
    --audio_save_path "./test_samples/short_case/AOC/aud.mp3" \
    --height 768 \
    --width 768

python inference.py \
    --reference_image_path "./test_samples/short_case/AOC/ref_768.png" \
    --audio_path "./test_samples/short_case/AOC/aud.mp3" \
    --kps_path "./test_samples/short_case/AOC/kps_768.pth" \
    --output_path "./output/short_case/talk_AOC_no_retarget_768.mp4" \
    --retarget_strategy "no_retarget" \
    --num_inference_steps 20 \
    --image_width 768 \
    --image_height 768

[Attached image: ref_768.png]
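Because the resolution has to be repeated in both commands (`--height`/`--width` for extraction, `--image_width`/`--image_height` for inference), one way to keep them in sync is to build both from a single size. The sketch below is a hypothetical helper, not part of V-Express; it uses only the flags shown above and assumes the reference image follows the `ref_<size>.png` naming from this example. It only keeps the sizes consistent and does not change how face detection behaves.

```python
import shlex

# Hypothetical helper (not part of V-Express): builds both commands from one
# `size`, so keypoint extraction and inference cannot disagree on resolution.
# Only flags that appear in the commands above are used.
def build_commands(case_dir, output_path, size, steps=20):
    kps_path = f"{case_dir}/kps_{size}.pth"
    audio_path = f"{case_dir}/aud.mp3"
    extract = [
        "python", "scripts/extract_kps_sequence_and_audio.py",
        "--video_path", f"{case_dir}/gt.mp4",
        "--kps_sequence_save_path", kps_path,
        "--audio_save_path", audio_path,
        "--height", str(size),
        "--width", str(size),
    ]
    infer = [
        "python", "inference.py",
        # Assumption: reference image is named ref_<size>.png, as in this example.
        "--reference_image_path", f"{case_dir}/ref_{size}.png",
        "--audio_path", audio_path,
        "--kps_path", kps_path,
        "--output_path", output_path,
        "--retarget_strategy", "no_retarget",
        "--num_inference_steps", str(steps),
        "--image_width", str(size),
        "--image_height", str(size),
    ]
    return extract, infer

# Print the commands for the AOC example; pass each list to subprocess.run to execute it.
for cmd in build_commands("./test_samples/short_case/AOC",
                          "./output/short_case/talk_AOC_no_retarget_768.mp4", 768):
    print(shlex.join(cmd))
```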

@steven850
Author

steven850 commented Jun 5, 2024

That is what I ran, and that is what gives me the error whenever I go past 640 on the resolution.
Traceback (most recent call last):
File "Z:\vex\scripts\extract_kps_sequence_and_audio.py", line 38, in
assert len(faces) == 1, f'There are {len(faces)} faces in the {frame_idx}-th frame. Only one face is supported.'
AssertionError: There are 0 faces in the 0-th frame. Only one face is supported.

@FurkanGozukara

I generated a video even at 768x768, and it worked for me.

But the video has only one face.

#27

@steven850
Author

steven850 commented Jun 6, 2024

I have tried hundreds of combinations now. 640 is the maximum it will do; above that, it no longer detects any faces. I'm using the included video, short_case/TYS, which has a single face. I made some upscaled versions of the video: 1024, 1000, 768, 640, etc.

"python scripts/extract_kps_sequence_and_audio.py --video_path "./test_samples/short_case/tys/gt768.mp4" --kps_sequence_save_path "./test_samples/short_case/tys/kps768.pth" --audio_save_path "./test_samples/short_case/tys/gt.mp3" --height 768 --width 768"

I also tried other video sizes, such as 512, while specifying 768, to see what would happen. Same error.

If I select the 1024 video but don't request a higher resolution, it works. So the problem is explicitly tied to requesting a specific resolution.

@FurkanGozukara

Here is a 768x768 one I made, not upscaled:

[Attached video: Biden_Photo_Big_result_0003.mp4]

@KMiNT21
Copy link

KMiNT21 commented Jun 11, 2024

The same issue happened to me. I just spent hours debugging this. :)
What I found is that when a 1024x1024 image is passed into retinaface.py, the face detection model returns bad results, with confidence below the 0.5 threshold. So some inputs work, but others do not.

Why? I haven't had time to figure it out yet. But it looks like the problem is that the model_ckpts\insightface_models\models\buffalo_l\det_10g.onnx model, which is 640x640, is being used incorrectly: if we pass an image larger than that, some incorrect resizing happens.
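To see why low confidence turns into the "0 faces" error, here is a minimal sketch of the two steps described in this thread: discard detections below the 0.5 threshold, then require exactly one face. `Detection`, `keep_confident`, and `check_single_face` are hypothetical names, not the insightface API, and the sketch assumes detections at or above the threshold are kept. The assertion message mirrors the one in extract_kps_sequence_and_audio.py.

```python
from dataclasses import dataclass

# Hypothetical stand-in for one detector output; not the insightface API.
@dataclass
class Detection:
    score: float   # detector confidence
    bbox: tuple    # (x1, y1, x2, y2)

DET_THRESH = 0.5  # confidence threshold mentioned in this thread

def keep_confident(detections, thresh=DET_THRESH):
    """Drop detections whose confidence is below the threshold."""
    return [d for d in detections if d.score >= thresh]

def check_single_face(detections, frame_idx, thresh=DET_THRESH):
    """Filter by confidence, then require exactly one face, as the script does."""
    faces = keep_confident(detections, thresh)
    assert len(faces) == 1, (
        f'There are {len(faces)} faces in the {frame_idx}-th frame. '
        'Only one face is supported.'
    )
    return faces[0]
```

With these hypothetical scores, a face detected at 0.9 passes, while the same face scored at 0.3 is filtered out and raises exactly the AssertionError reported above, even though a face is clearly in the frame.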

@FurkanGozukara

Ah, that makes sense.

My input video was also 512x512.

@KMiNT21
Copy link

KMiNT21 commented Jun 12, 2024

So...

If we want to avoid the "zero faces" problem, there's an easy fix:

Change app.prepare(ctx_id=0, det_size=(args.image_height, args.image_width)) to app.prepare(ctx_id=0, det_size=(min(args.image_height, 640), min(args.image_width, 640))) (and make the same change inside extract_kps...py).
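The workaround amounts to capping each side of `det_size` at 640 while still generating at the requested resolution. A minimal sketch, where `clamp_det_size` and `MAX_DET_SIDE` are hypothetical names and 640 is the det_10g.onnx size reported above; it keeps the argument order of the original `app.prepare` call.

```python
# Hypothetical helper for the workaround above: cap the face-detector input
# size at 640 (the det_10g.onnx size reported in this thread) while the
# generation itself still uses the requested resolution.
MAX_DET_SIDE = 640

def clamp_det_size(image_height, image_width, max_side=MAX_DET_SIDE):
    """Return det_size in the same order as the original call, each side capped."""
    return (min(image_height, max_side), min(image_width, max_side))
```

It would then be used as `app.prepare(ctx_id=0, det_size=clamp_det_size(args.image_height, args.image_width))`, which is equivalent to the inline `min(...)` version.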

However, there's not much use for it: image sizes larger than 768 just produce garbage results. According to the V-Express paper (https://arxiv.org/pdf/2406.02511), the model was trained at 512x512 resolution. And, as far as I understand, it builds on Stable Diffusion 1.5, which is also 512x512 (or 768?).

BUT! There is an extremely time-consuming way to increase quality: processing the result video-to-video with SUPIR-V0Q. It looks interesting.

And there might be another approach to test: run SUPIR only on every Nth frame (dropping the others) and then perform frame interpolation. This way, the animation could be more accurate. But it's just an idea.
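The every-Nth-frame idea can be sketched as follows. `enhance` is a hypothetical placeholder for an expensive per-frame enhancer such as a SUPIR call, and the linear blend between enhanced key frames is only a naive stand-in for a real frame-interpolation model; the sketch shows the data flow, not a quality-preserving method.

```python
import numpy as np

def enhance_every_nth(frames, n, enhance):
    """Enhance every n-th frame and fill the gaps by linear blending.

    frames: list of HxWxC float arrays.
    enhance: hypothetical placeholder for an expensive enhancer (e.g. SUPIR).
    The linear blend is a naive stand-in for a proper interpolation model.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    if not frames:
        return []
    last = len(frames) - 1
    # Key frames: every n-th frame, always including the final frame.
    keys = list(range(0, len(frames), n))
    if keys[-1] != last:
        keys.append(last)
    enhanced = {i: enhance(frames[i]) for i in keys}  # only key frames are enhanced
    out = []
    for a, b in zip(keys, keys[1:]):
        for i in range(a, b):
            t = (i - a) / (b - a)  # 0 at key frame a, approaching 1 toward b
            out.append((1.0 - t) * enhanced[a] + t * enhanced[b])
    out.append(enhanced[last])
    return out

# Usage with a dummy "enhancer" that doubles pixel values.
frames = [np.full((2, 2, 3), float(i)) for i in range(7)]
result = enhance_every_nth(frames, 3, lambda f: f * 2.0)
print(len(result))
```

The expensive step then runs roughly len(frames) / n times instead of once per frame, which is the point of the idea; how good the in-between frames look depends entirely on the interpolation model that replaces the linear blend.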
