
HunyuanVideo version for RTX 3090 / RTX 4090 just released #109

Open
deepbeepmeep opened this issue Dec 10, 2024 · 21 comments

Comments

@deepbeepmeep

Thanks for the best open source video generator!!!

I have created a fork of this repository and adapted it so that HunyuanVideo can run even on consumer GPUs.

https://github.com/deepbeepmeep/HunyuanVideoGP

It is pretty fast for a consumer GPU: you can generate 97 frames (more than 3 s of video) at 848x480 in less than 12 minutes.

@breadbrowser

As it needs an 80 GB GPU, it seems unlikely to fit on a 24 GB GPU unless it is quantized to 4 bits.

@deepbeepmeep
Author

That's incorrect: the main video model itself is around 24 GB and can be reduced to 12 GB with 8-bit quantization at the cost of minimal degradation. The text encoder (T5 XXL) is 40 GB if you use the 32-bit version including the decoder, which is not needed; if you keep only the encoder in 16-bit format (no degradation at all), it takes only 10 GB.
Then you just need to offload unused models to the CPU, since only one is needed at any given time, and voilà!
Even if the quality is not as good as the full 16-bit version, the end result is far ahead of anything else that is open source (and that can run on an RTX 4090).
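The sizes in this comment follow from simple arithmetic: weight memory ≈ parameters × bits per parameter / 8. A minimal sketch under stated assumptions: it counts weights only (no activations, no per-tensor quantization scales), uses decimal GB, and back-computes the parameter counts from the sizes quoted above rather than from any official model card. The T5 figures are the ones stated in the comment; the encoder-is-half split is an assumption implied by the 40 GB to 10 GB numbers.

```python
def weight_gb(num_params: float, bits: int) -> float:
    """Approximate weight memory in decimal GB (weights only, no activations)."""
    return num_params * bits / 8 / 1e9


def params_from_gb(size_gb: float, bits: int) -> float:
    """Invert weight_gb: how many parameters occupy size_gb at the given width."""
    return size_gb * 1e9 * 8 / bits


# Assumption: parameter counts are back-computed from the sizes in the comment.
video_params = params_from_gb(24, 16)  # ~24 GB at 16-bit -> ~12B parameters
print(f"video model, 16-bit: {weight_gb(video_params, 16):.0f} GB")  # prints 24 GB
print(f"video model,  8-bit: {weight_gb(video_params, 8):.0f} GB")   # prints 12 GB

t5_params = params_from_gb(40, 32)  # ~40 GB at 32-bit (encoder + decoder)
# Assumption: the encoder is about half of T5's parameters.
encoder_params = t5_params / 2
print(f"T5 encoder only, 16-bit: {weight_gb(encoder_params, 16):.0f} GB")  # prints 10 GB
```

Halving the bit width halves the weight footprint, which is why 8-bit quantization takes the video model from about 24 GB to about 12 GB.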

@FurkanGozukara

@deepbeepmeep Thanks, but I don't see that you made any significant changes to this base repo?

By the way, T5 can be loaded as FP8, and there is an even better scaled version. Did you try them?

@yuanbopeng

+1

@tavyscrolls

The key changes would be in gradio_server.py; the rest is getting rid of references to the original model. Beyond that, I'm a little confused, but I think the point is to download the quantized models manually and put them in the ckpt folder.

Which makes me wonder: can you still use xDiT with this?

@deepbeepmeep
Author

There is no change in the base model because the magic is in my offload library, mmgp, which is called from gradio_server.py. The same library can be used to offload Flux, CogView, Mochi models, … I will soon provide more sample apps. Be aware that right now you need 64 GB of RAM. There is little gain in prequantizing the models, as quantization is done on the fly and is pretty fast.
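mmgp's exact quantization scheme isn't described in this thread. As a generic illustration of why on-the-fly 8-bit quantization is cheap, here is a minimal sketch of symmetric per-tensor absmax int8 quantization in pure Python: one pass to find the scale, one pass to round. The function names are hypothetical, and a real implementation would operate on GPU tensors rather than Python lists.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor absmax quantization to int8 (hypothetical helper)."""
    absmax = max((abs(w) for w in weights), default=0.0)
    # One scale per tensor maps [-absmax, absmax] onto [-127, 127].
    scale = absmax / 127 if absmax > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights, e.g. right before a matmul."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.013, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Rounding guarantees each value is within half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

Storing `q` needs one byte per weight plus a single float per tensor, so the footprint is roughly half that of 16-bit weights; small values such as 0.013 lose the most relative precision.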

@FurkanGozukara

@deepbeepmeep Can you give a link to the mmgp library? It sounds nice.

@Dhilu16

Dhilu16 commented Dec 11, 2024

@deepbeepmeep I saw you just updated the code for low RAM. Can you tell me whether it will work with 32 GB of RAM + an RTX 3090?
Also, if possible, please update the instructions.
Thanks, you did a great job, man!

@deepbeepmeep
Author

@Dhilu16 The RAM is used to store the models while they are not on the GPU, and I am afraid 32 GB will not be sufficient. You can try anyway. One solution might be to also quantize the text encoder to reduce RAM consumption. If I have some time this week, I will add an option.
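To make the RAM/VRAM split concrete, here is a toy bookkeeping model of sequential CPU offloading (not mmgp's code): every model keeps a copy in CPU RAM, and at most one is on the GPU at a time, evicting whichever was there before. The class name and the model sizes in the usage lines are hypothetical placeholders, not measurements of HunyuanVideo.

```python
from typing import Optional


class SequentialOffloader:
    """Toy bookkeeping model of one-model-at-a-time GPU offloading (hypothetical)."""

    def __init__(self, model_sizes_gb: dict[str, float]) -> None:
        self.sizes = dict(model_sizes_gb)
        self.on_gpu: Optional[str] = None  # model currently resident in VRAM
        self.transfers = 0  # number of CPU -> GPU copies performed
        self.peak_vram_gb = 0.0  # weights only; activations are ignored

    @property
    def ram_gb(self) -> float:
        # Every model keeps its CPU copy, so RAM must hold all of them at once.
        return sum(self.sizes.values())

    def use(self, name: str) -> None:
        """Make `name` the GPU-resident model, evicting the previous one."""
        if self.on_gpu != name:
            # The evicted model still has its CPU copy, so eviction only frees VRAM.
            self.on_gpu = name
            self.transfers += 1
        self.peak_vram_gb = max(self.peak_vram_gb, self.sizes[name])


# Placeholder sizes in GB, for illustration only (not measured values).
off = SequentialOffloader({"text_encoder": 10.0, "transformer": 12.0, "vae": 1.0})
# 50 denoising steps reuse the transformer without reloading it.
for stage in ["text_encoder"] + ["transformer"] * 50 + ["vae"]:
    off.use(stage)
print(off.ram_gb, off.peak_vram_gb, off.transfers)  # 23.0 12.0 3
```

VRAM is bounded by the largest single model while RAM has to hold the sum of all of them. Real runs need more on both sides (activations in VRAM; the Python process, framework overhead, and transfer buffers in RAM), so treat these numbers as a weights-only lower bound.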

@deepbeepmeep
Author

@FurkanGozukara

You can install the module using "pip install mmgp". You will find instructions on how to use it here: https://github.com/deepbeepmeep/mmgp

If you are interested, I have also applied my module to Flux Fill (a very good iterative inpainting/outpainting tool):
https://github.com/deepbeepmeep/FluxFillGP

@deepbeepmeep
Author

Regarding loading T5 as FP8: I got mixed up with Flux. In fact, HunyuanVideo uses a Llama-based text encoder. It could indeed be quantized to reduce its memory footprint. I will add this capability when I have time.

@deepbeepmeep
Author

@FurkanGozukara and @Dhilu16

I have just published mmgp 1.2.0, which now accepts an extra parameter (modelsToQuantize) containing a list of additional models to quantize.

So here you can try "offload.all(pipe, modelsToQuantize=["text_encoder"])" to quantize both the video model (quantized by default) and the Llama text_encoder.

Please let me know if it helps.

@FurkanGozukara

@deepbeepmeep Awesome work!

So I hope you add those features to your Gradio app; that is what I am planning to test.

@deepbeepmeep
Author

It is now in the latest version of my fork. You can comment/uncomment lines 34-36 to try the different options.

@tavyra

tavyra commented Dec 12, 2024

I'm curious whether there's an easy way to keep it from loading a copy of the model into RAM for each GPU when adapting the gradio.py code to sample.py. I'm loading the text encoder from a pre-quantized model, and I'm not even sure 96 GB would be enough RAM.

@rzgarespo

Installed it on Win11 with an NVIDIA RTX 3080 (10 GB VRAM); it used 42 GB of DDR4 RAM (out of 96 GB). It took about 3 hours to generate 49 frames with 25 steps.
youtube.com/watch?v=ylfeJ7Cv8AE

@randaller

randaller commented Dec 15, 2024

Windows 2022, RTX 3090: 25 minutes for 848 x 480 x 97 frames, 50 steps.

@krishnapraveen7

God bless you.

@deepbeepmeep
Author

@rzgarespo Unfortunately, 10 GB of VRAM is insufficient right now. I am working on an improved version that will use sequence offloading to reduce the VRAM requirements even further, but I am afraid that 10 GB of VRAM won't be enough anyway.
With this new version I can get up to 8 s (180 frames) in 40 minutes on my RTX 4090.
I will update this thread when the new version is available.

@mytait

mytait commented Dec 18, 2024

So we load the normal original weights plus your code?
By the way, does your code only add your module to the pipeline?

Am I right that the only change to the original code is:

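from mmgp import offload
pipe = hunyuan_video_sampler.pipeline  # pipeline of the already-created HunyuanVideoSampler
offload.all(pipe)  # let mmgp handle offloading (and default quantization) of the pipeline's models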

since you only updated the Gradio demo.
Could you also update the normal Python inference script, sample_video.py?

@deepbeepmeep
Author

That's right, it is plug and play on the existing pipeline. In fact, my code works on Flux, CogVideo, Mochi, ... as well, with minimal changes to the core code. I will try to release a new version by the end of the week that is faster and requires less memory.
