MLLM text encoder #93
Comments
Seconding this; it seems to be one of the most important tools for getting the most out of the model.
The MLLM is based on causal attention, while T5-XXL uses bidirectional attention. See the sketch below for the difference between the two mask types.
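For anyone unfamiliar with the distinction, here is a minimal sketch of the two attention patterns (assuming PyTorch; the sequence length and tensor names are illustrative, not taken from HunyuanVideo's code):

```python
import torch

seq_len = 6

# Bidirectional attention (T5-XXL-style encoder):
# every token can attend to every other token in the sequence.
bidirectional_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

# Causal attention (decoder-only MLLM):
# token i can only attend to tokens at positions <= i.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

print(bidirectional_mask)
print(causal_mask)
```

In practice this means the hidden states produced by a causal MLLM summarize only the prompt seen so far at each position, whereas a bidirectional encoder like T5-XXL conditions every position on the full prompt, which is part of why the two text encoders are not drop-in replacements for each other.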
Please refer to this: https://github.com/Tencent/HunyuanVideo/blob/main/ckpts/README.md
I read the document, but it doesn't mention a release date for HunyuanMLLM or include examples of how the videos were captioned. It just states "At this stage, we have not yet released HunyuanMLLM".
Hi, this model is really incredible, but it is very hard to control the outputs. When will you release the MLLM text encoder? It seems necessary for controlling the model's outputs. With Llama, it is hard, if not impossible, to get videos as good as those published on the project page. It would also be helpful to have some examples of how the videos in the dataset were captioned, to get an idea of how to format the prompts.