
MLLM text encoder #93

Closed
Bionagato opened this issue Dec 9, 2024 · 5 comments

@Bionagato

Hi, this model is really incredible, but its outputs are very hard to control. When will you release the MLLM text encoder? It seems necessary for controlling the model's outputs. With Llama, it is really hard, if not impossible, to get videos as good as those published on the project page. It would also help to have some examples of how the videos in the dataset were captioned, to get an idea of how to format the prompts.

@Gobz

Gobz commented Dec 10, 2024

Seconding this; it seems to be one of the most important tools for getting the most out of the model.

@breadbrowser

@win10ogod

> the text Text Encoder is t5 flan xxl

MLLM is based on causal attention, while T5-XXL uses bidirectional attention. From this we can see that the text encoder should not be T5!
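The distinction above can be sketched in a few lines. This is a toy, single-head scaled dot-product attention (NumPy, hypothetical helper names), not the actual HunyuanVideo or T5 implementation: with a causal mask each token can only attend to itself and earlier positions (decoder-style, as in an MLLM), while without it every token attends to the full sequence (encoder-style, as in T5).

```python
import numpy as np

def attention_weights(q, k, causal):
    """Toy single-head attention weights of shape (seq, seq).

    causal=True  -> decoder-style: future positions are masked out.
    causal=False -> encoder-style: every token sees the whole sequence.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # raw (seq, seq) similarity scores
    if causal:
        seq = scores.shape[0]
        # True above the diagonal = "future" key positions to hide
        future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # softmax over the key dimension
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens, hidden dim 8

causal_w = attention_weights(x, x, causal=True)
bidir_w = attention_weights(x, x, causal=False)

# Causal: all weight on future tokens is exactly zero.
print(np.allclose(np.triu(causal_w, k=1), 0.0))  # True
# Bidirectional: every position can receive nonzero weight.
print(bool((bidir_w > 0).all()))  # True
```

The practical consequence is that a causal MLLM encoder and a bidirectional T5 encoder produce differently shaped text representations, which is why the two are not interchangeable.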

@kathrinawu
Collaborator

Please refer to this https://github.com/Tencent/HunyuanVideo/blob/main/ckpts/README.md

@Bionagato
Author

> Please refer to this https://github.com/Tencent/HunyuanVideo/blob/main/ckpts/README.md

I read the document, but it doesn't mention a release date for HunyuanMLLM or include examples of how the videos were captioned. It just states "At this stage, we have not yet released HunyuanMLLM".


6 participants