
MLLM text encoder #93

Closed
Bionagato opened this issue Dec 9, 2024 · 5 comments

@Bionagato

Hi, this model is really incredible, but its outputs are very hard to control. When will you release the MLLM text encoder? It seems necessary for controlling the model's outputs. With Llama, it is really hard, if not impossible, to get videos as good as those published on the project page. It would also help to have some examples of how the videos in the dataset were captioned, to get an idea of how to format the prompts.

@Gobz

Gobz commented Dec 10, 2024

Seconding this; it seems to be one of the most important tools for getting the most out of the model.

@breadbrowser

@win10ogod

> the text Text Encoder is t5 flan xxl

MLLM is based on causal attention, while T5-XXL uses bidirectional attention. From this we can see that the text encoder should not be T5!
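The distinction above can be sketched in a few lines. This is a toy, single-head scaled dot-product attention (NumPy, hypothetical helper names), not the actual HunyuanVideo or T5 implementation: with a causal mask each token can only attend to itself and earlier positions (decoder-style, as in an MLLM), while without it every token attends to the full sequence (encoder-style, as in T5).

```python
import numpy as np

def attention_weights(q, k, causal):
    """Toy single-head attention weights of shape (seq, seq).

    causal=True  -> decoder-style: future positions are masked out.
    causal=False -> encoder-style: every token sees the whole sequence.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # raw (seq, seq) similarity scores
    if causal:
        seq = scores.shape[0]
        # True above the diagonal = "future" key positions to hide
        future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # softmax over the key dimension
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens, hidden dim 8

causal_w = attention_weights(x, x, causal=True)
bidir_w = attention_weights(x, x, causal=False)

# Causal: all weight on future tokens is exactly zero.
print(np.allclose(np.triu(causal_w, k=1), 0.0))  # True
# Bidirectional: every position can receive nonzero weight.
print(bool((bidir_w > 0).all()))  # True
```

The practical consequence is that a causal MLLM encoder and a bidirectional T5 encoder produce differently shaped text representations, which is why the two are not interchangeable.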

@kathrinawu
Collaborator

Please refer to this https://github.com/Tencent/HunyuanVideo/blob/main/ckpts/README.md

@Bionagato
Author

> Please refer to this https://github.com/Tencent/HunyuanVideo/blob/main/ckpts/README.md

I read the document, but it doesn't mention a release date for HunyuanMLLM or include examples of how the videos were captioned. It just states "At this stage, we have not yet released HunyuanMLLM".


6 participants