@CUHKSZzxy CUHKSZzxy commented Oct 31, 2025

Related

TODO

  • Qwen3-VL-MOE
  • Add documents
  • Video input support?

@lvhan028 lvhan028 added the enhancement New feature or request label Nov 1, 2025
CUHKSZzxy commented Nov 4, 2025

Improved the config check. Tested with internvl / intern-s1 / qwen3vl / qwen3 / qwen2.5vl / glm4.1v; all seem to work.

lvhan028 previously approved these changes Nov 4, 2025
@lvhan028 lvhan028 requested a review from grimoire November 4, 2025 12:38
lvhan028 commented Nov 4, 2025

Could you share the evaluation test results?

language_hf_config = config.hf_config

# for multi-modal models, get the language model config to determine dtype
if hasattr(config.hf_config, 'text_config'):

We already have an llm_config field in config.
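A minimal sketch of what this suggestion could look like, assuming `config.llm_config` (when set) already holds the language-model config, with a fallback to `hf_config.text_config` otherwise. The helper name `get_language_config` and the fallback order are hypothetical and are not taken from the PR.

```python
from types import SimpleNamespace


def get_language_config(config):
    """Return the config object that describes the language model.

    Hypothetical helper: assumes `config.llm_config`, when present and not
    None, is the language-model config mentioned in the review comment.
    """
    llm_config = getattr(config, 'llm_config', None)
    if llm_config is not None:
        return llm_config
    # Fallback mirroring the snippet under review: multi-modal HF configs
    # often nest the language model config under `text_config`.
    hf_config = config.hf_config
    return getattr(hf_config, 'text_config', hf_config)


# Usage with stand-in objects (not real model configs).
text_cfg = SimpleNamespace(torch_dtype='bfloat16')
vl_config = SimpleNamespace(hf_config=SimpleNamespace(text_config=text_cfg), llm_config=None)
print(get_language_config(vl_config).torch_dtype)
```

Centralizing the lookup in one place would keep the dtype logic from repeating the `hasattr` check for each multi-modal model family.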

visual_embeds: torch.Tensor):
visual_pos_masks = visual_pos_masks.to(hidden_states.device)
visual_embeds = visual_embeds.to(hidden_states.device, hidden_states.dtype)
local_this = hidden_states[visual_pos_masks, :].clone() + visual_embeds

hidden_states[visual_pos_masks, :] would synchronize the CUDA device. Try:

local = torch.zeros_like(hidden_states)
local.masked_scatter_(visual_pos_masks, visual_embeds)
hidden_states += local
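For reference, here is a self-contained sketch of the suggested pattern. The shapes are assumptions, not taken from the PR: `hidden_states` is `(batch, seq, hidden)`, `visual_pos_masks` is a boolean `(batch, seq)` tensor, and `visual_embeds` is `(num_visual_tokens, hidden)`. Under those shapes the mask needs a trailing dimension to broadcast against `hidden_states`, so the sketch calls `unsqueeze(-1)`. The function name `add_visual_embeds` is hypothetical, and the sketch returns a new tensor rather than updating in place.

```python
import torch


def add_visual_embeds(hidden_states: torch.Tensor,
                      visual_pos_masks: torch.Tensor,
                      visual_embeds: torch.Tensor) -> torch.Tensor:
    """Add visual embeddings at masked positions without boolean indexing."""
    visual_pos_masks = visual_pos_masks.to(hidden_states.device)
    visual_embeds = visual_embeds.to(hidden_states.device, hidden_states.dtype)
    # Scatter the embeddings into a zero tensor shaped like hidden_states.
    # The mask is expanded to (batch, seq, 1) so it broadcasts over hidden.
    local = torch.zeros_like(hidden_states)
    local.masked_scatter_(visual_pos_masks.unsqueeze(-1), visual_embeds)
    # Out-of-place add, so the caller's tensor is left untouched.
    return hidden_states + local


# Small CPU check against the boolean-indexing version.
hs = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
mask = torch.tensor([[True, False, True], [False, True, False]])
embeds = torch.tensor([[1.0] * 4, [2.0] * 4, [3.0] * 4])

expected = hs.clone()
expected[mask] += embeds
assert torch.equal(add_visual_embeds(hs, mask, embeds), expected)
```

The trade-off is one extra zero tensor the size of `hidden_states`, in exchange for avoiding the data-dependent output shape that boolean indexing produces.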

@lvhan028 lvhan028 dismissed their stale review November 5, 2025 06:17

evaluation test failed

lvhan028 commented Nov 5, 2025

The LLM evaluation test failed when following #4094.

