-
Notifications
You must be signed in to change notification settings - Fork 25
[fix] Handle router logits in Qwen 3 moe and Qwen 3 omni moe for aux loss #98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…nd qwen 3 omni moe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| output_router_logits = ( | ||
| output_router_logits | ||
| if output_router_logits is not None | ||
| else getattr(self.config, "output_router_logits", False) | ||
| ) | ||
| all_router_logits = () if output_router_logits else None | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default the output router logits is always False?
https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507/blob/main/config.json
Do you think we should change it to True for training, since this patch will only use in training mode.
| output_router_logits = ( | ||
| output_router_logits | ||
| if output_router_logits is not None | ||
| else getattr(self.config, "output_router_logits", False) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same problem here. Also I checked the config for Qwen3VLMoe. Seems like this problem also exist in qwen3 vl moe? Do you think we need to simply set this to true in training mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will fix this
Motivation
Modifications
Commit Message Convention
Please follow our standardized commit message format:
[feat]- New features or functionality[fix]- Bug fixes[docs]- Documentation changes only[style]- Code style changes (formatting, missing semicolons, etc.)[refactor]- Code refactoring without changing functionality[perf]- Performance improvements[test]- Adding or updating tests[chore]- Maintenance tasks, dependency updates, etc.[ci]- CI/CD configuration changesExamples:
[feat] add qwen omni iterable dataset support[fix] resolve bagel model configuration error[docs] update training guide with YAML examplesSee CONTRIBUTING.md for more details.
CI/CD Checks
Your PR will automatically run the following checks:
black(line-length=120) and import sorting withisortpre-commit run --all-fileslocally to verify before pushingChecklist
pre-commit run --all-filesand ensure all checks passblack(line-length=120) andisort