video sparse attention #190
Conversation
Summary of Changes
Hello @akaitsuki-ii, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request upgrades the DiffSynth engine by integrating Video Sparse Attention (VSA), a specialized attention mechanism tailored for video processing. The core changes are: adding VSA as a new, configurable attention implementation; moving attention parameter handling to a dynamic, runtime approach; and improving checkpoint compatibility between WAN DiT models and the Diffusers format. These enhancements aim to boost the performance and adaptability of video generation workflows, especially in distributed computing environments.
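To make the "configurable attention implementation" idea concrete, here is a minimal sketch of how a runtime attention dispatch might look. All names (`ATTN_IMPLS`, `attention`, the `"vsa"` key) are illustrative assumptions, not the actual DiffSynth-Engine API.

```python
# Hypothetical dispatch table mapping an implementation name to a callable.
# The real project wires this into its DiT attention layers; here we only
# illustrate the selection mechanism.

def full_attention(q, k, v, **kwargs):
    # Dense fallback: every query attends to every key.
    return "full"

def video_sparse_attention(q, k, v, sparsity=0.9, **kwargs):
    # VSA restricts each query to a sparse, per-video-block subset of keys;
    # only the dispatch is shown, not the sparse kernel itself.
    return f"vsa(sparsity={sparsity})"

ATTN_IMPLS = {
    "full": full_attention,
    "vsa": video_sparse_attention,
}

def attention(q, k, v, attn_impl="full", attn_kwargs=None):
    """Select an attention backend by name at call time."""
    impl = ATTN_IMPLS[attn_impl]
    return impl(q, k, v, **(attn_kwargs or {}))
```

Because the backend is chosen per call, the same model code can run dense attention for images and sparse attention for long video sequences without re-instantiating any module.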
Code Review
This PR introduces video sparse attention (VSA) and refactors how attention parameters are handled. The overall change is cleanly structured. Support for VSA (including single-GPU and multi-GPU implementations) is a welcome addition. Moving attn_kwargs from model initialization into the forward pass is an excellent design choice that improves flexibility and is applied consistently across the codebase. Adding diffusers checkpoint support for the wan_dit model is also very useful. I have a few suggestions for improvement, mainly concerning code duplication in the new VSA implementation.
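The refactor the review praises, moving attn_kwargs out of `__init__` and into the forward pass, can be sketched as follows. The class and argument names are illustrative assumptions, not DiffSynth-Engine's own.

```python
# Hypothetical before/after of the refactor: attention hyperparameters are
# no longer frozen into the module at construction time but supplied per
# call, so one model instance can switch attention settings at runtime.

class AttentionBlock:
    def __init__(self, dim):
        # Note: no attention hyperparameters are stored here anymore.
        self.dim = dim

    def forward(self, x, attn_kwargs=None):
        # attn_kwargs arrives fresh on every call, e.g. from the pipeline.
        attn_kwargs = attn_kwargs or {}
        impl = attn_kwargs.get("attn_impl", "full")
        return (impl, x)

block = AttentionBlock(dim=128)
# The same block runs dense or sparse depending on the per-call kwargs:
dense_out = block.forward("frames")
sparse_out = block.forward("frames", attn_kwargs={"attn_impl": "vsa"})
```

This is what makes the design flexible in distributed settings: the pipeline can decide per step (or per rank) which backend and parameters to use without rebuilding the model.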
The SPARGE and VSA branches of AttentionConfig have already diverged here; you could define two subclasses and move the prepare_attn_kwargs logic into a to_attn_kwargs method on each subclass.
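A minimal sketch of this suggestion, assuming the config is a dataclass; all field names and defaults below are hypothetical, not the actual SpargeAttn or VSA parameters:

```python
# Each attention backend gets its own config subclass that knows how to
# build its own kwargs, so the branching disappears from the call site.
from dataclasses import dataclass

@dataclass
class AttentionConfig:
    def to_attn_kwargs(self):
        return {}

@dataclass
class SpargeAttentionConfig(AttentionConfig):
    simthreshd1: float = 0.6   # hypothetical sparsity threshold
    cdfthreshd: float = 0.98   # hypothetical CDF cutoff

    def to_attn_kwargs(self):
        return {
            "attn_impl": "sparge",
            "simthreshd1": self.simthreshd1,
            "cdfthreshd": self.cdfthreshd,
        }

@dataclass
class VSAAttentionConfig(AttentionConfig):
    sparsity: float = 0.9      # hypothetical VSA sparsity ratio

    def to_attn_kwargs(self):
        return {"attn_impl": "vsa", "sparsity": self.sparsity}
```

The call site then reduces to `config.to_attn_kwargs()` with no isinstance checks.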
Added an attn_params field to accommodate the different parameter sets.
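The alternative the author describes can be sketched as a single config carrying a free-form attn_params dict instead of one subclass per backend. Names are illustrative assumptions:

```python
# One config class for all backends: attn_params holds whatever extra
# parameters the selected implementation needs.
from dataclasses import dataclass, field

@dataclass
class AttentionConfig:
    attn_impl: str = "full"
    attn_params: dict = field(default_factory=dict)

    def to_attn_kwargs(self):
        # Merge the backend name with its backend-specific parameters.
        return {"attn_impl": self.attn_impl, **self.attn_params}

cfg = AttentionConfig(attn_impl="vsa", attn_params={"sparsity": 0.9})
```

This trades the type safety of subclasses for flexibility: new backends need no new class, at the cost of unvalidated keys in attn_params.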