
Conversation

@akaitsuki-ii
Contributor

@akaitsuki-ii akaitsuki-ii commented Oct 27, 2025

  1. Add support for video sparse attention (VSA), including both single-GPU and multi-GPU implementations;
  2. The model no longer sets attn_kwargs at initialization; they are now passed at forward time (for compatibility with VSA's runtime parameters);
  3. Currently only standard diffusers checkpoints are compatible; FastVideo's distilled checkpoints are not directly supported (some of their parameters are redundant, so we do not plan to support them, and the scheduler parameters would also need to be packed into the metadata). Checkpoints we produce later are expected to use the diffsynth or diffusers format.
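The single-GPU part of point 1 can be illustrated with a toy block-sparse attention sketch. This is a hypothetical simplification, not the kernel-level VSA implementation in this PR: tokens are grouped into blocks, a coarse block-level score selects the top-k key blocks per query block, and full attention runs only on the selected blocks via a mask. All names here (`sparse_video_attention`, `block_size`, `topk`) are illustrative.

```python
import torch
import torch.nn.functional as F

def sparse_video_attention(q, k, v, block_size=4, topk=2):
    """Toy single-GPU sketch of video sparse attention.

    Shapes: q, k, v are (heads, seq, dim) with seq divisible by block_size.
    """
    h, n, d = q.shape
    nb = n // block_size
    # Block-mean pooling gives coarse per-block representatives.
    qb = q.view(h, nb, block_size, d).mean(2)           # (h, nb, d)
    kb = k.view(h, nb, block_size, d).mean(2)           # (h, nb, d)
    # Coarse block-to-block scores, then keep top-k key blocks per query block.
    scores = qb @ kb.transpose(-1, -2) / d ** 0.5       # (h, nb, nb)
    keep = scores.topk(topk, dim=-1).indices            # (h, nb, topk)
    # Expand the kept block pairs into a token-level boolean mask.
    mask = torch.zeros(h, nb, nb, dtype=torch.bool)
    mask.scatter_(-1, keep, True)
    mask = mask.repeat_interleave(block_size, 1).repeat_interleave(block_size, 2)
    # Full attention restricted to the selected blocks.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

q = k = v = torch.randn(2, 16, 8)
out = sparse_video_attention(q, k, v)
print(out.shape)  # torch.Size([2, 16, 8])
```

A real implementation would fuse the block selection into the attention kernel rather than materializing a dense mask; the sketch only shows the selection logic.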

@gemini-code-assist
Contributor

Summary of Changes

Hello @akaitsuki-ii, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the DiffSynth engine by integrating Video Sparse Attention (VSA), a specialized attention mechanism tailored for video processing. The core changes involve adding VSA as a new, configurable attention implementation, transitioning attention parameter handling to a dynamic, runtime approach, and improving checkpoint compatibility for WAN DiT models with the Diffusers format. These enhancements aim to boost the performance and adaptability of video generation workflows, especially in distributed computing environments.

Highlights

  • Video Sparse Attention (VSA) Integration: Introduced support for Video Sparse Attention (VSA), including both single-card and distributed (multi-card) implementations, to enhance efficiency in video generation models.
  • Dynamic Attention Parameter Configuration: Refactored the attention mechanism across various models (Flux, Qwen-Image, WAN DiT) to pass attention-related keyword arguments (attn_kwargs) during the forward pass instead of during model initialization. This allows for more flexible, runtime-configurable attention parameters, crucial for VSA.
  • Diffusers Checkpoint Compatibility for WAN DiT: Added a new keymap and conversion logic for WAN DiT models to ensure compatibility with standard Diffusers checkpoints. This facilitates easier integration and use of pre-trained models from the Diffusers ecosystem.
  • Enhanced Distributed Parallelism: Extended the parallel processing utilities to include new Ulysses and Ring groups, specifically designed to support distributed Video Sparse Attention, optimizing performance for multi-GPU setups.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review — /gemini review — Performs a code review for the current pull request in its current state.
  • Pull Request Summary — /gemini summary — Provides a summary of the current pull request in its current state.
  • Comment — @gemini-code-assist — Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help — /gemini help — Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR introduces video sparse attention (VSA) and refactors how attention parameters are handled. The overall change is well structured. Support for VSA, including both single-GPU and multi-GPU implementations, is a great addition. Refactoring attn_kwargs from model initialization into the forward pass is an excellent design choice that improves flexibility and is applied consistently across the codebase. The added diffusers checkpoint support for the wan_dit model is also very useful. I have a few suggestions for improvement, mainly concerning code duplication in the new VSA implementation.

Member


The AttentionConfig for SPARGE and VSA has already diverged here. You could define two subclasses and move the prepare_attn_kwargs logic into a to_attn_kwargs method on each subclass.
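The suggested structure might look like the following sketch. All class and field names here are hypothetical, chosen only to illustrate the "one subclass per backend" idea; they are not taken from the DiffSynth-Engine codebase.

```python
from dataclasses import dataclass

# Sketch of the suggestion: each attention backend gets its own config
# subclass that knows how to emit its kwargs, instead of one config
# class with a branching prepare_attn_kwargs helper.
@dataclass
class AttentionConfig:
    def to_attn_kwargs(self) -> dict:
        return {}

@dataclass
class SpargeAttentionConfig(AttentionConfig):
    smooth_k: bool = True  # hypothetical SPARGE-specific knob

    def to_attn_kwargs(self) -> dict:
        return {"attn_impl": "sparge", "smooth_k": self.smooth_k}

@dataclass
class VSAAttentionConfig(AttentionConfig):
    sparsity: float = 0.9  # hypothetical VSA-specific knob

    def to_attn_kwargs(self) -> dict:
        return {"attn_impl": "vsa", "sparsity": self.sparsity}

cfg: AttentionConfig = VSAAttentionConfig(sparsity=0.8)
print(cfg.to_attn_kwargs())  # {'attn_impl': 'vsa', 'sparsity': 0.8}
```

Callers then invoke cfg.to_attn_kwargs() at forward time without knowing which backend is configured.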

Contributor Author


Added an attn_params field to accommodate the different backends' parameters.
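The alternative the author describes keeps a single config class and routes backend-specific settings through a generic field. This is a hypothetical illustration of that shape (names invented for the sketch), not the actual DiffSynth-Engine code:

```python
from dataclasses import dataclass, field

# One config class for all backends: attn_params is an opaque mapping,
# so SPARGE and VSA can each supply their own runtime parameters
# without needing per-backend subclasses.
@dataclass
class AttentionConfig:
    attn_impl: str = "sdpa"
    attn_params: dict = field(default_factory=dict)

    def to_attn_kwargs(self) -> dict:
        return {"attn_impl": self.attn_impl, **self.attn_params}

vsa_cfg = AttentionConfig(attn_impl="vsa", attn_params={"sparsity": 0.9})
print(vsa_cfg.to_attn_kwargs())  # {'attn_impl': 'vsa', 'sparsity': 0.9}
```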

@Glaceon-Hyy Glaceon-Hyy merged commit 4ae8f2c into main Nov 5, 2025
@Glaceon-Hyy Glaceon-Hyy deleted the vsa branch November 5, 2025 07:53