
Conversation

@akaitsuki-ii
Contributor

@akaitsuki-ii akaitsuki-ii commented Oct 27, 2025

  1. Add support for video sparse attention (VSA), including both single-GPU and multi-GPU implementations;
  2. The model no longer sets attn_kwargs at initialization; they are now passed at forward time (for compatibility with VSA's runtime parameters);
  3. Currently only standard diffusers checkpoints are compatible; FastVideo's distilled checkpoints are not directly supported (some of their parameters are redundant, so we do not plan to support them, and the scheduler parameters would also need to be packed into the metadata). Checkpoints we produce later are expected to use the diffsynth or diffusers format.
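The single-GPU part of point 1 can be illustrated with a toy block-sparse attention sketch. This is a hypothetical simplification, not the kernel-level VSA implementation in this PR: tokens are grouped into blocks, a coarse block-level score selects the top-k key blocks per query block, and full attention runs only on the selected blocks via a mask. All names here (`sparse_video_attention`, `block_size`, `topk`) are illustrative.

```python
import torch
import torch.nn.functional as F

def sparse_video_attention(q, k, v, block_size=4, topk=2):
    """Toy single-GPU sketch of video sparse attention.

    Shapes: q, k, v are (heads, seq, dim) with seq divisible by block_size.
    """
    h, n, d = q.shape
    nb = n // block_size
    # Block-mean pooling gives coarse per-block representatives.
    qb = q.view(h, nb, block_size, d).mean(2)           # (h, nb, d)
    kb = k.view(h, nb, block_size, d).mean(2)           # (h, nb, d)
    # Coarse block-to-block scores, then keep top-k key blocks per query block.
    scores = qb @ kb.transpose(-1, -2) / d ** 0.5       # (h, nb, nb)
    keep = scores.topk(topk, dim=-1).indices            # (h, nb, topk)
    # Expand the kept block pairs into a token-level boolean mask.
    mask = torch.zeros(h, nb, nb, dtype=torch.bool)
    mask.scatter_(-1, keep, True)
    mask = mask.repeat_interleave(block_size, 1).repeat_interleave(block_size, 2)
    # Full attention restricted to the selected blocks.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

q = k = v = torch.randn(2, 16, 8)
out = sparse_video_attention(q, k, v)
print(out.shape)  # torch.Size([2, 16, 8])
```

A real implementation would fuse the block selection into the attention kernel rather than materializing a dense mask; the sketch only shows the selection logic.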

@gemini-code-assist
Contributor

Summary of Changes

Hello @akaitsuki-ii, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the DiffSynth engine by integrating Video Sparse Attention (VSA), a specialized attention mechanism tailored for video processing. The core changes involve adding VSA as a new, configurable attention implementation, transitioning attention parameter handling to a dynamic, runtime approach, and improving checkpoint compatibility for WAN DiT models with the Diffusers format. These enhancements aim to boost the performance and adaptability of video generation workflows, especially in distributed computing environments.

Highlights

  • Video Sparse Attention (VSA) Integration: Introduced support for Video Sparse Attention (VSA), including both single-card and distributed (multi-card) implementations, to enhance efficiency in video generation models.
  • Dynamic Attention Parameter Configuration: Refactored the attention mechanism across various models (Flux, Qwen-Image, WAN DiT) to pass attention-related keyword arguments (attn_kwargs) during the forward pass instead of during model initialization. This allows for more flexible, runtime-configurable attention parameters, crucial for VSA.
  • Diffusers Checkpoint Compatibility for WAN DiT: Added a new keymap and conversion logic for WAN DiT models to ensure compatibility with standard Diffusers checkpoints. This facilitates easier integration and use of pre-trained models from the Diffusers ecosystem.
  • Enhanced Distributed Parallelism: Extended the parallel processing utilities to include new Ulysses and Ring groups, specifically designed to support distributed Video Sparse Attention, optimizing performance for multi-GPU setups.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review — /gemini review — Performs a code review for the current pull request in its current state.
  • Pull Request Summary — /gemini summary — Provides a summary of the current pull request in its current state.
  • Comment — @gemini-code-assist — Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help — /gemini help — Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR introduces video sparse attention (VSA) and refactors how attention parameters are handled. The overall change is well structured. Support for VSA, including both single-GPU and multi-GPU implementations, is a great addition. Refactoring attn_kwargs from model initialization into the forward pass is an excellent design choice that improves flexibility and is applied consistently across the codebase. The added diffusers checkpoint support for the wan_dit model is also very useful. I have a few suggestions for improvement, mainly concerning code duplication in the new VSA implementation.

Member


The AttentionConfig for SPARGE and VSA has already diverged here. You could define two subclasses and move the prepare_attn_kwargs logic into a to_attn_kwargs method on each subclass.
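The suggested structure might look like the following sketch. All class and field names here are hypothetical, chosen only to illustrate the "one subclass per backend" idea; they are not taken from the DiffSynth-Engine codebase.

```python
from dataclasses import dataclass

# Sketch of the suggestion: each attention backend gets its own config
# subclass that knows how to emit its kwargs, instead of one config
# class with a branching prepare_attn_kwargs helper.
@dataclass
class AttentionConfig:
    def to_attn_kwargs(self) -> dict:
        return {}

@dataclass
class SpargeAttentionConfig(AttentionConfig):
    smooth_k: bool = True  # hypothetical SPARGE-specific knob

    def to_attn_kwargs(self) -> dict:
        return {"attn_impl": "sparge", "smooth_k": self.smooth_k}

@dataclass
class VSAAttentionConfig(AttentionConfig):
    sparsity: float = 0.9  # hypothetical VSA-specific knob

    def to_attn_kwargs(self) -> dict:
        return {"attn_impl": "vsa", "sparsity": self.sparsity}

cfg: AttentionConfig = VSAAttentionConfig(sparsity=0.8)
print(cfg.to_attn_kwargs())  # {'attn_impl': 'vsa', 'sparsity': 0.8}
```

Callers then invoke cfg.to_attn_kwargs() at forward time without knowing which backend is configured.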

Contributor Author


Added an attn_params field to accommodate the different backends' parameters.
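The alternative the author describes keeps a single config class and routes backend-specific settings through a generic field. This is a hypothetical illustration of that shape (names invented for the sketch), not the actual DiffSynth-Engine code:

```python
from dataclasses import dataclass, field

# One config class for all backends: attn_params is an opaque mapping,
# so SPARGE and VSA can each supply their own runtime parameters
# without needing per-backend subclasses.
@dataclass
class AttentionConfig:
    attn_impl: str = "sdpa"
    attn_params: dict = field(default_factory=dict)

    def to_attn_kwargs(self) -> dict:
        return {"attn_impl": self.attn_impl, **self.attn_params}

vsa_cfg = AttentionConfig(attn_impl="vsa", attn_params={"sparsity": 0.9})
print(vsa_cfg.to_attn_kwargs())  # {'attn_impl': 'vsa', 'sparsity': 0.9}
```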

@Glaceon-Hyy Glaceon-Hyy merged commit 4ae8f2c into main Nov 5, 2025
@Glaceon-Hyy Glaceon-Hyy deleted the vsa branch November 5, 2025 07:53