Feat/video models #4970

Merged: vladmandic merged 3 commits into dev from feat/video-models on Jul 1, 2026

@CalamitousFelicitousness

Collaborator

Description

Adds first-last-frame (FLF2V) video for Wan 2.2 I2V and LTX, and allows the Wan 2.2 A14B mixture-of-experts boundary to be tuned at runtime (in both the video tab and the base-model image path) rather than only at model load.

Notes

  • Boundary: the slider is applied at generation time in set_pipeline_args, the single hook that both image and video generation pass through before invoking the pipeline, so a change takes effect on the next generation with no model reload. The slider's new default is -1, which means use the value in the model's config. The block is a no-op unless the model has both experts resident and a boundary_ratio in its config (this mirrors the SDXL register_to_config handling already in that function).
  • Wan FLF: the I2V path forwards last_image when the pipeline accepts it and does not use expand_timesteps (the 5B model masks to the first frame); otherwise it logs a warning. Uses the same base I2V-A14B weights, matching ComfyUI's FLF workflow.
  • LTX FLF: the last-frame image is conditioned at the final frame index (index=-1 for 2.x, num_frames-1 for 0.9) instead of frame 0. The Last image input shows only for Condition pipelines. I'll probably collapse it into I2V; in hindsight the split wasn't really worth it.

Environment and Testing

  • Linux (WSL2), Python 3.13, CUDA, RTX 3090

…ideo paths

Wan 2.2 A14B ships a per-model boundary_ratio (0.9 I2V, 0.875 T2V) that selects the high- or low-noise expert per step. The video and base-model image loaders both load the shipped value; the slider override is applied at generation time in set_pipeline_args, the one point both paths pass through before invoking the pipeline.
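
To illustrate how a boundary ratio picks an expert per step, here is a minimal sketch. It assumes timesteps run on a 0 to 1000 training scale and that steps at or above the boundary go to the high-noise expert; `select_expert` and the string labels are hypothetical names, not code from this PR.

```python
def select_expert(timestep: float, boundary_ratio: float,
                  num_train_timesteps: int = 1000) -> str:
    """Return which expert would handle a denoising step (illustrative only)."""
    # Convert the ratio into an absolute timestep threshold.
    boundary_timestep = boundary_ratio * num_train_timesteps
    # Assumption: early, noisy steps (high timestep) use the high-noise expert.
    return "high_noise" if timestep >= boundary_timestep else "low_noise"
```

Under these assumptions, raising the ratio hands more of the schedule to the low-noise expert, and lowering it does the opposite.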

The denoising loop reads config.boundary_ratio each call, so tuning takes effect with no reload for video and base-model images alike. The slider defaults to -1, meaning use the model's value; 0 to 1 set the boundary explicitly. Single-expert stages stay load-time because they drop a transformer to free VRAM.
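
The override logic described above might look roughly like the sketch below. It is not the PR's code: `apply_boundary_override` and `shipped_value` are hypothetical, any negative slider value is treated as the "use the model's value" sentinel, and a plain `SimpleNamespace` stands in for the pipeline config (the PR notes mention register_to_config for the real one).

```python
from types import SimpleNamespace
from typing import Optional


def apply_boundary_override(pipe, slider_value: float,
                            shipped_value: float) -> Optional[float]:
    """Set config.boundary_ratio from the slider; return the value used, or None if skipped."""
    both_experts = (getattr(pipe, "transformer", None) is not None
                    and getattr(pipe, "transformer_2", None) is not None)
    config = getattr(pipe, "config", None)
    # Gate: no-op unless both experts are resident and the config has a boundary_ratio.
    if not both_experts or getattr(config, "boundary_ratio", None) is None:
        return None
    # Assumption: a negative slider value means "fall back to the shipped value".
    value = shipped_value if slider_value < 0 else slider_value
    config.boundary_ratio = value
    return value


# Stand-in objects for illustration only.
moe_pipe = SimpleNamespace(transformer=object(), transformer_2=object(),
                           config=SimpleNamespace(boundary_ratio=0.9))
single_pipe = SimpleNamespace(transformer=object(), transformer_2=None,
                              config=SimpleNamespace(boundary_ratio=0.9))
```

The sketch keeps the shipped value separately so that returning the slider to -1 can restore it after an explicit override.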
The I2V path forwards a last-frame image to the pipeline when one is supplied and the loaded pipeline accepts it, turning the run into first-last-frame interpolation. supports_last_frame() gates on the pipeline taking a last_image argument and not running expand_timesteps, which conditions on the first frame only, so a model that cannot use a last frame logs a warning instead of silently ignoring it.
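
A capability check of this kind can be sketched with signature inspection. This is an illustration rather than the PR's supports_last_frame(): the function name `supports_last_frame_sketch` and the stand-in pipeline classes are hypothetical, and it assumes expand_timesteps is readable from the pipeline's config.

```python
import inspect
from types import SimpleNamespace


def supports_last_frame_sketch(pipe) -> bool:
    """True if the pipeline call takes last_image and does not use expand_timesteps."""
    accepts_last = "last_image" in inspect.signature(pipe.__call__).parameters
    config = getattr(pipe, "config", None)
    expands = bool(getattr(config, "expand_timesteps", False))
    return accepts_last and not expands


# Stand-in pipelines for illustration only.
class FlfCapablePipe:
    config = SimpleNamespace(expand_timesteps=False)
    def __call__(self, image=None, last_image=None):
        return None

class FirstFrameOnlyPipe:
    config = SimpleNamespace(expand_timesteps=True)
    def __call__(self, image=None, last_image=None):
        return None

class NoLastImagePipe:
    config = SimpleNamespace(expand_timesteps=False)
    def __call__(self, image=None):
        return None
```

A caller would log a warning and drop the last-frame image when the check returns False, rather than passing an argument the pipeline cannot use.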
The LTX tab already collected a last-frame image but anchored every condition at index 0, so it never acted as a last frame. Build a separate condition for it at the final frame: index -1 for the 2.x family (latent index, negatives wrap) and num_frames-1 for 0.9 (pixel index). The Last image input now shows only for Condition models, the pipelines that accept multi-frame conditioning.
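
The index choice for the last-frame condition can be summarized in a small helper. This is a sketch under the rules stated above; `last_frame_index` and the family strings "2.x" and "0.9" are hypothetical labels, not identifiers from the PR.

```python
def last_frame_index(num_frames: int, family: str) -> int:
    """Return the frame index at which to anchor the last-frame condition."""
    if family == "2.x":
        # Latent index; negative values wrap, so -1 addresses the final latent frame.
        return -1
    if family == "0.9":
        # Pixel index; the final frame is addressed explicitly.
        return num_frames - 1
    raise ValueError(f"unknown LTX family: {family!r}")
```

For a 97-frame clip this gives -1 for the 2.x family and 96 for 0.9, while the first-frame condition stays at index 0.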
@vladmandic vladmandic merged commit b3cf481 into dev Jul 1, 2026
2 checks passed
@vladmandic vladmandic deleted the feat/video-models branch July 1, 2026 09:12