You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Dec 16, 2025: 🚀 We are excited to announce the release of LongCat-Video-Avatar, a unified model that delivers expressive and highly dynamic audio-driven character animation, supporting native tasks including Audio-Text-to-Video, Audio-Text-Image-to-Video, and Video Continuation with seamless compatibility for both single-stream and multi-stream audio inputs. The release includes our Technical Report, code, model weights, and project page.
✨ Key Features:
Support Multiple Generation Modes: one unified model can be used for audio-text-to-video (AT2V) generation, audio-text-image-to-video (ATI2V) generation, and Video Continuation.
Natural Human Dynamics: The disentangled unconditional guidance is designed to effectively decouple speech signals from motion dynamics for natural behavior.
Avoid Repetitive Content: The reference skip attention is adopted to strategically incorporates reference cues to preserve identity while preventing excessive conditional image leakage
Alleviate Error Accumulation from VAE: Cross-Chunk Latent Stitching is designed to eliminates redundant VAE decode-encode cycles to reduce pixel degradation in long sequences