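LTX-2 image-to-video (issue #100): why do the first two frames of the generated video stay static? Which parameter is this related to, and how can it be fixed?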

Here is my script:
from ltx_pipelines.ti2vid_two_stages import TI2VidTwoStagesPipeline
from ltx_core.loader import LoraPathStrengthAndSDOps
from ltx_core.loader.sd_ops import LTXV_LORA_COMFY_RENAMING_MAP
from ltx_core.model.video_vae import TilingConfig, get_video_chunks_number
from ltx_pipelines.utils.constants import DEFAULT_NEGATIVE_PROMPT, AUDIO_SAMPLE_RATE
from ltx_pipelines.utils.media_io import encode_video
import torch
import time

pipeline = TI2VidTwoStagesPipeline(
    checkpoint_path = "/data3/LTX-2/ltx-2-19b-dev-fp8.safetensors",
    spatial_upsampler_path = "/data3/LTX-2/ltx-2-spatial-upscaler-x2-1.0.safetensors",
    gemma_root = "/data3/LTX-2/gemma",
    distilled_lora = [
        LoraPathStrengthAndSDOps(
            path = "/data3/LTX-2/ltx-2-19b-distilled-lora-384.safetensors",
            strength = 0.8,
            sd_ops = LTXV_LORA_COMFY_RENAMING_MAP)
    ],
    loras = [],
    fp8transformer = True
)
tiling_config = TilingConfig.default()

duration = 12
height, width = 1280, 704
frame_rate = 25
num_frames = duration * frame_rate + 1
num_inference_steps = 20
cfg_guidance_scale = 3.5
images = [("/data3/LTX-2/inputs/11.png", 0, 0.8)]
output_path = "/data3/LTX-2/outputkkk/8888.mp4"
prompt = """
画面一开始就动起来，湍急水流裹挟枯叶在河面飞旋，参考图像中的男子在漩涡深绿的河水中踉跄站稳，黑色西装被激流剧烈拍打，泛起深色水雾；数寸之外，溺亡鬼魂顺着暗流缓缓飘来，湿漉漉的缠结长发随水流飘动，苍白腐烂的面容破水而出，咧开宽阔而不整的笑容。体积光月光穿透骷髅般的枝桠，在流动的水面上投下冷蓝色闪烁轮廓光。镜头缓慢推轨至中景，溺亡鬼魂歪着头，神情顽皮而阴森，开口道："它摸起来润润的，快说下它叫啥嘛～"。参考图像中的男子低头凝视，下颌紧绷，回应："说下嘛~"。随着镜头推进，背景森林渐隐为暗色朦胧的散景，鬼魂瞳孔中映射出月光的闪烁。切至溺亡鬼魂脸部的特写；湿润皮肤在苍白光线下泛着幽光，它身子前倾，低语道："里头就是我要的那个字呀，说下嘛～"。参考图像中的男子在画面左缘若隐若现，保持沉默，神情坚毅。一缕河水从鬼魂下巴滴落，激起细碎水花，折射出点点微光。镜头缓慢向右摇摄，聚焦于男子面容，他直视鬼魂，重复："说下嘛~"。鬼魂的面孔滑出画框，取而代之的是男子身后那片漆黑的湍流。冷蓝色光线扫过他额际，突显眉宇间的紧绷。
外景。神秘河岸 – 黄昏 – 诡异薄雾氛围。高速摄影，动态水流，手持镜头晃动。
摄影机：Sony Venice
镜头：ARRI Signature Prime
焦距：35mm
光圈：f/1.4
"""
negative_prompt="face distortion, inconsistent facial features, morphing face, blurry face,subtitle, caption, text, watermark, logo, timestamp, OSD, UI, scoreboard, channel icon, credits, opening titles, closing titles, lower third, on-screen text, any text, blurry white strip, static overlay, silent movie text cards, subtitle bar, karaoke lyrics, forced narrative, forced subtitles, burned-in text, hardsubs, softsubs, text box, speech bubble, scrolling text, ticker, copyright notice, disclaimer text, rating badge, TV-MA bug, Dolby logo, aspect ratio tag, metadata overlay, digital on-screen graphics, unexpected characters, glyphs, letters, numbers, symbols, three hands, extra hand, third hand, 3 hands, multiple hands, identity inconsistency, frame-by-frame facial drift, static image, blurry, out of focus, overexposed, underexposed, low contrast, washed out colors, excessive noise, grainy texture, poor lighting, flickering, motion blur, distorted proportions, unnatural skin tones, deformed facial features, asymmetrical face, missing facial features, extra limbs, disfigured hands, wrong hand count, artifacts around text, inconsistent perspective, camera shake, incorrect depth of field, background too sharp, background clutter, distracting reflections, harsh shadows, inconsistent lighting direction, color banding, cartoonish rendering, 3D CGI look, unrealistic materials, uncanny valley effect, incorrect ethnicity, wrong gender, exaggerated expressions, wrong gaze direction, mismatched lip sync, silent or muted audio, distorted voice, robotic voice, echo, background noise, off-sync audio, incorrect dialogue, added dialogue, repetitive speech, jittery movement, awkward pauses, incorrect timing, unnatural transitions, inconsistent framing, tilted camera, flat lighting, inconsistent tone, cinematic oversaturation, stylized filters, AI artifacts"

video_chunks_number = get_video_chunks_number(num_frames, tiling_config)

with torch.inference_mode():
    st = time.time()
    video, audio = pipeline(
            prompt=prompt,
            negative_prompt=DEFAULT_NEGATIVE_PROMPT,
            seed=521,
            height=height,
            width=width,
            num_frames=num_frames,
            frame_rate=frame_rate,
            num_inference_steps=num_inference_steps,
            cfg_guidance_scale=cfg_guidance_scale,
            images=images,
            tiling_config=tiling_config,
        )
    encode_video(
        video=video,
        fps=frame_rate,
        audio=audio,
        audio_sample_rate=AUDIO_SAMPLE_RATE,
        output_path=output_path,
        video_chunks_number=video_chunks_number,
    )
    print(f"time cost: {time.time() - st}")
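
One setting in the script that may be worth ruling out is the frame count. The sketch below assumes that LTX-2, like earlier LTX-Video releases, expects `num_frames` to have the form `8 * k + 1`. That constraint is an assumption here and is not confirmed in this issue, and the helper names `is_valid_num_frames` and `snap_num_frames` are hypothetical, not part of `ltx_pipelines`. With `duration = 12` and `frame_rate = 25`, the script asks for 301 frames, which does not fit that pattern.

```python
# Assumed temporal compression factor of the video VAE (not confirmed in this issue).
TEMPORAL_STRIDE = 8


def is_valid_num_frames(n, stride=TEMPORAL_STRIDE):
    """Return True if n has the form stride * k + 1 with k >= 0."""
    return n >= 1 and (n - 1) % stride == 0


def snap_num_frames(n, stride=TEMPORAL_STRIDE):
    """Return the nearest valid frame counts at or below n and at or above n."""
    lower = max((n - 1) // stride, 0) * stride + 1
    upper = lower if lower == n else lower + stride
    return lower, upper


# Same values as in the script above.
duration, frame_rate = 12, 25
num_frames = duration * frame_rate + 1

print(num_frames, is_valid_num_frames(num_frames), snap_num_frames(num_frames))
```

If the assumption holds, rerunning with one of the two neighbouring values would show whether the static opening frames are related to frame-count padding or to something else, such as the conditioning strength in `images`.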
Here is the input image:
![Image](https://github.com/user-attachments/assets/6205e23c-c509-44a1-b32f-b20fbbce77ae)
Here is the generated video:
https://github.com/user-attachments/assets/8bc60383-9f13-43a2-9d93-592e73251e6a
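
To make "the first two frames do not move" measurable across runs (for example when varying the seed or the strength value in `images`), a rough sketch like the following could count how many leading frames are nearly identical to the first one. It assumes the decoded frames are already available as an array of shape `(T, H, W, C)`. Reading them from the MP4 is not shown, and both the function name `count_static_leading_frames` and the `threshold` value are hypothetical choices for illustration.

```python
import numpy as np


def count_static_leading_frames(frames, threshold=1.0):
    """Count leading frames whose mean absolute difference from frame 0 is <= threshold."""
    frames = np.asarray(frames, dtype=np.float32)
    if len(frames) < 2:
        return len(frames)
    # Mean absolute pixel difference of every later frame against the first frame.
    reduce_axes = tuple(range(1, frames.ndim))
    diffs = np.abs(frames[1:] - frames[0]).mean(axis=reduce_axes)
    moving = np.nonzero(diffs > threshold)[0]
    # diffs[i] belongs to frame i + 1, so the index of the first moving frame
    # equals the number of static leading frames.
    return len(frames) if moving.size == 0 else int(moving[0]) + 1


# Synthetic example: frames 0 and 1 are identical, motion starts at frame 2.
demo = np.zeros((5, 4, 4, 3), dtype=np.float32)
demo[2:] += 10.0
static_count = count_static_leading_frames(demo)
print(static_count)
```

Comparing this count between runs gives a simple number to report alongside each parameter change.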
