Are Tem-Con and Frame-Acc in the paper computed on the two generated keyframes? How exactly is Frame-Acc computed? #142

lankeren035 · 2024-11-17T15:55:12Z

No description provided.

williamyang1991 · 2024-11-18T01:33:11Z

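We follow GEN1.

Tem-Con: we compute CLIP image embeddings on all frames of the output videos and report the average cosine similarity between all pairs of consecutive frames.
In other words, for every pair of consecutive frames in the output video, compute the cosine similarity between the two frames' CLIP image embeddings, then average these similarities over all consecutive pairs.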
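
The Tem-Con description above can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: it assumes the per-frame CLIP image embeddings have already been extracted and are passed in as plain lists of floats, and the function names `cosine` and `temporal_consistency` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def temporal_consistency(frame_embeddings):
    """Mean cosine similarity over all pairs of consecutive frames.

    frame_embeddings: list of per-frame image embeddings, in frame order.
    (Assumed to come from a CLIP image encoder; extraction is not shown.)
    """
    if len(frame_embeddings) < 2:
        raise ValueError("need at least two frames")
    # Pair frame i with frame i + 1.
    pairs = zip(frame_embeddings, frame_embeddings[1:])
    sims = [cosine(a, b) for a, b in pairs]
    return sum(sims) / len(sims)
```

With N frames there are N - 1 consecutive pairs, so the score is an average over N - 1 similarities.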

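We follow FateZero.

Frame-Acc is the frame-wise editing accuracy: the percentage of frames where the edited image has a higher CLIP similarity to the target prompt (e.g., "a beautiful woman in CG style") than to the source prompt (e.g., "a beautiful woman").
In other words, for each output frame, compute its CLIP similarity to the description of the input frame (the source prompt) and its CLIP similarity to the target description; over the whole video, report the fraction of frames where the latter is higher than the former.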
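
The Frame-Acc description above can be sketched in the same spirit. This is only an illustration under stated assumptions: the frame embeddings and the two prompt embeddings are assumed to be precomputed with CLIP's image and text encoders, and the names `cosine` and `frame_accuracy` are hypothetical, not taken from FateZero or this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def frame_accuracy(frame_embeddings, source_text_embedding, target_text_embedding):
    """Fraction of frames closer to the target prompt than to the source prompt.

    frame_embeddings: list of image embeddings of the edited (output) frames.
    source_text_embedding / target_text_embedding: text embeddings of the
    source and target prompts (assumed to be precomputed).
    """
    if not frame_embeddings:
        raise ValueError("need at least one frame")
    hits = 0
    for frame in frame_embeddings:
        sim_target = cosine(frame, target_text_embedding)
        sim_source = cosine(frame, source_text_embedding)
        # A frame counts only if it is strictly more similar to the target.
        if sim_target > sim_source:
            hits += 1
    return hits / len(frame_embeddings)
```

Multiply the result by 100 to report it as a percentage.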

lankeren035 · 2024-11-18T04:26:29Z

Why use all frames of the output video? With K=10 as in the paper, wouldn't a large part of the measured metrics be contributed by the Ebsynth model?

williamyang1991 · 2024-11-18T05:54:33Z

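For fairness, the quantitative results in our experiments only compare keyframes and do not use Ebsynth.
We take keyframes with K=5, and all metrics are computed on the keyframes.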
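
A minimal sketch of restricting evaluation to keyframes, assuming keyframes are sampled uniformly every K frames starting from the first frame. The thread only states that K=5 and that metrics use keyframes alone; the exact indexing and the helper name `select_keyframes` are assumptions made for illustration.

```python
def select_keyframes(frames, interval=5):
    """Return every `interval`-th frame, starting from the first one.

    The starting index and uniform spacing are assumptions; adjust them
    to match the actual keyframe sampling used in the experiments.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return frames[::interval]
```

The metrics would then be computed on the returned subset only, so frames propagated by Ebsynth never enter the scores.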

lankeren035 · 2024-11-18T07:32:57Z

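Sorry, that was my oversight; thank you for your patient answer. I also have a question about the relationship between shape fusion and pixel fusion. Are the two complementary? I don't fully understand the ablation here: the ablation in the paper adds the modules one at a time, so why not simply use pixel fusion throughout? I don't see why shape fusion is necessary.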
