Hello, and thank you for your work on this repository.
I have a question regarding the implementation of the FILIP embedding model in this repository.
In the original FILIP paper, it is mentioned that padding vectors are excluded from similarity computation to prevent performance degradation.
"Unlike Khattab & Zaharia (2020), we discard the padded tokens and use average instead of summation of token-wise maximum similarities when computing the image-text alignment, which enhances the cross-modal representation learning and stabilizes training."
However, based on my understanding of the code here, it seems that padding vectors are also being used in the similarity calculation.
In the implementation, FILIP uses top-k selection in the `get_weighted_dense_logits` function of the FILIP model.
However, if the top-k value (an input argument of `get_weighted_dense_logits`) is larger than the number of non-padded token vectors in a text/image sample, then padding vectors can enter the similarity calculation.
Moreover, selecting the top-k vectors is not theoretically equivalent to discarding the padded-token vectors.
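To make the distinction concrete, here is a small NumPy sketch (my own toy example, not the repository's code) contrasting the paper's padding-aware alignment with an unmasked variant in which padded tokens compete in the per-patch maximum, which is effectively what can happen when the top-k value exceeds the number of real tokens:

```python
import numpy as np

rng = np.random.default_rng(0)
n_patches, n_tokens, n_real, dim = 4, 6, 3, 8  # tokens 3..5 are padding

# L2-normalized toy image-patch and text-token embeddings
img = rng.normal(size=(n_patches, dim))
txt = rng.normal(size=(n_tokens, dim))
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)

sim = img @ txt.T                    # (n_patches, n_tokens) token-wise similarities
mask = np.arange(n_tokens) < n_real  # True for real (non-padded) tokens

# (a) FILIP paper: discard padded tokens, then average the per-patch maxima
align_masked = np.where(mask[None, :], sim, -np.inf).max(axis=1).mean()

# (b) No masking: padded tokens can win the per-patch maximum.
#     This approximates the behavior when top-k keeps k >= n_real tokens.
align_unmasked = sim.max(axis=1).mean()

print(align_masked, align_unmasked)
```

Since maximizing over a superset of tokens can only raise each per-patch maximum, the unmasked alignment is biased upward whenever a padding vector happens to score highest, which is presumably why the paper reports that discarding padded tokens stabilizes training.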
I would like to confirm whether my understanding is correct. If padding vectors are indeed included in the similarity computation, could you clarify the reason behind this design choice?
Thank you for your time and support!