Question about Eq.1 #10

Open
LeungWaiHo opened this issue Feb 20, 2023 · 3 comments
Comments

@LeungWaiHo

LeungWaiHo commented Feb 20, 2023

Regarding Eq.1, the paper says to "compute their semantic correlation as the verification score".
My questions:

  1. Does it work as a similarity function?
  2. Could it be replaced by other similarity functions, such as cosine ...?
LeungWaiHo changed the title from "Question about Eq.2" to "Question about Eq.1" on Feb 20, 2023
@yangli18
Owner

@LeungWaiHo
A1: Yes, it can be understood as a similarity function, which measures how relevant each visual feature is to the content described in the text.
A2: The output of this function, which measures the correlation/relevance between visual and text features, should range from 0 to 1, yet the cosine function has an output range of [-1, 1]. We use Eq.1 to adapt the cosine similarity outputs to [0, 1]. You can try other functions with similar effects.
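
To make A2 concrete, here is a minimal sketch of a Gaussian-shaped mapping of the kind used in the snippet quoted later in this thread, next to a plain linear rescaling as one possible alternative. The default values of `tf_scale`, `tf_pow`, and `tf_sigma` below are hypothetical placeholders, not the values used in the repository, and `rescaled_cosine` is an illustrative helper that does not come from the paper or the code.

```python
import math

def verify_score(cos_sim, tf_scale=1.0, tf_pow=2.0, tf_sigma=0.5):
    """Gaussian-shaped mapping of a cosine similarity in [-1, 1] to (0, tf_scale].

    The parameter defaults are assumptions chosen only for illustration.
    """
    return tf_scale * math.exp(-((1.0 - cos_sim) ** tf_pow) / (2.0 * tf_sigma ** 2))

def rescaled_cosine(cos_sim):
    """Hypothetical linear alternative: maps [-1, 1] onto [0, 1]."""
    return 0.5 * (cos_sim + 1.0)

# Compare the two mappings at a few cosine values.
for c in (-1.0, 0.0, 0.5, 1.0):
    print(f"cos={c:+.1f}  gaussian={verify_score(c):.4f}  linear={rescaled_cosine(c):.2f}")
```

Both mappings increase with the cosine similarity and reach their maximum at a cosine of 1. The Gaussian form stays within [0, 1] only when `tf_scale` is at most 1, and it falls off much faster for weakly aligned features than the linear form does.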

@wildwolff

I cannot understand why S(x,y) in Eq.1 can be seen as the relevance score. Also, the code computes verify_score by element-wise multiplication without a transpose, which is a little different from Eq.1. Could you explain this further? Thanks a lot!

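    # Excerpt from the model's forward pass (F is torch.nn.functional).
    text_embed = self.text_proj(text_info)
    img_embed = self.img_proj(img_feat)
    # Cosine similarity: product of L2-normalized features, summed over channels.
    verify_score = (F.normalize(img_embed, p=2, dim=-1) *
                    F.normalize(text_embed, p=2, dim=-1)).sum(dim=-1, keepdim=True)
    # Pass the cosine similarity through a Gaussian-shaped function.
    verify_score = self.tf_scale * torch.exp(
        -(1 - verify_score).pow(self.tf_pow) / (2 * self.tf_sigma**2))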

@yangli18
Owner

yangli18 commented Feb 24, 2023

It's just a matter of implementation. The inner part of Eq. 1 essentially computes the inner product of two feature vectors. You could equally transpose one of them and use bmm ([Bx1xC] x [BxCx1] = [Bx1x1]), which is equivalent to the way I implemented it.
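
The equivalence described above can be checked numerically. The following is a small sketch, assuming both embeddings have shape [B, C]; the sizes `B` and `C` and the random inputs are hypothetical, and the real tensors in the repository may carry extra dimensions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C = 4, 256  # hypothetical batch size and channel dimension
img_embed = torch.randn(B, C)
text_embed = torch.randn(B, C)

img_n = F.normalize(img_embed, p=2, dim=-1)
text_n = F.normalize(text_embed, p=2, dim=-1)

# Element-wise product summed over channels, as in the quoted snippet.
score_mul = (img_n * text_n).sum(dim=-1, keepdim=True)  # [B, 1]

# Batched matrix product: [B, 1, C] x [B, C, 1] -> [B, 1, 1].
score_bmm = torch.bmm(img_n.unsqueeze(1), text_n.unsqueeze(2))

# Built-in cosine similarity, for comparison.
score_cos = F.cosine_similarity(img_embed, text_embed, dim=-1)  # [B]

same_as_bmm = torch.allclose(score_mul, score_bmm.squeeze(-1), atol=1e-5)
same_as_cos = torch.allclose(score_mul.squeeze(-1), score_cos, atol=1e-5)
print(same_as_bmm, same_as_cos)
```

All three forms compute the same per-sample cosine similarity. The element-wise version simply avoids the explicit reshape and transpose.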
