Questions of diving into the codes of the Fusion part #11798

Cloud65000 · 2024-06-18T06:36:32Z

Hi! I'm new to Grounding DINO. While reading the code for the Phase A fusion, I noticed that it first performs image-text cross-attention, then text self-attention, and then image self-attention. This seems to differ from the structure described in the original Grounding DINO paper, which first applies image deformable attention and text self-attention, and then cross-attention from text to image and from image to text. Is that correct, or am I overlooking some detail in the code?
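
To make the comparison concrete, here is a minimal sketch of the two orderings as I understand them. The step names and the helper function are hypothetical labels of my own, not the actual module or function names in the repository, and the lists only record the order of the sub-layers, not what they compute.

```python
# Order of sub-layers as I read it in the code (labels are hypothetical,
# not the real module names in the repository).
CODE_ORDER = [
    "image_text_cross_attention",
    "text_self_attention",
    "image_self_attention",
]

# Order as I read it in the paper. The first two steps act on separate
# modalities, so their relative order here is only for listing purposes.
PAPER_ORDER = [
    "image_deformable_self_attention",
    "text_self_attention",
    "text_to_image_cross_attention",
    "image_to_text_cross_attention",
]


def cross_attention_comes_first(order):
    """Return True if the first step in `order` is a cross-attention step."""
    return len(order) > 0 and "cross_attention" in order[0]


if __name__ == "__main__":
    print("code: ", cross_attention_comes_first(CODE_ORDER))
    print("paper:", cross_attention_comes_first(PAPER_ORDER))
```

If my reading is right, the only difference is whether the cross-modal fusion runs before or after the per-modality self-attention within each layer.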

mm-assistant bot assigned hhaAndroid Jun 18, 2024
