Questions about the code of the Fusion part #11798

Open
Cloud65000 opened this issue Jun 18, 2024 · 0 comments

Hi! I'm new to Grounding DINO. While reading the code for the Phase A fusion in Grounding DINO, I noticed that it first performs image-text cross-attention, then text self-attention, and then image self-attention. This seems to differ from the description of the model structure in the paper, which first applies image deformable attention and text self-attention, and then cross-attention from text to image and from image to text. Is my reading correct, or have I overlooked some detail in the code?
[Screenshot: both text and image attention code]
[Screenshot: Grounding DINO layers fusion code]
[Figure: original model structure]
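
To make the comparison concrete, here is a toy sketch of the two orderings as I understand them. It only records the order in which the sub-layers would be applied and does no real attention computation. All names in it are hypothetical labels I made up for illustration; they are not identifiers from the actual code base, and the "code" ordering is just my reading, which may be wrong.

```python
# Toy illustration only: the labels below are hypothetical and are NOT
# identifiers from the Grounding DINO code base.

# Ordering as I read it in the code (my interpretation, possibly wrong).
ORDER_AS_I_READ_THE_CODE = [
    "image-text cross-attention",
    "text self-attention",
    "image self-attention",
]

# Ordering as I understand the description in the paper.
ORDER_AS_DESCRIBED_IN_PAPER = [
    "image deformable attention",
    "text self-attention",
    "text-to-image cross-attention",
    "image-to-text cross-attention",
]


def run_layer(order):
    """Walk through the sub-layers in the given order and return the trace."""
    trace = []
    for name in order:
        # Stand-in for calling the real sub-layer on the features.
        trace.append(name)
    return trace


def cross_attention_runs_first(order):
    """Return True if the first applied sub-layer is a cross-attention."""
    return "cross-attention" in run_layer(order)[0]


if __name__ == "__main__":
    for label, order in (
        ("As I read the code", ORDER_AS_I_READ_THE_CODE),
        ("As described in the paper", ORDER_AS_DESCRIBED_IN_PAPER),
    ):
        print(label)
        for i, name in enumerate(run_layer(order), start=1):
            print(f"  {i}. {name}")
```

The difference I am asking about is whether the cross-attention step comes before or after the self-attention steps within one fusion layer.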
