Differences between ETC and BigBird-ETC version #26

Open
lhl2017 opened this issue Jan 17, 2022 · 0 comments
lhl2017 commented Jan 17, 2022

@manzilz Thank you for sharing the excellent research. :)

I have two quick questions. If I missed some info in your paper, could you please let me know what I missed?

Q1. Is the global-local attention method used in the BigBird-ETC version exactly the same as in the ETC paper, or is it closer to Longformer's?
As I understand it, according to the ETC paper some special (global) tokens attend fully only to restricted spans of the sequence. For example, in the HotpotQA task, a paragraph token attends to all tokens within its paragraph, and a sentence token attends to all tokens within its sentence. (I can't find how the [CLS] and question tokens attend.)

In Longformer, by contrast, the special tokens placed between sentences attend fully to the entire context.

In the BigBird paper (just above Section 3), the authors say:

"we add g global tokens that attend to all existing tokens."

This seems to suggest that the BigBird-ETC version is similar to Longformer. However, when the authors discuss the differences between Longformer and BigBird-ETC (in Appendix E.3), they cite ETC as the reference, which confuses me.
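To make the distinction I'm asking about concrete, here is a toy sketch of the two global-attention patterns as boolean masks (`mask[i, j] = True` means token `i` attends to token `j`). This is only my reading of the papers, not code from either repository; the function names are hypothetical, and local/random attention is omitted for clarity.

```python
import numpy as np

def longformer_style_mask(seq_len, global_idx):
    """Longformer-style: each global token attends to, and is
    attended by, ALL tokens in the sequence."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for g in global_idx:
        mask[g, :] = True   # global token attends everywhere
        mask[:, g] = True   # every token attends to the global token
    return mask

def etc_style_mask(seq_len, global_spans):
    """ETC-style (my reading): each global token attends only to
    the tokens inside its own span, and vice versa.
    global_spans maps a global token index -> (start, end) span."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for g, (start, end) in global_spans.items():
        mask[g, g] = True            # attend to itself
        mask[g, start:end] = True    # attends only within its span
        mask[start:end, g] = True    # span tokens attend back to it
    return mask
```

For example, with a global token at position 0 over a 6-token sequence, `longformer_style_mask(6, [0])` gives row 0 all `True`, while `etc_style_mask(6, {0: (1, 4)})` restricts row 0 to positions 1-3 (plus itself). My question is which of these two patterns the released BigBird-ETC actually uses.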

Q2. Is there source code or a pre-trained model available for the BigBird-ETC version? If you could share what was used in the paper, I would really appreciate it!

I look forward to your response.
