@manzilz Thank you for sharing this excellent research. :)
I have two quick questions. If I missed something in your paper that already answers them, could you please point me to it?
Q1. Is the global-local attention method used in the BigBird-ETC version exactly the same as in the ETC paper, or is it closer to Longformer's?
As I understand the ETC paper, the special (global) tokens attend fully only to restricted parts of the sequence. For example, in the HotpotQA task, a paragraph token attends to all tokens within its paragraph, and a sentence token attends to all tokens within its sentence. (I could not find how the [CLS] and question tokens attend.)
In Longformer, by contrast, the special tokens between sentences attend fully to the whole context.
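To make the contrast I mean concrete, here is a minimal mask sketch. This is my own toy illustration, not code from either paper or repo; the token layout, span assignments, and helper names are all hypothetical:

```python
import numpy as np

# Toy layout: [CLS] q q [S] s s s [S] s s  (indices 0..9)
# Global tokens: [CLS] at 0, sentence markers [S] at 3 and 7.
seq_len = 10
global_tokens = [0, 3, 7]
# Each sentence marker "owns" the span it introduces (my reading of the
# ETC-style restriction); I assume [CLS] attends everywhere.
span = {0: range(0, seq_len),
        3: range(3, 7),    # first sentence: tokens 3..6
        7: range(7, 10)}   # second sentence: tokens 7..9

def etc_style_mask():
    """Global tokens attend only to their restricted span (ETC-style)."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for g in global_tokens:
        mask[g, list(span[g])] = True   # row g: what token g attends to
    return mask

def longformer_style_mask():
    """Global tokens attend to the whole sequence (Longformer-style)."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for g in global_tokens:
        mask[g, :] = True               # global token sees every token
        mask[:, g] = True               # and every token sees it back
    return mask

# The two schemes differ exactly on the sentence-marker rows:
print(etc_style_mask()[3].astype(int))         # [0 0 0 1 1 1 1 0 0 0]
print(longformer_style_mask()[3].astype(int))  # [1 1 1 1 1 1 1 1 1 1]
```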
In the BigBird paper (just above Section 3), the authors write:
"we add g global tokens that attend to all existing tokens."
This seems to suggest that the BigBird-ETC version is similar to Longformer. However, when the authors discuss the differences between Longformer and BigBird-ETC (in Appendix E.3), they cite ETC as the reference, which confuses me.
Q2. Is there source code or a pre-trained model for the BigBird-ETC version? If you could share what was used in the paper, I would really appreciate it!
I look forward to your response.