Questions Regarding the Nougat Model's Pre-training Process #231

Open
JasonKitty opened this issue Jul 15, 2024 · 0 comments
Dear Authors,

I have a couple of questions regarding the Nougat model as described in your paper.

Firstly, the paper mentions that Nougat employs an architecture similar to Donut's, using a Swin Transformer encoder and a BART decoder. However, Donut uses a more elaborate pre-training scheme, in which the model is first taught to read text and only then to handle document structure. Could you clarify whether Nougat undergoes a similar pre-training process? It doesn't seem to be mentioned in the paper.
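
For concreteness, this is roughly the encoder-decoder layout I am referring to, as exposed by the released checkpoints on Hugging Face (a minimal sketch; the `facebook/nougat-base` ID is the public checkpoint, not necessarily what was used during training):

```python
# Minimal sketch of the Donut/Nougat-style architecture: a Swin vision
# encoder feeding an autoregressive BART-style text decoder.
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

# Inspect which backbone classes actually make up the two halves.
print(type(model.encoder).__name__)  # Swin-based vision encoder
print(type(model.decoder).__name__)  # BART-style text decoder
```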

Secondly, could you please specify whether Nougat's training started from the original pre-trained weights of swin_base_patch4_window12_384 and mbart-large-50, or from the pre-trained weights of the Donut model?
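
In code, the two initializations I am asking about would look roughly like this (a hypothetical sketch; the checkpoint IDs below are the public ones I would guess at, not confirmed to be what you used):

```python
# Sketch of the two candidate starting points for Nougat's weights.
import timm
from transformers import MBartForConditionalGeneration, VisionEncoderDecoderModel

# Option A: the original backbone checkpoints named in my question.
swin_encoder = timm.create_model("swin_base_patch4_window12_384", pretrained=True)
bart_decoder = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")

# Option B: Donut's already-pretrained encoder-decoder weights.
donut = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
```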

Thank you.
