Dear Authors,
I have a couple of questions regarding the Nougat model as described in your paper.
Firstly, the paper mentions that Nougat uses an architecture similar to Donut's: a Swin Transformer encoder paired with a BART decoder. However, Donut uses a more elaborate two-stage pre-training scheme, in which the model first learns to read text and is then trained to handle document structure. Could you clarify whether Nougat undergoes a similar pre-training process? It does not seem to be mentioned in the paper.
Secondly, could you please specify whether Nougat's training started from the original pre-trained weights of swin_base_patch4_window12_384 and mbart-large-50, or from the pre-trained weights of the Donut model? To make the question concrete, I sketch the two options I have in mind below.
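Here is a minimal sketch of the two initialization paths, assuming the standard timm and Hugging Face Hub checkpoints; this is purely my own illustration of the question, not code from the paper or this repository:

```python
# Hypothetical illustration of the two initialization options being asked about.
# Assumes standard timm / Hugging Face Hub checkpoints; this is NOT Nougat code.
import timm
from transformers import MBartForConditionalGeneration, VisionEncoderDecoderModel

# Option A: initialize encoder and decoder from the original upstream checkpoints.
encoder_a = timm.create_model("swin_base_patch4_window12_384", pretrained=True)
decoder_a = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")

# Option B: initialize both from Donut's already pre-trained weights
# (shown here via the Hugging Face port of the Donut base model).
donut = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
encoder_b, decoder_b = donut.encoder, donut.decoder
```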
Thank you.