Nice work! The Vision Token Merger method used in TinyChart has also been explored, in similar forms, in works such as TextHawk and DocKylin. In those works, however, the merging is typically performed after the Vision Transformer (ViT), whereas this paper performs it inside the ViT. I’m curious: does this approach offer any particular advantages in terms of performance or interpretability?
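To make the distinction concrete, here is a minimal sketch contrasting the two placements. It assumes a simplified ToMe-style bipartite soft matching rule (split tokens into alternating sets A and B, fold the `r` most similar A tokens into their closest B partner by size-weighted averaging); this is an assumption, not TinyChart's actual code. The ViT blocks are identity placeholders, and `bipartite_merge` and `run_encoder` are hypothetical names.

```python
import math
from typing import List, Tuple

Vec = List[float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity between two token vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def bipartite_merge(tokens: List[Vec], sizes: List[int], r: int) -> Tuple[List[Vec], List[int]]:
    """Hypothetical helper: remove up to r tokens by merging each into its most similar partner."""
    a_idx = list(range(0, len(tokens), 2))  # set A: even positions
    b_idx = list(range(1, len(tokens), 2))  # set B: odd positions
    r = min(r, len(a_idx))
    if r <= 0 or not b_idx:
        return tokens, sizes
    # For every A token, find its most similar B token.
    pairs = []
    for i in a_idx:
        j = max(b_idx, key=lambda k: cosine(tokens[i], tokens[k]))
        pairs.append((cosine(tokens[i], tokens[j]), i, j))
    pairs.sort(reverse=True)  # most similar pairs first
    out = {k: list(t) for k, t in enumerate(tokens)}
    size = dict(enumerate(sizes))
    for _, i, j in pairs[:r]:
        # Size-weighted average folds token i into token j.
        total = size[i] + size[j]
        out[j] = [(x * size[i] + y * size[j]) / total for x, y in zip(out[i], out[j])]
        size[j] = total
        del out[i], size[i]
    keep = sorted(out)  # preserve original token order
    return [out[k] for k in keep], [size[k] for k in keep]


def run_encoder(tokens, num_layers, r, merge_inside):
    """Hypothetical encoder: return (tokens seen by each layer, final tokens, final sizes)."""
    sizes = [1] * len(tokens)
    processed = []
    for _ in range(num_layers):
        processed.append(len(tokens))
        # Placeholder for a real ViT block (attention + MLP); identity here.
        if merge_inside:
            tokens, sizes = bipartite_merge(tokens, sizes, r)
    if not merge_inside:
        # Merge once after the encoder, removing the same total budget.
        tokens, sizes = bipartite_merge(tokens, sizes, r * num_layers)
    return processed, tokens, sizes


patches = [[float(i % 4), 1.0] for i in range(16)]  # 16 toy patch embeddings
inside = run_encoder(patches, num_layers=4, r=2, merge_inside=True)
after = run_encoder(patches, num_layers=4, r=2, merge_inside=False)
print(inside[0], len(inside[1]))  # [16, 14, 12, 10] 8
print(after[0], len(after[1]))    # [16, 16, 16, 16] 8
```

Both variants end with the same number of tokens, but with in-ViT merging the later blocks process fewer tokens. The sketch only shows this token-count difference; it says nothing about accuracy or interpretability, which is what the question above asks about.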