Unique Aspects of Vision Token Merger in TinyChart #126

EchoDreamer · 2024-11-25T10:46:49Z

Nice work! The Vision Token Merger method mentioned in TinyChart has been similarly explored in works like TextHawk and DocKylin. However, in those works, the merger is typically performed after the ViT (Vision Transformer). This paper introduces the idea of performing the vision token merger inside the ViT. I’m curious: does this approach offer any special advantages in terms of performance or interpretability?
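
To make the question concrete, here is a rough sketch of one difference I would expect in terms of compute. It assumes a ToMe-style scheme that merges away a fixed number `r` of tokens after every ViT layer, and it uses the sum of squared token counts as a crude proxy for self-attention cost. The function names and the numbers (576 tokens, 24 layers, `r=12`) are hypothetical and are not taken from TinyChart, TextHawk, or DocKylin.

```python
def tokens_per_layer(num_tokens, num_layers, r):
    """Token count entering each layer when r tokens are merged away after every layer."""
    counts = []
    n = num_tokens
    for _ in range(num_layers):
        counts.append(n)
        n = max(n - r, 1)  # never drop below a single token
    return counts


def attention_cost(counts):
    """Crude self-attention cost proxy: sum of n^2 over all layers."""
    return sum(n * n for n in counts)


NUM_TOKENS, NUM_LAYERS, R = 576, 24, 12  # illustrative values only

# Merging inside the ViT: the sequence shrinks by R tokens after each layer.
inside = tokens_per_layer(NUM_TOKENS, NUM_LAYERS, R)
tokens_out_inside = max(inside[-1] - R, 1)

# Merging after the ViT: every layer sees the full sequence, then one merge
# step at the end reduces it to the same final length.
after = tokens_per_layer(NUM_TOKENS, NUM_LAYERS, 0)
tokens_out_after = tokens_out_inside

cost_ratio = attention_cost(inside) / attention_cost(after)
print(f"tokens handed to the LLM: {tokens_out_inside} in both cases")
print(f"relative ViT attention cost (inside / after): {cost_ratio:.2f}")
```

Under these assumptions both variants hand the same number of tokens to the language model, but only the in-ViT variant also shortens the sequence seen by the later ViT layers. Whether that also helps accuracy or interpretability is the part I am unsure about.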
