
About poolformer as a tool for demonstration of MetaFormer #51

Open
YoojLee opened this issue Apr 1, 2023 · 2 comments
Comments

@YoojLee

YoojLee commented Apr 1, 2023

Hi, thanks for the wonderful work! I am really impressed by the proposed 'MetaFormer' concept and the experimental results you have provided. While reading the paper, a few questions came up regarding PoolFormer and the MetaFormer concept that I would like to share with you.

  1. As far as I understand, a MetaFormer basically consists of an input embedding followed by repeated blocks of [norm - token mixer - residual connection - norm - channel mixer - residual connection]. Does MetaFormer then place no constraint on non-overlapping patches or on a sequence of flattened patches? If so, is the combination of a token mixer and a channel mixer with the other components all that defines a 'MetaFormer', regardless of the hierarchical structure of the network or the shape of its inputs?
  2. PoolFormer uses non-parametric 2D pooling as its token mixer, which is extremely simple compared to previous token mixers. However, the patch embeddings inserted between the blocks seem to perform implicit token mixing, since each is a convolution whose stride is smaller than its kernel size and therefore yields overlapping patches. Given this overlap, I believe the resulting patches share information from the same spatial locations.

Thanks!
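To make my reading of the block structure in question 1 concrete, here is a minimal NumPy sketch of one such block. The names (`layer_norm`, `channel_mlp`, `metaformer_block`) and the toy token mixer are my own illustration, not the official implementation:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each token over the channel dimension (last axis).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def channel_mlp(x, w1, w2):
    # Two-layer MLP applied to each token independently (channel mixing).
    return np.maximum(x @ w1, 0.0) @ w2

def metaformer_block(x, token_mixer, w1, w2):
    # norm -> token mixer -> residual, then norm -> channel MLP -> residual.
    x = x + token_mixer(layer_norm(x))
    x = x + channel_mlp(layer_norm(x), w1, w2)
    return x

# The token mixer is left abstract in MetaFormer; as a placeholder,
# use a toy mixer that pulls every token toward the token mean.
mean_mixer = lambda x: x.mean(axis=0, keepdims=True) - x

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))          # 16 tokens, 8 channels
w1 = rng.standard_normal((8, 32)) * 0.1   # hidden width 32
w2 = rng.standard_normal((32, 8)) * 0.1
y = metaformer_block(x, mean_mixer, w1, w2)
print(y.shape)  # (16, 8)
```

Swapping `mean_mixer` for attention, pooling, or an MLP over tokens gives the different MetaFormer instantiations, which is how I read the abstraction in the paper.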

@yuweihao
Collaborator

yuweihao commented Apr 2, 2023

Hi @YoojLee ,

Thanks for your insightful discussion.

  1. In my opinion, the core of MetaFormer is the repeated MetaFormer blocks. Thus, models using a hierarchical structure, like PVT, Swin, and PoolFormer, are also regarded as MetaFormer models.

  2. For a 4-stage hierarchical structure, the four patch embeddings shown in the paper can also be called downsampling layers, similar to those in ResNet. Downsampling can also mix tokens, but its main function is to reduce the resolution and increase the number of channels. ResNet and PoolFormer have similar hierarchical structures, so the better performance of PoolFormer demonstrates the superiority of MetaFormer. You may also refer to what makes pooling competitive performance or even more than attention? #43.

[image: poolformer_s24]
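For reference, the pooling token mixer discussed in question 2 can be sketched in a few lines of NumPy. This is my own illustration, not the official code: it uses stride-1 average pooling with zero padding and subtracts the input, since the MetaFormer block's residual connection adds the input back (implementations that exclude the padding from the average will differ slightly at the borders):

```python
import numpy as np

def pool_token_mixer(x, k=3):
    # x: feature map of shape (H, W, C).
    # Stride-1 average pooling over a k x k window, minus the identity:
    # the block's residual connection re-adds x, so the mixer only
    # contributes the pooled difference.
    p = k // 2
    H, W, C = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))  # zero padding for simplicity
    pooled = np.zeros_like(x)
    for di in range(k):
        for dj in range(k):
            pooled += xp[di:di + H, dj:dj + W]
    return pooled / (k * k) - x

# Sanity check: on a constant feature map, pooling returns the same
# constant away from the borders, so the mixer output is zero there.
x = np.ones((6, 6, 4))
y = pool_token_mixer(x)
print(y.shape)                    # (6, 6, 4)
print(abs(y[1:-1, 1:-1]).max())   # 0.0 in the interior
```

By contrast, the patch embeddings between stages are strided convolutions whose kernel size exceeds their stride, so they do mix neighboring tokens, as the question points out.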

@YoojLee
Author

YoojLee commented Apr 2, 2023

Thanks for your reply!

I just want to confirm that my understanding is correct. If I read your comment correctly, the proposed MetaFormer concept is simply a stack of MetaFormer blocks (each consisting of normalization, a token mixer, a channel mixer, and residual connections). Thus, regardless of the extent of inductive bias, or of whether the overall architecture follows a hierarchical structure, any model built by repeating MetaFormer blocks counts as a MetaFormer.
