Few-shot Hypernets

Official PyTorch implementation of the papers:

  • HyperMAML: Few-Shot Adaptation of Deep Models with Hypernetworks (2022) Przewięźlikowski M., Przybysz P., Tabor J., Zięba M., Spurek P., preprint

  • HyperShot: Few-Shot Learning by Kernel HyperNetworks (2022) Sendera M., Przewięźlikowski M., Karanowski K., Zięba M., Tabor J., Spurek P., preprint

@misc{sendera2022hypershot,
  doi = {10.48550/ARXIV.2203.11378},
  url = {https://arxiv.org/abs/2203.11378},
  author = {Sendera, Marcin and Przewięźlikowski, Marcin and Karanowski, Konrad and Zięba, Maciej and Tabor, Jacek and Spurek, Przemysław},
  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
  title = {HyperShot: Few-Shot Learning by Kernel HyperNetworks},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

Overview

HyperShot

Few-shot models aim to make predictions from a minimal number of labeled examples of a given task. The main challenge in this area is the one-shot setting, where each class is represented by a single example. We propose HyperShot, a fusion of kernels and the hypernetwork paradigm. Whereas reference approaches adjust the model parameters with gradient-based updates, our model switches the classification module's parameters depending on the task's embedding. In practice, we use a hypernetwork that takes aggregated information from the support data and returns classifier parameters tailored to the problem at hand. Moreover, we introduce a kernel-based representation of the support examples, which is delivered to the hypernetwork to create the parameters of the classification module. Consequently, we rely on relations between embeddings of the support examples instead of the direct feature values provided by the backbone models. Thanks to this approach, our model can adapt to highly different tasks.
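The kernel-plus-hypernetwork idea can be illustrated with a minimal NumPy sketch. It is a simplification under stated assumptions, not the code in this repository: it assumes a cosine kernel, support examples sorted by class (so their order carries the label information), and an untrained one-hidden-layer hypernetwork with random weights. The names `cosine_kernel`, `KernelHypernet`, and `hypershot_logits` are hypothetical. The hypernetwork only sees the support-support kernel matrix, and the classifier it generates acts on query-support kernel values, so predictions depend on relations between embeddings rather than on raw features.

```python
import numpy as np

def cosine_kernel(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

class KernelHypernet:
    """Hypothetical toy hypernetwork: support kernel matrix -> linear classifier weights.
    Weights are random here; in a real model they would be meta-trained."""

    def __init__(self, n_support, n_way, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.n_support, self.n_way = n_support, n_way
        in_dim = n_support * n_support             # flattened (S, S) kernel matrix
        out_dim = n_way * n_support + n_way        # classifier weight matrix + bias
        self.w1 = rng.normal(0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0, 0.1, (hidden, out_dim))

    def __call__(self, k_support):
        h = np.tanh(k_support.reshape(-1) @ self.w1)
        params = h @ self.w2
        split = self.n_way * self.n_support
        weight = params[:split].reshape(self.n_way, self.n_support)
        bias = params[split:]
        return weight, bias

def hypershot_logits(support, query, hypernet):
    """Generate a task-specific classifier from the support set, then apply it to queries."""
    k_support = cosine_kernel(support, support)    # (S, S) relations among support examples
    weight, bias = hypernet(k_support)             # classifier parameters for this task
    k_query = cosine_kernel(query, support)        # (Q, S) relations of queries to support
    return k_query @ weight.T + bias               # (Q, n_way) logits
```

For a 5-way 1-shot episode with 64-dimensional embeddings and 15 queries, `hypershot_logits(support, query, KernelHypernet(n_support=5, n_way=5))` returns a 15 × 5 logit matrix. Because the classifier is generated rather than fine-tuned, adapting to a new task costs one forward pass.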

HyperMAML

The aim of Few-Shot learning methods is to train models that can easily adapt to previously unseen tasks based on small amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn general weights of the meta-model, which are then adapted to specific problems in a small number of gradient steps. However, the model's main limitation lies in the fact that the update procedure is realized by gradient-based optimization. As a consequence, MAML cannot always modify the weights sufficiently in one or even a few gradient iterations. On the other hand, using many gradient steps results in a complex and time-consuming optimization procedure, which is hard to train in practice and may lead to overfitting. In this paper, we propose HyperMAML, a novel generalization of MAML in which the training of the update procedure is also part of the model. Namely, instead of updating the weights with gradient descent, HyperMAML uses a trainable hypernetwork for this purpose. Consequently, in this framework the model can generate significant updates whose range is not limited to a fixed number of gradient steps. Experiments show that HyperMAML consistently outperforms MAML and performs comparably to other state-of-the-art techniques on a number of standard Few-Shot learning benchmarks.
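The difference between the two update rules can be sketched in NumPy for a linear classifier on fixed embeddings. This is an illustration under assumptions, not the repository's implementation: the hypernetwork here is a single untrained linear map whose input is the class-mean embeddings of the support set (the actual HyperMAML hypernetwork input and architecture may differ), and all function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(theta, x, y_onehot):
    """Mean cross-entropy of the linear classifier x @ theta."""
    probs = softmax(x @ theta)
    return -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))

def maml_update(theta, x, y_onehot, lr=0.1, steps=1):
    """MAML-style inner loop: a fixed number of gradient steps on the support loss."""
    for _ in range(steps):
        probs = softmax(x @ theta)
        grad = x.T @ (probs - y_onehot) / len(x)  # gradient of mean cross-entropy w.r.t. theta
        theta = theta - lr * grad
    return theta

def hypermaml_update(theta, x, y_onehot, hyper_w):
    """HyperMAML-style update: a hypernetwork (here one untrained linear map)
    turns a support-set summary into a weight update in a single forward pass."""
    counts = y_onehot.sum(axis=0)[:, None]        # (n_way, 1) examples per class
    prototypes = (y_onehot.T @ x) / counts        # (n_way, d) class-mean embeddings
    delta = prototypes.reshape(-1) @ hyper_w      # (d * n_way,) generated update
    return theta + delta.reshape(theta.shape)
```

With MAML, the size of the adaptation is bounded by `lr` times the number of gradient `steps`; in the hypernetwork variant it is whatever the learned `hyper_w` produces, and no backward pass is needed at adaptation time.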

Requirements

  1. Python >= 3.7
  2. NumPy >= 1.19
  3. PyTorch >= 1.11
  4. GPyTorch >= 1.5.1
  5. (optional) neptune-client for logging training results to your Neptune project.

Installation

pip install numpy torch torchvision gpytorch h5py pillow
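After installation, a quick sanity check against the minimum versions listed under Requirements can save a failed run. The script below is a hypothetical helper, not part of the repository. It assumes Python 3.8+ because it uses `importlib.metadata` (on Python 3.7 the `importlib_metadata` backport would be needed), and its version parser is deliberately simple: it keeps only the leading numeric components, so pre-release tags such as `rc1` are ignored.

```python
import re
from importlib import metadata  # Python 3.8+

# Minimum versions copied from the Requirements list (distribution names on PyPI).
MINIMUM_VERSIONS = {"numpy": "1.19", "torch": "1.11", "gpytorch": "1.5.1"}

def parse_version(version):
    """'1.11.0+cu113' -> (1, 11, 0); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def meets_minimum(installed, required):
    """Compare versions component-wise, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))

def check_requirements():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for package, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems

if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All listed requirements are satisfied.")
```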

Code of our method

Running the code

The various methods can be trained using the following syntax:

python train.py --dataset="miniImagenet" --method="hyper_maml" --train_n_way=5 --test_n_way=5 --n_shot=1 --seed=1 --train_aug

You can run

python train.py --help

to list all the possible arguments.

The train.py script performs the whole training and evaluation procedure.
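When repeating the training command above for several seeds or shot counts, it can help to build the argument list programmatically. The sketch below is a hypothetical wrapper: it uses only the flags shown in this README (plus the optional `--model` backbone flag described under Backbones), and by default it only prints the commands instead of running them.

```python
import subprocess
import sys

def build_command(dataset, method, n_way=5, n_shot=1, seed=1, train_aug=True, model=None):
    """Assemble the train.py argument list from the flags documented in this README."""
    cmd = [
        sys.executable, "train.py",
        f"--dataset={dataset}",
        f"--method={method}",
        f"--train_n_way={n_way}",
        f"--test_n_way={n_way}",
        f"--n_shot={n_shot}",
        f"--seed={seed}",
    ]
    if model is not None:
        cmd.append(f"--model={model}")  # backbone string, e.g. "Conv4"
    if train_aug:
        cmd.append("--train_aug")
    return cmd

def run_grid(dataset, method, shots=(1, 5), seeds=(1, 2, 3), dry_run=True):
    """One train.py run per (n_shot, seed) pair; with dry_run=True the commands are only printed."""
    commands = [build_command(dataset, method, n_shot=s, seed=r) for s in shots for r in seeds]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands

if __name__ == "__main__":
    run_grid("miniImagenet", "hyper_maml")
```

Passing `dry_run=False` from the project root launches the runs one after another; the first entry of the grid corresponds to the example command shown earlier.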

Methods

This repository provides implementations of several few-shot learning methods:

You must use those exact strings at training and test time when you call the script (see below).

Datasets

This is an example of how to download and prepare a dataset for training/testing. Here we assume the current directory is the project root folder:

cd filelists/DATASET_NAME/
sh download_DATASET_NAME.sh

Replace DATASET_NAME with one of the following: omniglot, CUB, miniImagenet, emnist, QMUL. Note that mini-ImageNet is a large dataset that requires substantial storage; you can therefore save it in another location and update the corresponding entry in configs.py accordingly.
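The two shell commands above can also be scripted for several datasets at once. This is a hypothetical helper that simply follows the filelists/DATASET_NAME/download_DATASET_NAME.sh pattern described above; it assumes it is run from the project root and that each script exists under exactly that name.

```python
import subprocess
from pathlib import Path

# Dataset names accepted by the download step described above.
DATASETS = ("omniglot", "CUB", "miniImagenet", "emnist", "QMUL")

def download_command(name, root="."):
    """Return (working directory, argv) for filelists/<name>/download_<name>.sh."""
    if name not in DATASETS:
        raise ValueError(f"unknown dataset {name!r}; expected one of {DATASETS}")
    return Path(root) / "filelists" / name, ["sh", f"download_{name}.sh"]

def download(names, root="."):
    """Equivalent to: cd filelists/<name>/ && sh download_<name>.sh, for each name."""
    for name in names:
        workdir, argv = download_command(name, root)
        subprocess.run(argv, cwd=workdir, check=True)
```

For example, `download(["omniglot", "emnist"])` would fetch both datasets involved in the Omniglot → EMNIST cross-domain setting.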

These are the instructions to train and test the methods reported in the paper in the various conditions.

In addition, you can select the cross_char and cross datasets for cross-domain classification (Omniglot → EMNIST and mini-ImageNet → CUB, respectively).

Backbones

The script allows training and testing on different backbone networks. By default, the script uses the same backbone as in our experiments (Conv4). Check the file backbone.py for the available architectures, and use the parameter --model=BACKBONE_STRING, where BACKBONE_STRING is one of the following: Conv4, Conv6, ResNet10|18|34|50|101.
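When inspecting embedding sizes, it helps to know how large the flattened Conv4 feature vector is. The arithmetic below assumes the common Conv4 layout (four blocks of 3×3 convolution with padding 1 and 64 channels, batch norm, ReLU, and 2×2 max pooling with floor rounding); check backbone.py for the exact definition. The input sizes used in the examples (84×84 and 28×28) are typical choices for mini-ImageNet and Omniglot, not values stated in this README.

```python
def conv4_feature_dim(image_size, channels=64, n_blocks=4):
    """Flattened output size of a Conv4-style backbone under the layout assumed above."""
    size = image_size
    for _ in range(n_blocks):
        size = size // 2  # padded 3x3 conv keeps the size; 2x2 max pooling halves it (floor)
    return channels * size * size

print(conv4_feature_dim(84))  # 84 -> 42 -> 21 -> 10 -> 5, so 64 * 5 * 5 = 1600
print(conv4_feature_dim(28))  # 28 -> 14 -> 7 -> 3 -> 1, so 64 * 1 * 1 = 64
```

This formula does not carry over directly to deeper variants such as Conv6, where not every block necessarily pools.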

Neptune

Training and validation metrics, along with run details, can be logged to Neptune. To enable this, export the following environment variables before running train.py:

export NEPTUNE_PROJECT=...
export NEPTUNE_API_TOKEN=...
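A small pre-flight check can confirm that both variables are visible to Python before a long run starts. This is a hypothetical helper; it only inspects the environment and makes no assumption about how train.py decides whether to log.

```python
import os

NEPTUNE_VARS = ("NEPTUNE_PROJECT", "NEPTUNE_API_TOKEN")

def missing_neptune_vars(env=None):
    """Return the Neptune variables that are unset or empty in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in NEPTUNE_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_neptune_vars()
    if missing:
        print("Missing Neptune variables:", ", ".join(missing))
    else:
        print("Both Neptune variables are set.")
```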

Acknowledgements

This repository is a fork of: https://github.com/BayesWatch/deep-kernel-transfer, which in turn is a fork of https://github.com/wyharveychen/CloserLookFewShot.