Awesome-Dataset-Distillation

A curated list of awesome papers on dataset distillation and related applications, inspired by awesome-computer-vision.

Dataset distillation is the task of synthesizing a small dataset such that models trained on it achieve high performance on the original large dataset. A dataset distillation algorithm takes as input a large real dataset to be distilled (the training set) and outputs a small synthetic distilled dataset. The distilled dataset is evaluated by training models on it and testing them on a separate real dataset (the validation/test set). A good small distilled dataset is not only useful for dataset understanding, but also has various applications (e.g., continual learning, privacy, and neural architecture search). This task was first introduced in the 2018 paper Dataset Distillation [Tongzhou Wang et al., '18], along with a proposed algorithm using backpropagation through optimization steps.
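
To make the setup above concrete, here is a minimal sketch of the idea on a toy problem. It is not the implementation from the 2018 paper: it assumes a 1-D linear model `y = w * x`, a fixed initialization `w0 = 0`, a single inner gradient step, and hand-derived gradients in place of automatic differentiation. All names (`distill`, `inner_step`, `mse`) and hyperparameters are hypothetical and chosen only for illustration.

```python
# Toy dataset distillation (illustrative assumptions, see the text above):
# learn ONE synthetic (xs, ys) pair such that a single gradient step on it,
# starting from a fixed w0, fits the model y = w * x to the real training set.

def mse(w, data):
    """Mean of 0.5 * (w*x - y)^2 over a list of (x, y) pairs."""
    return sum(0.5 * (w * x - y) ** 2 for x, y in data) / len(data)

def inner_step(w0, xs, ys, inner_lr):
    """One gradient step on the synthetic pair (xs, ys), starting from w0."""
    grad_w = (w0 * xs - ys) * xs  # d/dw of 0.5 * (w*xs - ys)^2 at w = w0
    return w0 - inner_lr * grad_w

def distill(train, w0=0.0, inner_lr=0.5, outer_lr=0.5, steps=200):
    """Optimize the synthetic pair by differentiating through the inner step."""
    xs, ys = 1.0, 1.0  # arbitrary starting point for the synthetic sample
    for _ in range(steps):
        w1 = inner_step(w0, xs, ys, inner_lr)
        # Gradient of the real-data loss with respect to the trained weight w1.
        g = sum((w1 * x - y) * x for x, y in train) / len(train)
        # Chain rule through w1 = w0 - inner_lr * (w0*xs - ys) * xs.
        dw1_dxs = -inner_lr * (2.0 * w0 * xs - ys)
        dw1_dys = inner_lr * xs
        xs -= outer_lr * g * dw1_dxs
        ys -= outer_lr * g * dw1_dys
    return xs, ys

# Real data: noise-free samples of y = 3x; the test set uses different x values.
train = [(i / 10, 3.0 * i / 10) for i in range(-10, 11)]
test = [((i + 0.5) / 10, 3.0 * (i + 0.5) / 10) for i in range(-10, 10)]

xs, ys = distill(train)

# Evaluation protocol: train a fresh model on the distilled data only,
# then measure its loss on the held-out real test set.
w_trained = inner_step(0.0, xs, ys, inner_lr=0.5)
test_loss = mse(w_trained, test)
print(f"synthetic pair = ({xs:.3f}, {ys:.3f}), w = {w_trained:.3f}, test loss = {test_loss:.2e}")
```

The outer loop never trains on the real data directly. It only adjusts the synthetic pair so that the model obtained from it performs well on the real training set, which is the role that backpropagation through optimization steps plays in this setting.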

In recent years (2019-now), dataset distillation has gained increasing attention in the research community, across many institutes and labs, and more papers are being published each year. This research has been steadily improving dataset distillation and exploring its variants and applications.

This project is curated and maintained by Guang Li, Bo Zhao, and Tongzhou Wang.

How to submit a pull request?

🌐 Project Page
Code
📖 bibtex

Main

Dataset Distillation (Tongzhou Wang et al., 2018) 🌐 📖

Applications

Continual Learning

Reducing Catastrophic Forgetting with Learning on Synthetic Data (Wojciech Masarczyk et al., CVPR 2020 Workshop) 📖
Condensed Composite Memory Continual Learning (Felix Wiewel et al., IJCNN 2021) 📖
Distilled Replay: Overcoming Forgetting through Synthetic Samples (Andrea Rosasco et al., IJCAI 2021 Workshop) 📖
Sample Condensation in Online Continual Learning (Mattia Sangermano et al., IJCNN 2022) 📖
Summarizing Stream Data for Memory-Restricted Online Continual Learning (Jianyang Gu et al., 2023) 📖

Privacy

SecDD: Efficient and Secure Method for Remotely Training Neural Networks (Ilia Sucholutsky et al., AAAI 2021) 📖
Privacy for Free: How does Dataset Condensation Help Privacy? (Tian Dong et al., ICML 2022) 📖
Can We Achieve Robustness from Data Alone? (Nikolaos Tsilivis et al., ICML 2022 Workshop) 📖
Private Set Generation with Discriminative Information (Dingfan Chen et al., NeurIPS 2022) 📖
Towards Robust Dataset Learning (Yihan Wu et al., 2022) 📖
Backdoor Attacks Against Dataset Distillation (Yugeng Liu et al., NDSS 2023) 📖
Differentially Private Kernel Inducing Points (DP-KIP) for Privacy-preserving Data Distillation (Margarita Vinaroz et al., 2023) 📖
Dataset Distillation Fixes Dataset Reconstruction Attacks (Noel Loo et al., 2023) 📖

Medical

Soft-Label Anonymous Gastric X-ray Image Distillation (Guang Li et al., ICIP 2020) 📖
Compressed Gastric Image Generation Based on Soft-Label Dataset Distillation for Medical Data Sharing (Guang Li et al., CMPB 2022) 📖
Dataset Distillation for Medical Dataset Sharing (Guang Li et al., AAAI 2023 Workshop) 📖
Dataset Distillation using Parameter Pruning (Guang Li et al., 2023) 📖

Federated Learning

Federated Learning via Synthetic Data (Jack Goetz et al., 2020) 📖
Distilled One-Shot Federated Learning (Yanlin Zhou et al., 2020) 📖
FedSynth: Gradient Compression via Synthetic Data in Federated Learning (Shengyuan Hu et al., 2022) 📖
Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments (Rui Song et al., 2022) 📖
DYNAFED: Tackling Client Data Heterogeneity with Global Dynamics (Renjie Pi et al., 2022) 📖
Meta Knowledge Condensation for Federated Learning (Ping Liu et al., ICLR 2023) 📖
FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning (Yuanhao Xiong & Ruochen Wang et al., CVPR 2023) 📖
Fed-GLOSS-DP: Federated, Global Learning using Synthetic Sets with Record Level Differential Privacy (Hui-Po Wang et al., 2023) 📖
Federated Virtual Learning on Heterogeneous Data with Local-global Distillation (Chun-Yin Huang et al., 2023) 📖

Graph Neural Network

Graph Condensation for Graph Neural Networks (Wei Jin et al., ICLR 2022) 📖
Condensing Graphs via One-Step Gradient Matching (Wei Jin et al., KDD 2022) 📖
Graph Condensation via Receptive Field Distribution Matching (Mengyang Liu et al., 2022) 📖

Neural Architecture Search

Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data (Felipe Petroski Such et al., ICML 2020) 📖
Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation (Dmitry Medvedev et al., AIST 2021) 📖

Fashion, Art, and Design

Wearable ImageNet: Synthesizing Tileable Textures via Dataset Distillation (George Cazenavette et al., CVPR 2022 Workshop) 🌐 📖
Learning from Designers: Fashion Compatibility Analysis Via Dataset Distillation (Yulan Chen et al., ICIP 2022) 📖

Knowledge Distillation

Knowledge Condensation Distillation (Chenxin Li et al., ECCV 2022) 📖

Recommender Systems

Infinite Recommendation Networks: A Data-Centric Approach (Noveen Sachdeva et al., NeurIPS 2022) 📖

Blackbox Optimization

Bidirectional Learning for Offline Infinite-width Model-based Optimization (Can Chen et al., NeurIPS 2022) 📖
Bidirectional Learning for Offline Model-based Biological Sequence Design (Can Chen et al., ICML 2023) 📖

Hashing Retrieval

Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching (Tao Feng & Jie Zhang et al., 2023) 📖

Tabular

New Properties of the Data Distillation Method When Working With Tabular Data (Dmitry Medvedev et al., AIST 2020) 📖

Text

Data Distillation for Text Classification (Yongqi Li et al., 2021) 📖

Media Coverage

Acknowledgments

We want to thank Nikolaos Tsilivis, Wei Jin, Yongchao Zhou, Noveen Sachdeva, Can Chen, Guangxiang Zhao, Shiye Lei, Xinchao Wang, Dmitry Medvedev, Seungjae Shin, Jiawei Du and Yidi Jiang for their valuable suggestions and contributions.
