Code for the paper "Leveraging Pre-trained Autoencoders for Interpretable Prototype Learning of Music Audio."
The autoencoder code is available in this repository.
Sonification results are available on the companion website.
If results, insights, or code developed within this project are useful to you, please consider citing our work:
@inproceedings{alonso2024leveraging,
    author = "Alonso-Jim\'{e}nez, Pablo and Pepino, Leonardo and Batlle-Roca, Roser and Zinemanas, Pablo and Bogdanov, Dmitry and Serra, Xavier and Rocamora, Mart\'{i}n",
    title = "Leveraging Pre-trained Autoencoders for Interpretable Prototype Learning of Music Audio",
    maintitle = "IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
    booktitle = "ICASSP Workshop on Explainable AI for Speech and Audio (XAI-SA)",
    year = 2024,
}
- Create a virtual environment (recommended):
python -m venv venv && source venv/bin/activate
- Initialize submodules and install dependencies:
./setup.sh
NOTE
The setup script was only tested with Python 3.11 on CentOS 7.5; it may not work in other environments.
- Download a dataset:
python src/download.py --dataset gtzan
Note: download functionality is currently implemented only for GTZAN; support for Medley-solos-DB may be added in the future. A quick way to check the downloaded files is sketched below.
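As a quick sanity check of the download, the following minimal sketch (not part of the repository) counts the audio files under audio/gtzan/, the directory used by the feature-extraction step below. The per-genre subdirectory layout and the file extensions are assumptions; adapt them to how the download script actually organizes the data.

```python
from collections import Counter
from pathlib import Path

# Location used by the feature-extraction step below; adjust if you
# downloaded the data elsewhere. GTZAN is distributed both as .wav and
# .au files, so we look for either.
audio_dir = Path("audio/gtzan")

files = sorted(audio_dir.rglob("*.wav")) or sorted(audio_dir.rglob("*.au"))
print(f"Found {len(files)} audio files under {audio_dir}")

# Rough per-genre breakdown, assuming one subdirectory per genre.
counts = Counter(f.parent.name for f in files)
for genre, count in sorted(counts.items()):
    print(f"{genre}: {count}")
```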
- Extract the EnCodecMAE-based features:
python src/encode_encodecmae.py audio/gtzan/ feats/gtzan/ --model diffusion_4s
The available options are: base, large, diffusion_1s, diffusion_4s, and diffusion_10s.
base and large are EnCodecMAE embeddings (not intended to operate with the diffusion decoder).
diffusion_4s is the model used in the paper, and diffusion_10s is a newer version that was not included in the paper, for which we provide sonification examples on the companion website. A quick way to inspect the extracted features is sketched below.
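The on-disk format of the extracted features is determined by src/encode_encodecmae.py. As a rough illustration only, the sketch below assumes one NumPy .npy array per track under feats/gtzan/ and prints the shape of a few embeddings; check the script if your output files differ.

```python
from pathlib import Path

import numpy as np

# Assumed output directory from the extraction command above. The per-track
# .npy layout is an assumption; inspect src/encode_encodecmae.py for the
# actual serialization format.
feats_dir = Path("feats/gtzan")

for feat_file in sorted(feats_dir.rglob("*.npy"))[:5]:
    embedding = np.load(feat_file)
    print(feat_file.name, embedding.shape, embedding.dtype)
```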
- We provide a script to train the PECMAE model with the GTZAN dataset:
./scripts/train_pecmae_5_gtzan.sh
The parameters in this script can easily be modified to train with other configurations.
- Train the baseline models
TODO
To use PECMAE with your custom dataset, follow these steps:
- Given an audio dataset located at /your/dataset/, extract the EnCodecMAE features:
python src/encode_encodecmae.py /your/dataset/ feats/your_dataset/ --model diffusion_4s --format .your_format
- Create a training script similar to ./scripts/train_pecmae_5_gtzan.sh.
You should modify the --data-dir, --metadata-file-train, --metadata-file-val, and --metadata-file-test fields to point to your dataset and ground-truth files.
Have a look at groundtruth/ to see examples of the expected format; a sketch of how such a file could be generated is shown below.
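If you prefer to generate the metadata programmatically, the snippet below is purely illustrative: it walks an audio tree whose subdirectory names act as class labels and writes one tab-separated filename/label row per track. The TSV layout, the output filename, and the label-from-directory assumption are all hypothetical; match whatever format you find in groundtruth/.

```python
import csv
from pathlib import Path

# Hypothetical paths and format: adapt both to your dataset and to the
# actual ground-truth layout shown in groundtruth/.
audio_dir = Path("/your/dataset")
output_file = Path("groundtruth/your_dataset_train.tsv")
output_file.parent.mkdir(parents=True, exist_ok=True)

with output_file.open("w", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    for audio_file in sorted(audio_dir.rglob("*.wav")):
        # Assume the parent directory name encodes the class label.
        writer.writerow([str(audio_file.relative_to(audio_dir)), audio_file.parent.name])
```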
TODO