
Prototype EnCodecMAE (PECMAE)

Code for the paper "Leveraging Pre-trained Autoencoders for Interpretable Prototype Learning of Music Audio."

The autoencoder code is available in this repository.

Sonification results are available on the companion website.

Citation

If results, insights, or code developed within this project are useful to you, please consider citing our work:

@inproceedings{alonso2024leveraging,
  author    = "Alonso-Jim\'{e}nez, Pablo and Pepino, Leonardo and Batlle-Roca, Roser and Zinemanas, Pablo and Bogdanov, Dmitry and Serra, Xavier and Rocamora, Mart\'{i}n",
  title     = "Leveraging Pre-trained Autoencoders for Interpretable Prototype Learning of Music Audio",
  maintitle = "IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
  booktitle = "ICASSP Workshop on Explainable AI for Speech and Audio (XAI-SA)",
  year      = 2024,
}

Setup

  1. Create a virtual environment (recommended):
python -m venv venv && source venv/bin/activate
  2. Initialize submodules and install dependencies:
./setup.sh

NOTE

The setup script was only tested with Python 3.11 on CentOS 7.5. It may not work in other environments.
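For reference, setup.sh is expected to perform roughly the following (a sketch based on its described purpose of initializing submodules and installing dependencies; the actual script and file names may differ):

git submodule update --init --recursive  # fetch the bundled submodule(s)
pip install -r requirements.txt          # install Python dependencies (file name assumed)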


Experiments

Pre-processing

  1. Download a dataset:
python src/download.py --dataset gtzan 

Note: For now, we have only implemented download functionality for GTZAN; support for medley-solos-db may be added in the future.

  2. Extract the EnCodecMAE-based features:
python src/encode_encodecmae.py audio/gtzan/ feats/gtzan/ --model diffusion_4s

The available options are: base, large, diffusion_1s, diffusion_4s, and diffusion_10s. base and large are EnCodecMAE embeddings not intended to operate with the diffusion decoder. diffusion_4s is the model used in the paper, and diffusion_10s is a newer version that was not included in the paper, although we provide sonification examples for it in the companion website.
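For example, to extract features with the newer 10-second diffusion model instead (the output directory name below is just illustrative):

python src/encode_encodecmae.py audio/gtzan/ feats/gtzan_10s/ --model diffusion_10s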

Training PECMAE models

  1. We provide a script to train the PECMAE model with the GTZAN dataset:
./scripts/train_pecmae_5_gtzan.sh

The script's parameters can easily be modified to train other configurations, as illustrated in the sketch below.
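For illustration, the training invocation inside such a script might look like the following (the entry-point name and metadata file names are assumptions; check the provided script for the actual command):

python src/train.py \
  --data-dir feats/gtzan/ \
  --metadata-file-train groundtruth/gtzan_train.tsv \
  --metadata-file-val groundtruth/gtzan_val.tsv \
  --metadata-file-test groundtruth/gtzan_test.tsv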

  2. Train the baseline models:
TODO

Using PECMAE with your data

To use PECMAE with your custom dataset, follow these steps:

  1. Given an audio dataset located at /your/dataset/, extract the EnCodecMAE features:
python src/encode_encodecmae.py /your/dataset/ feats/your_dataset/ --model diffusion_4s --format .your_format
  2. Create a training script similar to ./scripts/train_pecmae_5_gtzan.sh.

You should modify the fields --data-dir, --metadata-file-train, --metadata-file-val, and --metadata-file-test to point to your ground-truth files. Have a look at groundtruth/ to see examples of the expected format; a hypothetical illustration follows below.
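As a purely hypothetical illustration (the file paths and column layout here are assumptions; groundtruth/ is the authoritative reference), such a metadata file might map audio paths to labels, one pair per line:

blues/blues.00000.wav	blues
jazz/jazz.00012.wav	jazz
rock/rock.00034.wav	rock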

Sonifying the prototypes

TODO

