
Neural Proxies for Sound Synthesizers:
Learning Perceptually Informed Preset Representations

Overview

Source code repository for my master's thesis on Neural Proxies for Sound Synthesizers (available soon), in which I developed a method for approximating black-box sound synthesizers using neural networks.
The proposed method relies on training a neural network capable of mapping synthesizer presets onto a perceptually informed embedding space defined by a pretrained audio model. More specifically, given a preset $\underline{x}$ from a synthesizer $s$, a preset encoder $f_{\underline{\theta}}$ learns to minimize the $L^1$ distance between its representation of $\underline{x}$ and that produced by a pretrained audio model $g_{\underline{\phi}}$, derived from the synthesized audio $s(\underline{x})$. This process effectively creates a neural proxy for a given synthesizer by leveraging the audio representations learned by the pretrained model via a cross-modal knowledge distillation task.
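For intuition, the objective can be sketched in a few lines of PyTorch. The names below (preset_encoder, audio_model, render_audio) are placeholders for illustration, not the repository's actual interfaces.

import torch
import torch.nn.functional as F

def proxy_loss(preset_encoder, audio_model, render_audio, presets):
    # f_theta(x): embedding of the preset parameters, shape (B, D)
    preset_emb = preset_encoder(presets)
    with torch.no_grad():
        audio = render_audio(presets)    # s(x): audio rendered by the synthesizer
        target_emb = audio_model(audio)  # g_phi(s(x)): perceptual embedding, shape (B, D)
    # L1 distance between the two representations
    return F.l1_loss(preset_emb, target_emb)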

The aim of this repository is to use various neural network architectures, including feedforward, recurrent, and transformer-based models, to encode presets from arbitrary software synthesizers, paving the way for the integration of non-differentiable, black-box synthesizers into end-to-end training pipelines relying on gradient descent.

Installation

This project supports installation via pip and Docker. After cloning the repository, choose one of the following methods:

Installation using pip

Create and activate a virtual environment, then install the required dependencies:

$ python -m venv .venv 
$ source .venv/bin/activate # Use .venv\Scripts\activate on Windows
$ pip install -r requirements.txt

Installation using Docker

1. Build the Docker image

Create a Docker image, specifying your user details to avoid permission issues with mounted volumes.

$ docker build --build-arg UNAME=$(whoami) --build-arg UID=$(id -u) --build-arg GID=$(id -g)  -t synth-proxy:local .

2. Run the container

Run the container using the provided shell script:

$ bash scripts/docker-run.sh

Note: If you encounter permission errors related to file access, adjust the ownership of the project directory:

$ sudo chown -R $(id -u):$(id -g) .

Environment Configuration

Before running the project, ensure you have a .env file in the root directory with the necessary environment variables set:

PROJECT_ROOT=/path/to/project/root/ 
WANDB_API_KEY=this-is-optional 
DEXED_PATH=/path/to/dexed/vst3
DIVA_PATH=/path/to/diva/vst3 
TALNM_PATH=/path/to/talnoisemaker/vst3 

Remark: You do not need to install and link the synthesizers if you are using the provided datasets rather than generating new ones.

Main dependencies

The project's dependencies are listed in the requirements.txt file.

Synthesizers

Available Synthesizers

Currently, the following synthesizers are supported: Dexed, Diva, and TAL-NoiseMaker.

The neural proxy checkpoints for each synthesizer can be downloaded from the provided link.

Adding Synthesizers

To add a new synthesizer, follow these steps:

  1. Download and install the synthesizer (obviously).

  2. Make sure the synthesizer is supported by DawDreamer.

  3. Add the path to the synthesizer as an environment variable in the .env file, e.g., NEW_SYNTH_PATH=/path/to/new/synth/vst3.

  4. Create a Python file under ./src/data/synths/ containing the internal representation of the synthesizer, implemented as a tuple of SynthParameter instances. You can use ./src/data/synths/another_synth.py as a template, together with the helper script ./src/utils/synth/get_synth_parameters.py to generate the representation of each synthesizer parameter (don't forget to double-check the output manually). A rough sketch is given after this list.

  5. Add additional arguments for each SynthParameter instance if desired (see ./src/utils/synth/synth_parameter.py and existing synthesizers for examples). These are used to constrain the sampling process used to generate synthetic presets.

  6. Add the created Python file for the new synthesizer to the package by registering it in ./src/data/synths/__init__.py.

  7. Add the synthesizer name to the list of supported synthesizer names in the PresetHelper class definition in ./src/utils/synth/preset_helper.py, as well as in the SynthDataset class definition in ./src/data/datasets/synth_dataset.py.

  8. Create a configuration file under ./configs/export/synth for the synthesizer specifying the parameters to exclude. The excluded parameters will be set to their default values during sampling and will not be fed to the preset encoder.

  9. Once the datasets have been generated, add configuration files to the following folders as needed (see existing synthesizers for examples): ./configs/eval/synth, ./configs/hpo/synth, ./configs/train/train_dataset, ./configs/train/val_dataset.
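As a rough illustration of step 4, a new synthesizer module might look like the sketch below. The import path and the SynthParameter arguments shown here are assumptions for illustration only; refer to ./src/utils/synth/synth_parameter.py and ./src/data/synths/another_synth.py for the actual fields.

# hypothetical ./src/data/synths/new_synth.py (illustrative only)
from utils.synth import SynthParameter  # assumed import path

SYNTH_PARAMETERS = (
    SynthParameter(index=0, name="osc1_tune"),      # hypothetical arguments
    SynthParameter(index=1, name="osc1_waveform"),
    # ... one entry per synthesizer parameter, generated with get_synth_parameters.py
)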

Available Preset Encoders (aka Neural Proxies)

An overview of the implemented neural proxies can be found under ./src/models/preset/model_zoo.py.

The checkpoint of each pretrained model can be downloaded from this link (Unzip and move all .ckpt files into the checkpoints folder).

Audio Models

Available Audio Models

Wrappers for the following audio models are available in the ./src/models/audio directory: EfficientAT, Torchopenl3, PaSST, Audio-MAE, and Mel-only features.

Remarks:

  • The aforementioned audio models were selected based on a custom evaluation pipeline built around the TAL-NoiseMaker Virtual Analog synthesizer. More details, as well as the evaluation results, can be found in the dedicated wandb report.
  • Only the mn04_all_b variation of the EfficientAT models with average time pooling was used since the main objective of this work is to evaluate the feasibility of the proposed method rather than to find an optimal representation.

Adding Audio Models

  1. Create a ./src/models/audio/<new_model> directory and copy the model's source code into it.

  2. Create a wrapper for the model inheriting from the AudioModel abstract class available in ./src/models/audio/abstract_audio_model.py, and implement the necessary methods.

  3. Instantiate the model below the wrapper class (see existing models for examples).

  4. Add the instantiated model to the models.audio package by adding it in ./src/models/audio/__init__.py.

Once these steps are completed, the model can be used to generate a new dataset using the argument audio_fe=<my_model>. See Data Generation for more details.

Remark: The current implementation only allows the use of 1D (i.e., reduced) audio embeddings and should be extended to 2D (i.e., non-reduced) audio embeddings in the future. That is, the forward method of an audio model must return a torch.Tensor of shape (B, D), where B is the batch size and D is the dimensionality of the embedding after reduction (e.g., avg_time_pool). The available reduction functions can be found under ./src/utils/reduce_fn.py.
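The sketch below illustrates the expected (B, D) output contract using average time pooling. It does not reproduce the AudioModel abstract class (see ./src/models/audio/abstract_audio_model.py for the methods that actually need to be implemented); the class and attribute names here are placeholders.

import torch
import torch.nn as nn

class NewModelWrapper(nn.Module):  # the real wrapper should inherit from AudioModel
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone          # pretrained model returning (B, D, T) features

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(audio)      # (B, D, T)
        return feats.mean(dim=-1)         # avg_time_pool reduction -> (B, D)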

Data Generation

The ./src/export/dataset_pkl.py script for Linux (also available for Windows) can be used to generate synthetic presets. It uses the torch dataset class SynthDataset and DawDreamer under the hood. Note that it is currently only possible to generate synthetic presets by sampling their parameters as follows:

  • Continuous numerical parameters are sampled from a uniform distribution between 0 and 1.
  • Discrete parameters, i.e., categorical, binary, or discretized numerical, are sampled from a categorical distribution.

See the aforementioned files and the export configuration for more details; a rough sketch of the sampling scheme is given below.
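For illustration, the sampling scheme roughly amounts to the following sketch (not the SynthDataset implementation; the parameter description format is assumed):

import torch

def sample_preset(parameters: list[dict]) -> torch.Tensor:
    values = []
    for p in parameters:
        if p["type"] == "continuous":
            # continuous numerical parameters: uniform in [0, 1]
            values.append(torch.rand(1))
        else:
            # categorical, binary, or discretized numerical parameters:
            # one of the allowed values is drawn from a categorical distribution
            allowed = torch.tensor(p["values"], dtype=torch.float32)
            idx = torch.randint(len(allowed), (1,))
            values.append(allowed[idx])
    return torch.cat(values)

# e.g., sample_preset([{"type": "continuous"}, {"type": "categorical", "values": [0.0, 0.5, 1.0]}])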

Example:

$ python src/export/dataset_pkl.py synth=talnm audio_fe=mn04 dataset_size=65_536 seed_offset=10 batch_size=512 num_workers=8 tag=test

Remark: The datasets used for hyperparameter optimization, training, and evaluation (synthetic and hand-crafted) can be downloaded from the link.

Hyperparameter Optimization

Details regarding the HPO can be found in ./configs/hpo/hpo.yaml and ./src/hpo/run.py.

Remark: The $L^1$ distance is used as the loss function, while the $L^1$ distance and mean reciprocal rank are used for validation.

Training

Details regarding training can be found in the training script and training configuration folder.

Remarks:

  • New experiments can be added to the existing ones in ./configs/train/experiments.
  • Custom learning rate schedulers can be implemented in ./src/utils/lr_schedulers.py (a minimal example is sketched after this list).
  • Training artifacts can be found in ./logs/train.
  • The $L^1$ distance is used as the loss function, while the $L^1$ distance and mean reciprocal rank are used for validation.
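As a minimal example of the kind of scheduler that could live in ./src/utils/lr_schedulers.py (not one of the repository's existing schedulers), a linear warm-up could be written as:

import torch
from torch.optim.lr_scheduler import LambdaLR

def linear_warmup(optimizer: torch.optim.Optimizer, warmup_steps: int) -> LambdaLR:
    # scales the base learning rate from ~0 to 1 over the first `warmup_steps` steps
    return LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))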

Example:

$ python src/train.py experiment=dexed_tfm_b

Evaluation

Evaluation computes the average $L^1$ distance and the mean reciprocal rank (MRR) on a test set of synthetic and hand-crafted presets (a sketch of the MRR computation is given at the end of this section). More details can be found in the evaluation script and corresponding configuration folder.

Example:

$ python src/eval.py model=dexed_tfm_b

Remarks:

  • Evaluation results are saved under ./logs/eval.
  • To evaluate a newly trained model, add its evaluation config file under ./configs/eval/model (see the examples there).
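For reference, the mean reciprocal rank mentioned above can be sketched as follows (an illustration of the metric rather than the repository's evaluation code; it assumes the i-th preset embedding should retrieve the i-th audio embedding under the $L^1$ distance):

import torch

def mean_reciprocal_rank(preset_emb: torch.Tensor, audio_emb: torch.Tensor) -> torch.Tensor:
    # preset_emb, audio_emb: (N, D) tensors where row i of each forms a matching pair
    dists = torch.cdist(preset_emb, audio_emb, p=1)              # pairwise L1 distances, (N, N)
    ranks = dists.argsort(dim=1).argsort(dim=1).diagonal() + 1   # 1-based rank of the true match
    return (1.0 / ranks.float()).mean()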

Reproducibility

This section offers a detailed guide to replicating the results outlined in the paper. The project was developed on a Windows laptop equipped with an Intel i7 CPU and an RTX 2060 GPU (6 GB VRAM), using Docker (through WSL2) with the provided Dockerfile. Most of the training was performed on an NVIDIA Tesla P100. Additional testing was conducted in a virtual environment using Python 3.10.11 with pip as the dependency manager.

Step-by-Step Guide

1. Clone the Repository

Begin by cloning this repository to your local machine using the following command:

$ git clone https://github.com/pcmbs/synth-proxy.git ; cd synth-proxy

2. Download Datasets

Download the datasets used for testing the models from the provided link. Unzip the folder and move the most nested eval folder directly into the data/datasets directory.

3. Download Model Checkpoints

Download the pretrained model checkpoints from this link. Unzip and move all .ckpt files into the checkpoints folder.

4. Setup Environment

Install all necessary dependencies and configure the environment using pip or Docker by following the instructions from the Installation section.

5. Configure Environment Variables

Create a .env file in the project root directory with the following content:

PROJECT_ROOT=/path/to/project/root/
WANDB_API_KEY=this-is-optional

Note:

  • Set PROJECT_ROOT=/workspace if you are using the provided Dockerfile.
  • The WANDB_API_KEY is optional and can be omitted if you do not wish to log results to WandB.

6. Evaluate Models

Run the evaluation script to test all models on the datasets of synthetic and hand-crafted presets:

$ python src/eval.py -m model="glob(*)"

To run without logging the results, use:

$ python src/eval.py -m model="glob(*)" ~wandb

7. Generate Results

Generate the tables and figures from the paper that summarize the evaluation results:

$ python src/visualization/generate_tables.py ; python src/visualization/generate_umaps.py

The generated tables, along with additional figures, are saved under the ./results/eval directory, while the figures of the UMAP projections are saved under the ./results/umap_projections directory.

Troubleshooting

If you encounter any issues, drop me an email or feel free to submit an issue.

Thanks

Special shout-out to Joseph Turian for his initial guidance on the topic and overall methodology, and to Gwendal Le Vaillant for the useful discussion on SPINVAE, which inspired the transformer-based preset encoder.
