Source code repository for my master's thesis on Neural Proxies for Sound Synthesizers (available soon), in which I developed a method for approximating black-box sound synthesizers using neural networks.
The proposed method relies on training a neural network capable of mapping synthesizer presets onto a perceptually informed embedding space defined by a pretrained audio model. More specifically, given a preset, the neural proxy learns to predict the embedding that the pretrained audio model produces for the corresponding rendered audio.
The aim of this repository is to use various neural network architectures, including feedforward, recurrent, and transformer-based models, to encode presets from arbitrary software synthesizers, paving the way for the integration of non-differentiable, black-box synthesizers into end-to-end training pipelines relying on gradient descent.
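In other words, the preset encoder (the neural proxy) is trained to predict, directly from a preset's parameter values, the embedding that a pretrained audio model assigns to the audio rendered from that preset. The sketch below illustrates this idea; the function and model names are placeholders rather than this repository's actual API, and the L1 objective matches the loss mentioned in the Training section.

```python
import torch
import torch.nn.functional as F

# Illustrative training step (all names are placeholders, not this repository's API):
# a preset encoder is trained so that preset_encoder(preset) approximates the embedding
# that a frozen, pretrained audio model produces for the audio rendered from that preset.
def training_step(preset_encoder, audio_model, render_audio, presets):
    with torch.no_grad():
        audio = render_audio(presets)    # render audio with the black-box synthesizer
        target = audio_model(audio)      # perceptually informed embeddings, shape (B, D)
    pred = preset_encoder(presets)       # neural proxy embeddings, shape (B, D)
    return F.l1_loss(pred, target)       # L1 distance, as used for training
```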
This project supports installation via pip and Docker. After cloning the repository, choose one of the following methods:
Create and activate a virtual environment, then install the required dependencies:
$ python -m venv .venv
$ source .venv/bin/activate # Use .venv\Scripts\activate on Windows
$ pip install -r requirements.txt
Create a Docker image, specifying your user details to avoid permission issues with mounted volumes.
$ docker build --build-arg UNAME=$(whoami) --build-arg UID=$(id -u) --build-arg GID=$(id -g) -t synth-proxy:local .
Run the container using the provided shell script:
$ bash scripts/docker-run.sh
Note: If you encounter permission errors related to file access, adjust the ownership of the project directory:
$ sudo chown -R $(id -u):$(id -g) .
Before running the project, ensure you have a .env file in the root directory with the necessary environment variables set:
PROJECT_ROOT=/path/to/project/root/
WANDB_API_KEY=this-is-optional
DEXED_PATH=/path/to/dexed/vst3
DIVA_PATH=/path/to/diva/vst3
TALNM_PATH=/path/to/talnoisemaker/vst3
Remark: You do not need to install and link the synthesizers if you are using the provided datasets rather than generating new ones.
- PyTorch and PyTorch Lightning for model definition and training.
- DawDreamer for rendering audio from VST plugins.
- Wandb for logging.
- Optuna for hyperparameter optimization.
- Hydra for configuration management.
The rest of the dependencies can be found in the requirements.txt file.
Currently, the following synthesizers are supported: Dexed, Diva, and TAL-NoiseMaker.
The neural proxy checkpoints for each synthesizer can be downloaded from this link.
To add a new synthesizer, follow these steps:
- Download and install the synthesizer (obviously).
- Make sure the synthesizer is supported by DawDreamer.
- Add the path to the synthesizer as an environment variable in the .env file, e.g., NEW_SYNTH_PATH=/path/to/new/synth/vst3.
- Create a python file under ./src/data/synths/ containing the internal representation of the synthesizer, which is implemented as a tuple of SynthParameter instances (a hypothetical sketch follows this list). You can use ./src/data/synths/another_synth.py as a template together with the helper script ./src/utils/synth/get_synth_parameters.py to generate the representation of each synthesizer parameter (don't forget to manually double-check).
- Add additional arguments for each SynthParameter instance if desired (see ./src/utils/synth/synth_parameter.py and existing synthesizers for examples). These are used to constrain the sampling process used to generate synthetic presets.
- Add the created python file for the new synthesizer to the package under ./src/data/synths/__init__.py.
- Add the synthesizer name to the list of supported synthesizer names in the PresetHelper class definition in ./src/utils/synth/preset_helper.py as well as in the SynthDataset class definition in ./src/data/datasets/synth_dataset.py.
- Create a configuration file under ./configs/export/synth for the synthesizer specifying the parameters to exclude. The excluded parameters will be set to their default values during sampling and will not be fed to the preset encoder.
- Once the datasets have been generated, add files to the following configuration folders as needed (see existing synthesizers for examples): ./configs/eval/synth, ./configs/hpo/synth, ./configs/train/train_dataset, ./configs/train/val_dataset.
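For illustration only, a new synthesizer module might look roughly like the sketch below. The real SynthParameter class lives in ./src/utils/synth/synth_parameter.py; the stand-in class and its fields here are assumptions, and the helper script ./src/utils/synth/get_synth_parameters.py should be used to generate the actual entries.

```python
# Hypothetical sketch of ./src/data/synths/new_synth.py.
# The real SynthParameter class is defined in ./src/utils/synth/synth_parameter.py;
# the stand-in below only mimics the idea of one entry per synthesizer parameter.
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthParameter:  # stand-in for the repository's class (fields are assumptions)
    index: int
    name: str
    type_: str            # e.g., "num", "cat", "bin"
    cardinality: int = 1  # number of discrete values for non-continuous parameters


SYNTH_PARAMETERS: tuple[SynthParameter, ...] = (
    SynthParameter(index=0, name="osc1_tune", type_="num"),
    SynthParameter(index=1, name="osc1_wave", type_="cat", cardinality=5),
    SynthParameter(index=2, name="filter_cutoff", type_="num"),
    # ... one entry per synthesizer parameter, double-checked manually
)
```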
An overview of the implemented neural proxies can be found under ./src/models/preset/model_zoo.py.
The checkpoint of each pretrained model can be downloaded from this link (unzip and move all .ckpt files into the checkpoints folder).
Wrappers for the following audio models are available in the ./src/models/audio directory: EfficientAT, Torchopenl3, PaSST, Audio-MAE, and Mel-only features.
Remarks:
- The aforementioned audio models were selected based on a custom evaluation pipeline built around the TAL-NoiseMaker Virtual Analog synthesizer. More details as well as the evaluation's results can be found in the dedicated wandb report.
- Only the mn04_all_b variation of the EfficientAT models with average time pooling was used, since the main objective of this work is to evaluate the feasibility of the proposed method rather than to find an optimal representation.
To add a new audio model, follow these steps:
- Create a ./src/models/audio/<new_model> directory and copy paste the model's source code into it.
- Create a wrapper for the model inheriting from the AudioModel abstract class available in ./src/models/audio/abstract_audio_model.py, and implement the necessary methods (a hypothetical sketch is shown below).
- Instantiate the model below the wrapper class (see existing models for examples).
- Add the instantiated model to the models.audio package by adding it in ./src/models/audio/__init__.py.
Once these steps are completed, the model can be used to generate a new dataset using the argument audio_fe=<my_model>. See Data Generation for more details.
Remark: The current implementation only allows the use of 1D (i.e., reduced) audio embeddings and should be extended to 2D (i.e., non-reduced) audio embeddings in the future. That is, the forward method of an audio model must return a torch.Tensor of shape (B, D), where B is the batch size and D is the dimensionality of the embedding after reduction (e.g., avg_time_pool). The available reduction functions can be found under ./src/utils/reduce_fn.py.
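As a rough illustration, such a wrapper might look like the sketch below. Apart from forward returning a (B, D) tensor, everything here (import paths, the backbone, the constructor) is an assumption; the actual interface to implement is defined by the AudioModel abstract class in ./src/models/audio/abstract_audio_model.py.

```python
# Hypothetical sketch of a wrapper placed under ./src/models/audio/<new_model>/
import torch
from models.audio.abstract_audio_model import AudioModel  # import path assumed
from utils.reduce_fn import avg_time_pool                  # reductions live in ./src/utils/reduce_fn.py


class NewModelWrapper(AudioModel):
    """Illustrative wrapper; implement whatever methods AudioModel actually requires."""

    def __init__(self, backbone: torch.nn.Module) -> None:
        super().__init__()
        self.backbone = backbone

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # Backbone output assumed to be (B, D, T); reduce the time axis to obtain (B, D).
        return avg_time_pool(self.backbone(audio))


# Instantiate the model below the wrapper class, as done for the existing models.
new_model = NewModelWrapper(backbone=torch.nn.Identity())  # placeholder backbone
```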
The ./src/export/dataset_pkl.py script for Linux (also available for Windows) can be used to generate synthetic presets. It uses the torch dataset class SynthDataset and DawDreamer under the hood. Note that it is currently only possible to generate synthetic presets by sampling their parameters as follows:
- Continuous numerical parameters are sampled from a uniform distribution between 0 and 1.
- Discrete parameters, i.e., categorical, binary, or discretized numerical, are sampled from a categorical distribution.
See the aforementioned files and the export configuration for more details.
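As a simplified sketch (not the actual SynthDataset implementation), the sampling scheme described above amounts to the following:

```python
import torch

# Simplified illustration of the preset sampling scheme (not the actual SynthDataset code):
# continuous parameters are drawn from U(0, 1); discrete parameters (categorical, binary,
# or discretized numerical) are drawn from a categorical distribution over their values.
def sample_preset(parameters: list[dict]) -> list[float]:
    preset = []
    for p in parameters:
        if p["type"] == "continuous":
            preset.append(torch.rand(1).item())
        else:
            values = p["values"]  # admissible discrete values
            preset.append(values[torch.randint(len(values), (1,)).item()])
    return preset

# Example usage with a hypothetical parameter description:
preset = sample_preset([
    {"type": "continuous"},
    {"type": "discrete", "values": [0.0, 0.5, 1.0]},
])
```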
Example:
$ python src/export/dataset_pkl.py synth=talnm audio_fe=mn04 dataset_size=65_536 seed_offset=10 batch_size=512 num_workers=8 tag=test
Remark: The datasets used for hyperparameter optimization, training, and evaluation (synthetic and hand-crafted) can be downloaded from this link.
Details regarding the HPO can be found in ./configs/hpo/hpo.yaml and ./src/hpo/run.py.
Remark: the
Details regarding training can be found in the training script and training configuration folder.
Remarks:
- New experiments can be added to the existing ones in ./configs/train/experiments.
- Custom learning rate schedulers can be implemented in ./src/utils/lr_schedulers.py.
- Training artifacts can be found in ./logs/train.
- The $L^1$ distance is used as the loss function, while the $L^1$ distance and the mean reciprocal rank (MRR) are used for validation (see the sketch below).
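A minimal sketch of how the mean reciprocal rank over a validation batch might be computed is shown below, assuming each predicted preset embedding should rank its own target audio embedding first among all targets in the batch (an illustration, not necessarily the repository's exact implementation):

```python
import torch

def mean_reciprocal_rank(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Illustrative MRR: pred and target have shape (B, D); pred[i] should be closest
    (in L1 distance) to target[i] among all targets in the batch."""
    dists = torch.cdist(pred, target, p=1)       # (B, B) pairwise L1 distances
    ranks = dists.argsort(dim=1).argsort(dim=1)  # rank position of each target per prediction
    true_rank = ranks.diagonal() + 1             # 1-indexed rank of the matching target
    return (1.0 / true_rank).mean()
```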
Example:
$ python src/train.py experiment=dexed_tfm_b
Compute the average evaluation metrics of a trained model on the datasets of synthetic and hand-crafted presets.
Example:
$ python src/eval.py model=dexed_tfm_b
Remarks:
- Evaluation results will be saved under ./logs/eval.
- To perform evaluation on a newly trained model, you need to add its evaluation config file under ./configs/eval/model (see the existing examples there).
This section offers a detailed guide to replicating the results outlined in the paper. The project was developed on a Windows laptop equipped with an Intel i7 CPU and an RTX 2060 GPU (6GB VRAM), using Docker (through WSL2) with the provided Dockerfile. Most of the training was performed on an NVIDIA Tesla P100. Additional testing was conducted in a virtual environment using Python 3.10.11 and pip as the dependency manager.
Begin by cloning this repository to your local machine using the following command:
$ git clone https://github.com/pcmbs/synth-proxy.git ; cd synth-proxy
Download the datasets used for testing the models from the provided link. Unzip the folder and move the most nested eval folder directly into the data/datasets directory.
Download the pretrained model checkpoints from this link. Unzip and move all .ckpt files into the checkpoints folder.
Install all necessary dependencies and configure the environment using pip or Docker by following the instructions from the Installation section.
Create a .env file in the project root directory with the following content:
PROJECT_ROOT=/path/to/project/root/
WANDB_API_KEY=this-is-optional
Notes:
- Set PROJECT_ROOT=/workspace if you are using the provided Dockerfile.
- The WANDB_API_KEY is optional and can be omitted if you do not wish to log results to WandB.
Run the evaluation script to test all models on the datasets of synthetic and hand-crafted presets:
$ python src/eval.py -m model="glob(*)"
To run without logging the results, use:
$ python src/eval.py -m model="glob(*)" ~wandb
Generate the tables and figures from the paper that summarize the evaluation results:
$ python src/visualization/generate_tables.py ; python src/visualization/generate_umaps.py
The generated tables, along with additional figures, are saved under the ./results/eval directory, while the figures of the UMAP projections are saved under the ./results/umap_projections directory.
If you encounter any issues, drop me an email or feel free to submit an issue.
Special shout out to Joseph Turian for his initial guidance on the topic and overall methodology, and to Gwendal le Vaillant for the useful discussion on SPINVAE from which the transformer-based preset encoder is inspired.