This template provides a structured workflow tailored for audio machine learning research on the HPC Cluster of ZECM at TU Berlin. It was developed for projects that require continuous management of multiple experiments to ensure high reproducibility and reliability of results. By incorporating tools such as DVC, Docker, and TensorBoard, the template not only enhances reproducibility but also provides a robust framework for effective collaboration and seamless sharing of experiments.
- Reproducible Experiments:
  - Tracks all dependencies, configurations, and artifacts to ensure experiments can be easily reproduced and shared.
  - Uses containerization to maintain consistency across different systems.
- Resource Optimization:
  - Reuses unchanged pipeline stages to avoid redundant computations, speeding up workflows and conserving resources.
- Automation:
  - Reduces manual tasks through automated builds, data pipelines, and syncing, allowing you to focus on research.
- HPC Integration:
  - Extends DVC for multi-node parallel experiments, optimizing HPC resource utilization.
  - Supports Docker for development, with automated conversion to Singularity for seamless HPC deployment.
- TensorBoard Integration:
  - Provides visualization and comparison of DVC experiments, including TensorBoard's audio logging support.
  - Enables real-time monitoring and quick decisions on underperforming runs.
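Stage reuse works because DVC hashes each stage's dependencies and outputs: `dvc repro` reruns a stage only when one of its declared inputs has changed. A minimal sketch of such a pipeline is shown below; the stage names, scripts, and parameter keys are hypothetical placeholders, not part of this template:

```yaml
# dvc.yaml -- hypothetical two-stage pipeline (all names are placeholders)
stages:
  preprocess:
    cmd: python src/preprocess.py data/raw data/processed
    deps:
      - src/preprocess.py
      - data/raw
    outs:
      - data/processed
  train:
    cmd: python src/train.py data/processed models/model.pt
    deps:
      - src/train.py
      - data/processed
    params:
      - train.learning_rate
      - train.epochs
    outs:
      - models/model.pt
```

With this layout, editing only `src/train.py` leaves the cached `preprocess` outputs untouched, so `dvc repro` skips straight to the `train` stage.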
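TensorBoard's `SummaryWriter.add_audio` expects a float array shaped `(1, n_samples)` with values in `[-1, 1]`. A minimal sketch of preparing such a clip is below; the log directory and tag are hypothetical, and `torch` is assumed to be available in the development container:

```python
import numpy as np

def sine_clip(freq_hz: float = 440.0, sr: int = 16000, seconds: float = 1.0) -> np.ndarray:
    """Mono sine wave shaped (1, n_samples), scaled to [-1, 1] for add_audio."""
    t = np.arange(int(sr * seconds)) / sr
    return np.sin(2 * np.pi * freq_hz * t).astype(np.float32).reshape(1, -1)

# Logging side (assumes torch is installed; directory and tag are placeholders):
# from torch.utils.tensorboard import SummaryWriter
# writer = SummaryWriter("logs/exp1")
# writer.add_audio("val/example", sine_clip(), global_step=0, sample_rate=16000)
# writer.close()
```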
The table below summarizes the key tools involved in the HPC-Cluster-ML-Workflow, detailing their primary roles and providing links to their official documentation for further reference.
| Tool | Role | Documentation |
| --- | --- | --- |
| Git | Version control for code. | Git Docs |
| DVC | Data version control and pipeline management. | DVC Docs |
| TensorBoard | DVC experiment visualization and monitoring. | TensorBoard Docs |
| Docker | Containerization for development, converted to Singularity for HPC. | Docker Docs |
| Singularity | HPC-compatible containerization tool. | Singularity Docs |
| SLURM | Job scheduling and workload management on the HPC cluster. | SLURM Docs |
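As a rough sketch of how these tools meet on the cluster, a SLURM batch script can run a DVC stage inside the Singularity image converted from the Docker build. All names, paths, and resource values below are placeholders, not part of this template:

```bash
#!/bin/bash
#SBATCH --job-name=train-example   # placeholder job name
#SBATCH --partition=gpu            # partition names are cluster-specific
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00
#SBATCH --output=logs/%x-%j.out

# One-time conversion of the Docker image (hypothetical image name):
# singularity build train.sif docker://youruser/yourimage:latest

# Run the training stage inside the container with GPU support:
singularity exec --nv train.sif dvc repro train
```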
The figure below offers a simplified overview of how data is transferred between systems. While some of the commands depicted are automated by the provided workflows, the visualization is intended as a conceptual aid rather than a direct usage reference.
- macOS, Windows, or Linux operating system.
- Access to an HPC cluster with a SLURM scheduler.
- Local Python installation.
- Familiarity with Git, DVC, and Docker.
- Docker Hub account.
Follow the setup instructions below for step-by-step guidance on configuring this template repository, which offers a basic PyTorch project that you can customize, reuse, or reference for your pipeline implementation.
Once the setup is complete, refer to the User Guide to learn how to develop, launch experiments, and monitor your training processes.
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.
Schulz, F. [faressc]. (n.d.). Guitar LSTM [pytorch-version]. GitHub. Link