<div align="center">
  <img src="https://github.com/oriolcolomefont/Master-Thesis/blob/3477a79ff7821c1d296068c376b8afb854b7f092/Epidemic_Sound_Logo_White.png?raw=true" alt="Epidemic Sound Logo" width="200"/>
  <img src="https://github.com/oriolcolomefont/Master-Thesis/blob/54e35045debfb4f802cbc312afb681d6c41c7414/UPF-Logo.png?raw=true" alt="UPF Logo" width="200"/>
</div>

# [Master Thesis in Sound and Music Computing](https://zenodo.org/records/8380670)
## Epidemic Sound AB & Universitat Pompeu Fabra (Music Technology Group)
### Uncovering Underlying High-Level Musical Content in the Time Domain

Leveraging self-supervised deep neural networks, inductive bias, and aural skills to learn deep audio embeddings with applications to boundary detection tasks.

**Date:** July 2023
## Abstract
This thesis posits the existence of invariant high-level musical concepts that persist regardless of changes in sonic qualities, akin to the symbolic domain, where essence endures despite varying interpretations across performances, instruments, styles, and countless other variables.
### Collaboration and Methodology
In collaboration with Epidemic Sound AB and the Music Technology Group (MTG) at Universitat Pompeu Fabra (UPF), we used self-supervised contrastive learning to uncover the underlying structure of Western tonal music. Our approach involved learning deep audio features to improve unsupervised music boundary detection.

We applied deep convolutional neural networks and a triplet loss function to identify abstract and semantic high-level musical elements without relying on their sonic qualities. In doing so, we replaced traditional acoustic features with deep audio embeddings, paving the way for a sound-agnostic and content-sensitive music representation for boundary detection.
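
To make the training objective concrete, below is a minimal sketch of a standard triplet margin loss over embedding vectors, assuming a PyTorch-style setup; the `encoder`, the margin value, and the sampling of anchor/positive/negative excerpts are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # L2 distances between the anchor and its positive/negative pairs.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    # Hinge loss: positives should sit at least `margin` closer
    # to the anchor than negatives do.
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

# Hypothetical usage, where `encoder` is a CNN mapping audio
# excerpts to embedding vectors:
#   loss = triplet_loss(encoder(a), encoder(p), encoder(n))
```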
### Approach
Preliminary results suggest that, while not outperforming the state of the art, our musically informed technique has significant potential for boundary detection tasks and, most likely, for nearly all MIR downstream tasks that are not purely sonic-based.

While music-motivated audio embeddings do not outperform state-of-the-art results, they appear promising, delivering competitive results with room for improvement, and are potentially adaptable to other tasks constrained by data scarcity. The question remains whether such a general-purpose audio representation can mimic human hearing.
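
For context on the boundary detection task itself, a common unsupervised baseline is Foote-style novelty detection: compute a self-similarity matrix over time-ordered embeddings and correlate a checkerboard kernel along its diagonal. The sketch below illustrates that general idea under assumed frame-level embeddings; it is not the thesis's exact pipeline, and the kernel size is an arbitrary choice.

```python
import numpy as np

def checkerboard_kernel(size=32):
    # Gaussian-tapered checkerboard kernel (Foote, 2000).
    half = size // 2
    idx = np.arange(-half, half) + 0.5
    taper = np.exp(-0.5 * (idx / (0.4 * half)) ** 2)
    return np.outer(taper, taper) * np.outer(np.sign(idx), np.sign(idx))

def novelty_curve(embeddings, kernel_size=32):
    # Rows of `embeddings` are time-ordered embedding frames.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    ssm = e @ e.T  # cosine self-similarity matrix
    kernel = checkerboard_kernel(kernel_size)
    half = kernel_size // 2
    novelty = np.zeros(len(ssm))
    # Correlate the kernel along the main diagonal; high values mark
    # transitions between self-similar blocks, i.e. candidate boundaries.
    for i in range(half, len(ssm) - half):
        patch = ssm[i - half:i + half, i - half:i + half]
        novelty[i] = np.sum(patch * kernel)
    return novelty
```

Peak-picking on the resulting curve (for instance with `scipy.signal.find_peaks`) then yields candidate segment boundaries.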
### Keywords
- MIR
- Music Structure Analysis
- Deep audio embeddings
- Audio representations
- Representation learning
- Embeddings
- Transfer learning
- Multi-task learning
- Multi-modal learning
- Aural Skills
## Virtual environment installation

### Pip
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install the required dependencies.
```bash
pip install -r requirements.txt
```
Alternatively, you can use [conda](https://docs.conda.io/en/latest/) to create a virtual environment and install the required dependencies.
```bash
conda create -n project-env python=3.8
conda activate project-env
conda install --file requirements.txt
```
## Acknowledgements
Thanks to [Carl Thomé](https://github.com/carlthome) and [Carlos Lordelo](https://github.com/cpvlordelo), whose unrivaled expertise in MIR, ML, and DSP was pivotal to the success of this thesis, and whose wisdom and guidance were a constant source of motivation and enlightenment.
## License

*GNU General Public License v3.0*
## Contact

For any questions or concerns, please get in touch with Oriol Colomé Font at:

- *[email protected]*
- *[email protected]*