Teaching material for the course (CommE5070) "Deep Learning for Music Analysis and Generation" I taught at National Taiwan University (2023 Fall, 2024 Fall).
Lecturer: Yi-Hsuan Yang (https://affige.github.io/; [email protected]; [email protected])
“Music Information Research” (MIR) is an interdisciplinary research field concerned with the analysis, retrieval, processing, and generation of musical content or information. Researchers involved in MIR may have a background in signal processing, machine learning, information retrieval, human-computer interaction, musicology, psychoacoustics, psychology, or some combination of these.
In this course, we are mainly interested in applying machine learning, in particular deep learning, to music-related problems. Specifically, the course is divided into two parts: analysis and generation.
The first part is about the analysis of musical audio signals, covering topics such as feature extraction and representation learning for musical audio, music audio classification, melody extraction, automatic music transcription, and musical source separation.
The second part is about the generation of musical material, including symbolic-domain material such as MIDI or tablatures, and audio-domain music signals such as singing voices and instrumental music. This involves deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), Transformers, and diffusion models.
- Lecture 1. Introduction to the course (slides)
- Lecture 2. Fundamentals of musical audio (slides)
- Lecture 3. Music classification and transcription (slides)
- Lecture 4. Source separation (slides)
- Lecture 5. GAN & Vocoders (slides)
- Lecture 6. Fundamentals of symbolic music (slides)
- Lecture 7. Symbolic MIDI generation (slides)
- Lecture 8. Synthesis and timbre transfer (slides)
- Lecture 9. Differentiable DSP models and automatic mixing (slides1, slides2, slides3)
- Lecture 10. Singing voice generation (slides)
- Lecture 11. Text-to-music generation (slides)
- Lecture 12. Miscellaneous topics (emotion/structure/alignment/rhythm) (slides)
- Lecture 1. Introduction to the course (slides1, slides2)
- Lecture 2. Fundamentals & Music representation (slides)
- Lecture 3. Analysis I (timbre): Automatic music classification and representation learning (slides)
- Lecture 4. Generation I: Source separation (slides)
- Lecture 5. Generation II: GAN & Vocoders (slides)
- Lecture 6. Generation III: Synthesis of notes and loops (slides)
- Lecture 7. Analysis II (pitch): Music transcription, melody extraction, and chord recognition (slides1, slides2)
- Lecture 8. Generation IV: Symbolic MIDI generation (slides)
- Lecture 9. Generation V: Symbolic MIDI generation: Advanced topic on music structure (slides)
- Lecture 10. Generation VI: Singing voice generation (slides)
- Lecture 11. Generation VII: Text-to-music generation (slides)
- Lecture 12. Generation VIII: Differentiable DSP models and automatic mixing (slides)
- Lecture 13. Analysis III (rhythm) (slides)
The slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses/by-nc-sa/4.0/). By downloading the slides, you agree to this license.