This repository contains work on audio processing and music genre classification. Documentation for each of the following sections can be found in the corresponding notebooks.
- 1. Input Data Visualization
- 2. MFCC Extraction
- 3. Genre Classifier using GTZAN dataset
- 4. Extracting & testing (predicting genre) of songs from Spotify
- From the root of this repository, install the top-level package `src`: `pip install -e .`
- Install all the libraries listed in the `requirements.txt` file: `pip install -r requirements.txt`
Some basic analyses of an audio file, such as waveform plotting and spectrum display, were performed to better understand audio data:
- Waveform plotting
- Power Spectral Density (PSD) plot
- Spectrogram
- Mel Spectrogram
- MFCC
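To make the Power Spectral Density item concrete, here is a minimal, dependency-free sketch of a periodogram computed straight from the DFT definition. It is only an illustration: the notebooks most likely rely on a plotting or signal-processing library, and the function name `periodogram` and the test tone are assumptions made for this example.

```python
import cmath
import math


def periodogram(signal, sample_rate):
    """Return (frequencies in Hz, power density) for the non-negative DFT bins."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        # k-th DFT coefficient from the definition; O(n^2) overall, fine for a demo.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * sample_rate / n)
        # Squared magnitude scaled by sample rate and length (no one-sided doubling).
        power.append(abs(coeff) ** 2 / (sample_rate * n))
    return freqs, power


# Hypothetical test tone: a 50 Hz sine sampled at 400 Hz for 64 samples.
sample_rate = 400
tone = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(64)]
freqs, power = periodogram(tone, sample_rate)
peak_hz = freqs[power.index(max(power))]
```

For this tone the strongest bin sits at 50 Hz, which is the kind of peak a PSD plot makes visible. For real audio an FFT-based routine is the practical choice, since the direct sum above is far too slow for full-length tracks.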
A utility that reads all the `.wav` files, stored in separate folders by genre, extracts their MFCCs, and stores these values in JSON format.
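A utility of this kind might look roughly like the sketch below. It assumes `librosa` for loading audio and computing MFCCs, a `<dataset_dir>/<genre>/*.wav` layout, and illustrative defaults (22050 Hz, 13 coefficients, 10 segments per track). The function names and JSON keys are hypothetical and may differ from those used in the notebook.

```python
import json
import os


def segment_bounds(num_samples, num_segments):
    """Return (start, end) sample indices of equal-length, non-overlapping segments."""
    samples_per_segment = num_samples // num_segments
    return [(i * samples_per_segment, (i + 1) * samples_per_segment)
            for i in range(num_segments)]


def extract_dataset_mfcc(dataset_dir, json_path, num_segments=10,
                         sample_rate=22050, n_mfcc=13):
    """Walk <dataset_dir>/<genre>/*.wav, extract per-segment MFCCs, save as JSON."""
    import librosa  # imported lazily so segment_bounds works without it

    data = {"mapping": [], "labels": [], "mfcc": []}
    genres = sorted(d for d in os.listdir(dataset_dir)
                    if os.path.isdir(os.path.join(dataset_dir, d)))
    for label, genre in enumerate(genres):
        data["mapping"].append(genre)
        genre_dir = os.path.join(dataset_dir, genre)
        for name in sorted(os.listdir(genre_dir)):
            if not name.lower().endswith(".wav"):
                continue
            signal, sr = librosa.load(os.path.join(genre_dir, name), sr=sample_rate)
            for start, end in segment_bounds(len(signal), num_segments):
                mfcc = librosa.feature.mfcc(y=signal[start:end], sr=sr, n_mfcc=n_mfcc)
                data["mfcc"].append(mfcc.T.tolist())  # shape: (frames, n_mfcc)
                data["labels"].append(label)
    with open(json_path, "w") as fp:
        json.dump(data, fp)
    return data
```

Sorting the folder names gives each genre a stable integer label, so the same index always refers to the same genre across runs. Samples left over after the integer division are dropped from the end of each track.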
The following model architectures were trained on the GTZAN dataset:
- Neural Network
- Improved Neural Network
- Convolutional Neural Network (CNN)
- RNN - LSTM
The following 10 genres were used for training:
0: "blues",
1: "classical",
2: "country",
3: "disco",
4: "hiphop",
5: "jazz",
6: "metal",
7: "pop",
8: "reggae",
9: "rock"
A script-based pipeline that broadly involves the following steps:
- Download a ~30-second sample of every song in a public Spotify playlist
- Convert the songs from `.mp3` to `.wav`
- Extract MFCCs from the playlist's tracks
- Extract MFCCs from the GTZAN dataset's tracks
- Train a Neural Network based model on GTZAN dataset
- Test and obtain results of the model on songs from Spotify playlist
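The `.mp3` to `.wav` conversion step above could be sketched as follows. This assumes the `ffmpeg` command-line tool is installed and on the PATH, and that mono audio at 22050 Hz is wanted. The repository's script may use a different tool or settings, and the helper names here are hypothetical.

```python
import subprocess
from pathlib import Path


def wav_path_for(mp3_path, out_dir):
    """Map e.g. previews/song.mp3 to <out_dir>/song.wav."""
    return Path(out_dir) / (Path(mp3_path).stem + ".wav")


def ffmpeg_command(mp3_path, wav_path, sample_rate=22050):
    """Build an ffmpeg call that decodes to mono WAV at the given sample rate."""
    return ["ffmpeg", "-y", "-i", str(mp3_path),
            "-ac", "1", "-ar", str(sample_rate), str(wav_path)]


def convert_directory(mp3_dir, out_dir):
    """Convert every .mp3 in mp3_dir; raises if ffmpeg fails on a file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    converted = []
    for mp3_path in sorted(Path(mp3_dir).glob("*.mp3")):
        wav_path = wav_path_for(mp3_path, out_dir)
        subprocess.run(ffmpeg_command(mp3_path, wav_path), check=True)
        converted.append(wav_path)
    return converted
```

Converting the playlist samples to the same sample rate and channel count as the training audio keeps the MFCCs of both sets comparable.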
The genre of a song is quite subjective, and a song can span multiple genres. Instead of classifying a song into a single one of the 10 trained genres, the network reports every genre it predicts for that song. Each song is split into a fixed number of segments (10 was chosen), and a genre prediction is obtained for each of the 10 segments. The unique genres among those 10 predictions are then reported for the song.
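The per-segment aggregation described above can be sketched in a few lines. This assumes the model emits one score per genre for each segment and that the highest score is taken as that segment's prediction. The function names and the example scores are hypothetical, not taken from the repository.

```python
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]


def segment_prediction(scores):
    """Index of the highest-scoring genre for one segment."""
    return max(range(len(scores)), key=lambda i: scores[i])


def song_genres(segment_scores):
    """Sorted unique genre indices predicted across all segments of one song."""
    return sorted({segment_prediction(scores) for scores in segment_scores})


def one_hot(index, size=len(GENRES)):
    """Stand-in for a model output that puts all its weight on one genre."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Hypothetical outputs for the 10 segments of a single song.
example = [one_hot(i) for i in [3, 4, 4, 7, 3, 4, 7, 7, 4, 3]]
indices = song_genres(example)
names = [GENRES[i] for i in indices]
```

Here the ten segments collapse to the indices 3, 4 and 7 (disco, hiphop and pop). Taking the set discards how many segments voted for each genre, so a genre predicted once counts the same as one predicted five times.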
Results of a trained LSTM-based model on some of the songs from the playlist 20th Century:
| Song Name | Predicted Genre |
|---|---|
| Black or White | 3, **'4'**, **'7'** |
| Danger Zone | 3, 4, 7, **'9'** |
| I Ran (So Far Away) | 3, 4, **'9'** |
| Fast Car | 2, 4, **'7'**, 8 |
| Take My Breath Away | 2, **'3'**, **'9'** |
| 99 Luftballons | 2, **'3'**, 6, **'9'** |
Indices shown in quotes and bold are genres that are also listed on the respective song's Wikipedia page (including stylistic origins).