Skip to content

Latest commit

 

History

History
23 lines (19 loc) · 1.43 KB

README.md

File metadata and controls

23 lines (19 loc) · 1.43 KB

Music Genre Classification

In this project I try to explore music audio features and train a DL model on the GTZAN Dataset which contains 30 second music .wav file over 10 genres. This work is based on the teaching of Valerio Velardo (Shoutout to his YouTube channel by the same name). In the IntroToAudioFeatures folder, preprocessing.py file contains the basic operations on a audio .wav file using the librosa library. I compute the FFT, STFT, and MFCCs of a sample audio file.

The GenreClassifier contains the 2 models (Fully connected neural network and an LSTM model) which perform the classification task over 10 categories. Since we only have 100 data samples for each genre. We use data augmentation by by chopping each track into different samples.

Number of segments per track = 10
Hoplength for Fourier Transforms = 512
Total Duration = 30 secs
Segment Duration = 3 secs
Sample rate = 22050
Num of MFCCs = 13

The overall dataset shape is

X = (num_genres * num_songs_in_genre * num_segments x num_hops_per_segment x 13)
Y = (num_genres * num_songs_in_genre * num_segments x 1)

For the FNN network, the number of hops per segments are flattened and passed into the network where as for the case of an LSTM, the hops per segments as kept as a temporal dimension for the RNN.

FNN Test Accuracy = 54% LSTM Test Accuracy = 65%