Drake Analysis: a deeper look into the discography of Canada's Rap King using various NLP techniques.

andreduong-zz/drake-analysis

Drake Analysis: A Deeper Look Into The Discography Of Canada's Rap King


Introduction

Hip-Hop is my favorite music genre of all time, and Drake is an artist I've listened to for years. Many text mining analyses have been performed on rap lyrics, but I haven't seen many that dig deeper into one specific artist's discography (in this case, Drake's). In this project, I apply various Natural Language Processing techniques to analyze Drake's lyrics.

I have practiced classical Machine Learning and Deep Learning applied to images (Computer Vision) a lot, but I don't have nearly as much experience working with text data. This project is my introduction to the world of Natural Language Processing, and to text analysis in general.

What I Have Learned From This Project

In this project, I:

  • Scraped information and lyrics for 500 songs from genius.com
  • Performed data wrangling and exploratory data analysis with Matplotlib and Seaborn
  • Applied various NLP techniques: word embedding, bag-of-words, tokenization with NLTK, NER with SpaCy
  • Built topic models with LDA, reduced dimensionality with t-SNE, and created an interactive topic visualization with pyLDAvis

Building the Dataset

Drake is one of the biggest, if not the biggest, rap artists in the world. Thanks to his popularity, his work is expansive and well-documented, with five official studio albums and six mixtapes. To obtain the lyrics of all of Drake's songs, I scraped the data from Genius using the wonderful Genius API.

Data Cleaning

Despite having such an easy-to-use API to assist the scraping process, cleaning the data was no easy task. Out of over 500 songs on Genius, around 300 tracks were duplicates, live versions, diss tracks, or the like, and they were all filtered out of the dataset. The lyrics were not clean either: there was a lot of noise, redundant characters, and typos.

To see the code for the whole data preprocessing process, check this notebook (NBViewer link).
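
The cleaning steps described above could look roughly like the sketch below. It is not the code from the notebook: the helper names `filter_songs` and `clean_lyrics`, the `"(live"` title check, and the regular expressions are all assumptions chosen for illustration. Diss tracks and similar cases cannot be detected from the title alone and would still need manual review.

```python
import re


def filter_songs(songs):
    """Drop live versions and duplicate titles (hypothetical rules)."""
    seen = set()
    kept = []
    for song in songs:
        # Normalize the title so "One Dance" and "one dance " match.
        title = song["name"].strip().lower()
        if "(live" in title or title in seen:
            continue
        seen.add(title)
        kept.append(song)
    return kept


def clean_lyrics(raw):
    """Lowercase the lyrics and strip headers, punctuation and extra spaces."""
    text = raw.lower()
    # Remove bracketed section headers such as "[Verse 1]".
    text = re.sub(r"\[[^\]]*\]", " ", text)
    # Remove everything except letters, digits and whitespace.
    text = re.sub(r"[^a-z0-9\s]", "", text)
    # Collapse runs of whitespace (including newlines) into one space.
    return re.sub(r"\s+", " ", text).strip()
```

Applied to each scraped song, `filter_songs` decides which tracks stay in the dataset and `clean_lyrics` produces lowercase, punctuation-free text similar to the `lyrics` column shown in the next section.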

Career Overview

First, let's load data from the csv file:

```python
import pandas as pd

# load the data
data = pd.read_csv('lyrics.csv')
data.head(10)
```

| | name | album | year | lyrics |
|---|---|---|---|---|
| 0 | Right to Left | Born Successful | 2009 | blue green jewels with the supreme fuel and l... |
| 1 | Forever (Born Successful) | Born Successful | 2009 | it may not mean nothing to yall but understan... |
| 2 | The Winner | Born Successful | 2009 | i m performing tonight you know that shit gon... |
| 3 | I Do This | Born Successful | 2009 | uh shits all good the deal got signed and my ... |
| 4 | Fallen | Born Successful | 2009 | yeah its drake kc we was just walking just sm... |
| 5 | Do It Now | Born Successful | 2009 | uh yeah alright uh well alright yeah well alr... |
| 6 | The Search | Born Successful | 2009 | they say we killin em all all all all hip hop... |
| 7 | Juice | Born Successful | 2009 | boi 1da drizzy yall dont really like me i can... |
| 8 | Man of the Year | Comeback Season | 2007 | damn i done walked in here looking like the m... |
| 9 | Give Ya | Comeback Season | 2007 | check look and i aint tryna get to know nobod... |

```python
data.tail(10)
```

| | name | album | year | lyrics |
|---|---|---|---|---|
| 204 | Pop Style | Views | 2016 | this sound like some forty three oh one shit ... |
| 205 | Grammys | Views | 2016 | yeah yeah yeah yeah jheeze yeah right look lo... |
| 206 | Redemption | Views | 2016 | yeah i get it i get it yeah why would i say a... |
| 207 | Too Good | Views | 2016 | oh yeah yeah yeah oh yeah yeah yeah yeah look... |
| 208 | Controlla | Views | 2016 | right my yiy just changed you just buzzed the... |
| 209 | Views [Trailer] | Views | 2016 | the 6 raptors diamond key new ride old ride ba... |
| 210 | Summers Over Interlude | Views | 2016 | ooh baby yeah days in the sun and nights in t... |
| 211 | Views | Views | 2016 | question is will i ever leave you the answer ... |
| 212 | With You | Views | 2016 | its about us right now girl where you going i... |
| 213 | One Dance | Views | 2016 | baby i like your style grips on your waist fr... |

From "Comeback Season" to "The Best In The World Pack"

Drake has a large, rich discography. According to this dataset, over the 13 years of his music career, he has released 18 albums and mixtapes in total.

Drake In The Streaming Era: Quantity > All?

In this streaming era, artists tend to put as many tracks as possible on one album to boost their streaming numbers. This is true for Drake as well: his more recent albums (More Life, Scorpion, Views) all have more songs than his older ones, with the exception of 2009, when he released three tapes in one year.

[Figure: number of songs per album]
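
The track counts behind this comparison can be computed in a few lines. The sketch below is an assumption about how such a count might be done, not the original plotting code: `songs_per_album` is a hypothetical helper that takes rows with the `album` and `year` fields of the dataset and returns the number of songs per release, ordered by year.

```python
from collections import Counter


def songs_per_album(rows):
    """Count songs per (album, year) pair, sorted by year and then album."""
    counts = Counter((row["album"], row["year"]) for row in rows)
    return sorted(counts.items(), key=lambda item: (item[0][1], item[0][0]))
```

With the DataFrame loaded earlier, the rows could be obtained with `data.to_dict("records")`, and the resulting list can be passed directly to a bar chart.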

WordCloud

[Figure: word cloud of the lyrics]

Words Per Song

[Figure: words per song]

Sentence Length

[Figure: sentence length]

Adding Word Counts & Unique Word Counts

| | name | album | year | lyrics | tokens | Word Counts | Unique Word Counts |
|---|---|---|---|---|---|---|---|
| 106 | Make Things Right | Room for Improvement | 2006 | look if you a girl with the aspirations of be... | [look, if, you, a, girl, with, the, aspiration... | 456 | 223 |
| 109 | Intro (Room For Improvement) | Room for Improvement | 2006 | yo whats going on this is drake and ima let y... | [yo, whats, going, on, this, is, drake, and, i... | 184 | 92 |
| 108 | Try Harder | Room for Improvement | 2006 | sometimes i feel like lohan and hilary duff a... | [sometimes, i, feel, like, lohan, and, hilary,... | 395 | 198 |
| 107 | Thrill Is Gone | Room for Improvement | 2006 | loves lost loves gone love lost love is gone l... | [loves, lost, loves, gone, love, lost, love, i... | 579 | 260 |
| 105 | Video Girl | Room for Improvement | 2006 | uh yea get in my slick rick mode na mean im a... | [uh, yea, get, in, my, slick, rick, mode, na, ... | 782 | 338 |
| 104 | All This Love | Room for Improvement | 2006 | southern smoke this another one from your boy ... | [southern, smoke, this, another, one, from, yo... | 458 | 172 |
| 103 | Pianist Hands | Room for Improvement | 2006 | thank you ms graham for coming today you look ... | [thank, you, ms, graham, for, coming, today, y... | 161 | 101 |
| 110 | Drakes Voice Mail Box #2 | Room for Improvement | 2006 | what up this kim damn i ve been trying to get... | [what, up, this, kim, damn, i, ve, been, tryin... | 43 | 28 |
| 102 | Drakes Voice Mail Box #1 | Room for Improvement | 2006 | the man drake puts it the fuck down he s doin... | [the, man, drake, puts, it, the, fuck, down, h... | 162 | 85 |
| 100 | Do What You Do | Room for Improvement | 2006 | stance on lean leg up on the wall my niggas c... | [stance, on, lean, leg, up, on, the, wall, my,... | 829 | 262 |
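
The three added columns can be derived from the cleaned lyrics alone. Here is a minimal sketch, assuming whitespace tokenization is good enough for text that is already lowercase and punctuation-free; the project lists NLTK for tokenization, so `str.split` is a stand-in, and `word_stats` is a hypothetical helper name.

```python
def word_stats(lyrics):
    """Return the tokens, total word count and unique word count of a song."""
    # Whitespace split is a stand-in for the NLTK tokenizer.
    tokens = lyrics.split()
    return {
        "tokens": tokens,
        "word_count": len(tokens),
        "unique_word_count": len(set(tokens)),
    }
```

Calling `word_stats` on every entry of the `lyrics` column yields the values for the `tokens`, `Word Counts` and `Unique Word Counts` columns.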

Is Drake Becoming Less Lyrical? Actually, No.

[Figure: graph of Drake's lyrical statistics over time]

From the graph above, we can see:

  • "Take Care", Drake's 2011-2012 album that is widely regarded as his best, is actually one of his less lyrical ones.
  • Lyrically, Drake was at his worst in 2016-2017, when he broke through commercially with the mega hit "Hotline Bling", which helped him top charts around the world and become the biggest artist across all genres at that time. It's a dilemma for most Hip-Hop artists: you make catchy, commercially friendly songs that top the charts and bring you money and fame, in exchange for your artistry.
  • However, Drake is making a comeback. Since 2018, both of his lyrical statistics have been skyrocketing, and they're as strong as ever. Drake is currently at his peak lyrically, and he shows no signs of stopping.
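
A year-by-year comparison like this one can be reproduced by averaging the per-song counts. The sketch below assumes that the two lyrical statistics are the word count and the unique word count per song, and that each song is a dict with `year`, `word_count` and `unique_word_count` keys; both the key names and the helper `yearly_lyric_stats` are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def yearly_lyric_stats(songs):
    """Average word count and unique word count per release year."""
    by_year = defaultdict(list)
    for song in songs:
        by_year[song["year"]].append(song)
    return {
        year: {
            "mean_words": mean(s["word_count"] for s in group),
            "mean_unique_words": mean(s["unique_word_count"] for s in group),
        }
        for year, group in sorted(by_year.items())
    }
```

Plotting the two averages against the year gives a curve that can be read the same way as the observations above.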

Named Entity Recognition With SpaCy

Named entity recognition (NER) is often the first step towards information extraction. It seeks to locate named entities in text and classify them into pre-defined categories such as the names of persons, organizations, locations, expressions of time, quantities, monetary values, percentages, etc.

In this section, I built a Named Entity Recognizer using SpaCy. Here's a snippet of how SpaCy works on a given text:

[Figure: SpaCy named entity output on a sample text]
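
For readers who want to try this themselves, here is a minimal sketch of entity counting with a pretrained SpaCy pipeline. It is an illustration rather than the recognizer built in this project: the model name `en_core_web_sm`, the chosen labels, and the helper names `load_pipeline` and `count_entities` are assumptions. Pretrained models tend to rely on capitalization, so results on lowercased lyrics may be noisy.

```python
from collections import Counter


def load_pipeline(model_name="en_core_web_sm"):
    """Load a pretrained spaCy pipeline (the model must be installed first)."""
    # Imported lazily so count_entities works with any callable pipeline.
    import spacy

    return spacy.load(model_name)


def count_entities(nlp, texts, labels=("PERSON", "GPE", "ORG")):
    """Count (entity text, label) pairs across texts for the given labels."""
    counts = Counter()
    for text in texts:
        doc = nlp(text)
        for ent in doc.ents:
            if ent.label_ in labels:
                counts[(ent.text, ent.label_)] += 1
    return counts
```

A typical call would be `count_entities(load_pipeline(), data["lyrics"])`, followed by `.most_common(20)` to list the people, places and organizations mentioned most often.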

Topic Modeling with LDA & Interactive Topic Visualization via pyLDAvis

Here we go, the Machine Learning part of the project.

Topic modeling is a type of statistical modeling for discovering the abstract “topics” that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is an example of a topic model and is used to assign the text in a document to particular topics. It builds a topics-per-document model and a words-per-topic model, both modeled as Dirichlet distributions.

In this section, I first put all song lyrics into a list. Then, using Scikit-Learn's CountVectorizer, I create a bag-of-words corpus representing all the lyrics. Lastly, I fit an LDA model and build an interactive, web-based topic visualization with pyLDAvis. Sample picture of the pyLDAvis interactive plot:

[Figure: sample pyLDAvis interactive plot]
