Predicting a Song's Popularity on Spotify

About

This is a Mini-Project for SC1015 (Introduction to Data Science and Artificial Intelligence) that uses data from the Spotify tracks dataset on Kaggle: https://www.kaggle.com/datasets/yamaerenay/spotify-dataset-19212020-600k-tracks?select=tracks.csv

Check out our Project Video and Slide Deck too

Problem Definition

Independent artists struggle to predict Spotify song popularity, leading to various drawbacks:

As an independent artist, it is difficult to gauge the potential popularity of a new song before releasing it on Spotify
Without data on how previous songs have performed, it's hard to make informed decisions about what to release next
Even with data, it is challenging to interpret it, especially for artists without a background in data science or analytics.

With these problems, about 70% of indie artists generate less than $10,000 from their music annually, despite accounting for 41.4% of the music industry.

Problem Statement: Develop a model that predicts the popularity of a song on Spotify before its release.

This will allow independent artists to:

Make more informed decisions about which songs to release and how to allocate their time and resources.
Better negotiate deals with record labels, potentially leading to more opportunities and revenue.
Tailor their marketing strategies and promotional efforts to maximize the song's impact upon release.
Identify strengths and weaknesses, facilitating artistic growth and improvement in future releases.

Data Collection and Preparation

Raw Dataset used: https://www.kaggle.com/datasets/yamaerenay/spotify-dataset-19212020-600k-tracks?select=tracks.csv

The DataPrepandCleaning notebook shows how the raw dataset was cleaned and prepared.
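The factor lists later in this README include Release Year, Release Month and Number of Artists, which are not raw columns in tracks.csv. The following is a minimal sketch of how such features could be derived, not the notebook's code. It assumes that `release_date` is a string such as `2015-06-01` (sometimes year-only) and that `artists` holds a stringified Python list; the function name `derive_features` and the sample rows are hypothetical.

```python
import ast


def derive_features(row):
    """Derive release year, release month, and artist count from one raw row."""
    parts = row["release_date"].split("-")
    year = int(parts[0])
    # Assumption: some dates are year-only, in which case the month is unknown.
    month = int(parts[1]) if len(parts) > 1 else None
    # Assumption: the artists column is a stringified Python list.
    artists = ast.literal_eval(row["artists"])
    return {
        "release_year": year,
        "release_month": month,
        "num_artists": len(artists),
    }


# Made-up rows for illustration, not taken from the dataset.
rows = [
    {"release_date": "2015-06-01", "artists": "['Artist A', 'Artist B']"},
    {"release_date": "1998", "artists": "['Artist C']"},
]
features = [derive_features(r) for r in rows]
print(features)
```

Rows with a missing month would need to be dropped or imputed before Release Month can be used as a model input.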

Exploratory Data Analysis / Visualisation

The EDA notebook shows the data analysis process, in which each attribute and its relationship with popularity were explored. The analysis suggests the following split when evaluating popularity:

Non-negligible Factors:

Duration
Explicit
Release Year
Release Month
Danceability
Energy
Loudness
Acousticness
Instrumentalness

Negligible Factors:

Number of Artists
Key
Mode
Speechiness
Liveness
Valence
Tempo
Time_Signature
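This README does not state how each factor was judged negligible or not. One common way to screen a numeric attribute is its Pearson correlation with popularity. The sketch below illustrates that idea only; the function name `pearson` and the sample values are made up, and the EDA notebook may have used different techniques.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, not taken from the dataset.
loudness = [-12.0, -9.0, -7.5, -6.0, -4.0]
popularity = [10, 25, 30, 45, 60]
r = pearson(loudness, popularity)
print(round(r, 3))
```

A coefficient near zero only rules out a linear relationship. Categorical attributes such as Key or Mode are usually better compared through per-category distributions.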

Use of Machine Learning

The ML Models notebook shows how various ML models were used to predict popularity and how these models were evaluated.

Models Used

Linear Regression
Random Forest
Decision Tree
KNN (found in the NewTechinque folder)

Final Insights

Of the models we tried, the KNN algorithm gave us the best model for predicting popularity.
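For readers unfamiliar with KNN, here is a dependency-free sketch of the idea, treating popularity as a numeric target (an assumption): the prediction for a new song is the average popularity of the k most similar songs in the training data. The function name `knn_predict`, the choice of two features, and all values are hypothetical and are not taken from the project's notebooks.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a value as the mean target of the k nearest training points."""
    distances = [
        (math.dist(features, query), target)
        for features, target in zip(train_X, train_y)
    ]
    # Sort by Euclidean distance and keep the k closest neighbours.
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Made-up rows: (danceability, energy) -> popularity.
train_X = [(0.8, 0.7), (0.75, 0.65), (0.2, 0.3), (0.25, 0.2), (0.7, 0.8)]
train_y = [60, 55, 15, 10, 65]
prediction = knn_predict(train_X, train_y, (0.78, 0.72), k=3)
print(prediction)
```

Because KNN relies on distances, attributes on very different scales (for example Duration and Danceability) generally need to be scaled to comparable ranges first, otherwise the largest-valued attribute dominates the neighbour search.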

Conclusion

Future Implementations:

Web scraping and removing top artists from dataset to further target independent artists
Going a step further and classifying songs by genre first and then evaluating each

Insights:

Working with and cleaning a raw dataset to better address our problem statement
Using techniques such as cross-validation to improve models
Linking the model's results back to the EDA to understand how the model works

What did we learn from this project?

Using clustering to analyze data
Modeling with KNN and Random Forest for prediction
Cross-validation to detect overfitting and get a more reliable error estimate
Feature Selection on Models to determine importance
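To make the cross-validation point concrete, the sketch below shows k-fold evaluation in plain Python: the data is split into k folds, each fold is held out once, and the error is averaged across folds. It scores a trivial baseline that always predicts the mean training popularity. The function names and values are hypothetical, and this is not the evaluation code from the notebooks.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_mae(ys, k):
    """Mean absolute error of a mean-predictor baseline, averaged over k folds."""
    errors = []
    for held_out in k_fold_indices(len(ys), k):
        held = set(held_out)
        train = [y for i, y in enumerate(ys) if i not in held]
        baseline = sum(train) / len(train)  # fitted on the training folds only
        fold_error = sum(abs(ys[i] - baseline) for i in held_out) / len(held_out)
        errors.append(fold_error)
    return sum(errors) / len(errors)


# Made-up popularity scores for illustration.
popularity = [10, 20, 30, 40, 50, 60]
score = cross_val_mae(popularity, k=3)
print(score)
```

In practice the rows would be shuffled before splitting, and the baseline would be replaced by the model under test. A large gap between training error and cross-validated error is a sign of overfitting.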

Contributions

Dixit Ayushman - EDA, KNN algorithm, final insights

Summit Bajaj - Problem Formulation, Data Prep and Cleaning, 3 ML models

References

https://www.kaggle.com/datasets/yamaerenay/spotify-dataset-19212020-600k-tracks?select=tracks.csv
https://www.hypebot.com/hypebot/2020/12/stats-facts-data-independent-artists-need-to-know.html
