STOR 538 - Sports Analytics
University of North Carolina at Chapel Hill
The primary goal of this project is to use data science methods to build models that predict three variables for National Basketball Association (NBA) games played between March 7 and March 21, 2025: Spread, Total, and OREB (offensive rebounds).
Our starting data came from the GitHub repository “NBA-Data-2010-2024”, created by Vitalii Korolyk. The repository contains CSV files with NBA data from 2010 to 2024. Some files cover player statistics, while others cover team performance and overall game outcomes.
Predictions for Spread, Total, and OREB were each evaluated by mean absolute error (MAE). Out of 14 groups, our group ranked 2nd in Spread (MAE 11.08), 1st in Total (MAE 13.99), and 7th in OREB (MAE 4.95). Our paper was awarded 38 out of 39 possible points (97 percent).
If you are interested in the methodology, I have uploaded that paper to this repository. It outlines all of the steps taken in the data cleaning, feature engineering, and modeling processes.
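For readers unfamiliar with the metric, MAE is the average of the absolute differences between predicted and actual values, so lower is better. The snippet below is a minimal sketch of that standard definition, not the course's actual scoring code. The function name `mean_absolute_error` and the sample game totals are hypothetical and are not taken from our data.

```python
def mean_absolute_error(actual, predicted):
    """Return the mean of |actual - predicted| over paired values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one pair of values is required")
    absolute_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute_errors) / len(absolute_errors)


# Hypothetical example: combined points (Total) for three made-up games.
actual_total = [221, 208, 235]
predicted_total = [216, 215, 232]

# Absolute errors are 5, 7, and 3, so the MAE is 15 / 3 = 5.0.
mae = mean_absolute_error(actual_total, predicted_total)
print(f"MAE: {mae:.2f}")
```

Because MAE is in the same units as the target, an MAE of 13.99 for Total means the predictions missed the combined score by about 14 points per game on average.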
Note: Not all code used to produce the final data set, variables, and predictions is included in this repository. This project was a four-person effort, and each team member contributed in different ways. I did not ask the other members to send me their code, so the notebooks and scripts in this repository are only those written by me, Rhett Lavender.
- Keegan Burr, Information Science B.S., Senior
- Rhett Lavender, Data Science B.S., Junior
- Adalia Winters, Statistics and Analytics B.S., Junior
- Isabella Yeager, Statistics and Analytics B.S., Sophomore