This is a project from the Career Foundry Data Analytics Program that uses scikit-learn in Python to apply supervised and unsupervised machine learning techniques.
The data was scraped from TrueCar.com and uploaded to Kaggle, where it was collected for use in this project. The scraped fields include car price, mileage, year, make, and model. The purpose of this project is to explore relationships and patterns within the data.
Below is the link to the dataset, collected from Kaggle as "true_car_listings.csv":
The CSV files in the "02 Data" folder were uploaded using Git LFS.
The scripts walk through:
- Data quality checks and exploratory analysis
- Predicting price using linear regression, random forest, and gradient boosting
- Using folium and geospatial data to create a choropleth map of price residuals
- Performing K-means clustering
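As an illustration of the data quality checks listed above, here is a minimal sketch that counts missing values, exact duplicate rows, and implausible values. It assumes the CSV has `Price`, `Year`, and `Mileage` columns (the column names are an assumption, not taken from the project's scripts), and `quality_report` is a hypothetical helper.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Summarize common data quality issues in a car listings table.

    Assumes Price and Mileage columns exist (hypothetical schema).
    """
    return {
        "rows": len(df),
        # Number of missing values in each column
        "missing": df.isna().sum().to_dict(),
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Values that are implausible for a real listing
        "nonpositive_price": int((df["Price"] <= 0).sum()),
        "negative_mileage": int((df["Mileage"] < 0).sum()),
    }


if __name__ == "__main__":
    # Small made-up table; in practice this would come from pd.read_csv(...)
    listings = pd.DataFrame({
        "Price": [15000, 15000, -1, 22000],
        "Year": [2014, 2014, 2016, None],
        "Mileage": [40000, 40000, 12000, -5],
    })
    print(quality_report(listings))
```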
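The price prediction step compares linear regression, random forest, and gradient boosting. The sketch below shows one way to set up that comparison with scikit-learn on a held-out test split. It assumes `Price`, `Year`, `Mileage`, and `Make` columns, one-hot encodes `Make`, and uses a hypothetical `make_synthetic_listings` generator instead of the real CSV; the hyperparameters are illustrative, not the values used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_listings(n: int = 500, seed: int = 0) -> pd.DataFrame:
    """Fake listings where price rises with year, falls with mileage (illustrative only)."""
    rng = np.random.default_rng(seed)
    year = rng.integers(2005, 2018, size=n)
    mileage = rng.integers(1_000, 150_000, size=n)
    make = rng.choice(["Ford", "Toyota", "BMW"], size=n)
    premium = np.where(make == "BMW", 8_000, 0)
    price = (20_000 + 1_200 * (year - 2005) - 0.08 * mileage
             + premium + rng.normal(0, 1_500, size=n))
    return pd.DataFrame({"Price": price, "Year": year, "Mileage": mileage, "Make": make})


def compare_price_models(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Fit three regressors and report MAE and R^2 on a 20% held-out split."""
    # One-hot encode the categorical Make column; drop_first avoids a redundant dummy
    X = pd.get_dummies(df[["Year", "Mileage", "Make"]], columns=["Make"], drop_first=True)
    y = df["Price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
    }
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        rows.append({
            "model": name,
            "mae": mean_absolute_error(y_test, preds),
            "r2": r2_score(y_test, preds),
        })
    return pd.DataFrame(rows).set_index("model")


if __name__ == "__main__":
    print(compare_price_models(make_synthetic_listings()))
```

Evaluating every model on the same split keeps the comparison fair; on the real data, cross-validation would give a less noisy estimate than a single split.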
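A choropleth of price residuals needs one value per region. The sketch below assumes a `State` column of two-letter codes and averages the residual (actual minus predicted price) per state, so positive values mean listings priced above the model's prediction. The helpers `mean_residual_by_state` and `residual_choropleth` are hypothetical, and the GeoJSON passed to folium is assumed to store the same two-letter codes in `feature.id`.

```python
import pandas as pd


def mean_residual_by_state(df: pd.DataFrame, predicted) -> pd.DataFrame:
    """Average (actual - predicted) price per state; assumes State and Price columns."""
    residuals = df["Price"] - pd.Series(predicted, index=df.index)
    return (
        df.assign(residual=residuals)
        .groupby("State", as_index=False)["residual"]
        .mean()
        .rename(columns={"residual": "mean_residual"})
    )


def residual_choropleth(state_residuals: pd.DataFrame, geojson):
    """Build a folium choropleth from the per-state residual table.

    `geojson` is assumed to be US state polygons keyed by two-letter code in feature.id.
    """
    import folium  # imported here so the aggregation above works without folium installed

    m = folium.Map(location=[39.8, -98.6], zoom_start=4)
    folium.Choropleth(
        geo_data=geojson,
        data=state_residuals,
        columns=["State", "mean_residual"],
        key_on="feature.id",
        fill_color="YlOrRd",
        legend_name="Mean price residual (actual - predicted)",
    ).add_to(m)
    return m
```

Aggregating before mapping keeps the map readable: individual residuals are noisy, while state means show where the model systematically over- or under-predicts.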
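For the K-means step, a common workflow is to standardize the numeric features, compute inertia over a range of k (the elbow method), then assign cluster labels for the chosen k. This is a sketch under the assumption that clustering uses `Price`, `Year`, and `Mileage`; the feature list and the helpers `elbow_inertias` and `assign_clusters` are hypothetical, not taken from the project's scripts.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

FEATURES = ["Price", "Year", "Mileage"]  # assumed column names


def scaled_features(df: pd.DataFrame) -> np.ndarray:
    # Standardize so large-unit columns (Price, Mileage) don't dominate Year
    return StandardScaler().fit_transform(df[FEATURES])


def elbow_inertias(df: pd.DataFrame, k_values, seed: int = 42) -> dict:
    """Within-cluster sum of squares for each candidate k (elbow method)."""
    X = scaled_features(df)
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    }


def assign_clusters(df: pd.DataFrame, k: int, seed: int = 42) -> pd.DataFrame:
    """Return a copy of df with an integer `cluster` label column."""
    X = scaled_features(df)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    return df.assign(cluster=labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three made-up segments: older high-mileage, mid-range, and newer low-mileage cars
    groups = [(12000, 2010, 110000), (25000, 2015, 45000), (45000, 2017, 10000)]
    df = pd.concat(
        [pd.DataFrame({"Price": rng.normal(p, 1500, 100),
                       "Year": rng.normal(y, 0.7, 100),
                       "Mileage": rng.normal(m, 5000, 100)}) for p, y, m in groups],
        ignore_index=True,
    )
    print(elbow_inertias(df, range(1, 6)))
    print(assign_clusters(df, 3).groupby("cluster")[FEATURES].mean())
```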