I decided to build a model to predict parking lot percent of capacity (i.e., the number of spots taken divided by the total number of spots) at Jefferson County Open Space trailheads in Colorado. This would be useful to:
- Hikers: When is the best time to go for a hike? Will there be parking available?
- Open Space managers: How should we plan/allocate resources among the parks?
The data was shared by Lot Spot, which JeffCo Open Space has contracted with since August 2019 to monitor parking at seven of their popular trailheads:
- East Mount Falcon
- West Mount Falcon
- East Three Sisters
- West Three Sisters
- East White Ranch
- Lair o' the Bear
- Mount Galbraith
You can see real-time parking availability for these parks with LotSpot's mobile app. A camera located at the entrance to the parking lot detects when a vehicle enters or exits the lot. The raw data is event-based and therefore not evenly spaced (a datapoint is recorded whenever a car enters or exits a lot), so I resampled it to a regularly spaced time series (1-hour intervals) for analysis and modeling. The data covers 2019-08-30 to 2020-05-06.
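The resampling step can be sketched with pandas as follows. This is only an illustration under assumptions: the column names `timestamp` and `spots_taken`, the use of the hourly mean, and the forward fill for hours with no traffic are my choices here, not necessarily how the original data was processed.

```python
import pandas as pd

def resample_hourly(events: pd.DataFrame, total_spots: int) -> pd.Series:
    """Turn irregular enter/exit events into an hourly percent-of-capacity series.

    Assumes (hypothetically) that each event row carries the number of
    occupied spots right after the event in a `spots_taken` column.
    """
    counts = events.set_index("timestamp")["spots_taken"].sort_index()
    # Average occupancy within each hour; hours without events become NaN
    hourly = counts.resample("60min").mean()
    # Carry the last known occupancy forward through quiet hours
    hourly = hourly.ffill()
    return 100.0 * hourly / total_spots

# Tiny synthetic example (not real Lot Spot data)
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-08-30 08:05", "2019-08-30 08:40", "2019-08-30 10:15",
    ]),
    "spots_taken": [10, 20, 30],
})
pct_capacity = resample_hourly(events, total_spots=100)
print(pct_capacity)
```

Taking the last observation in each hour instead of the mean would be an equally reasonable choice; the mean smooths out short spikes within the hour.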
Given the time constraints, I chose to focus first on a single park: East Mount Falcon. It is one of my personal favorites, has very few data gaps, and, as I know from experience, can reach capacity.
Powered by Dark Sky

Historical weather data was obtained from the Dark Sky API for the time period of the LotSpot observations. The API takes a location (lat/lon) and returns both daily and hourly observations for the date requested. The data contains many fields; for the purpose of this analysis I was interested in the following:
- Temperature
- UV Index
- Cloud Cover
- Precipitation Intensity
- Wind Gust
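As a rough sketch of how these fields could be pulled out of one API response, the snippet below flattens the hourly block into a DataFrame. It assumes the response has already been fetched and parsed into a Python dict, and that the hourly observations sit under `hourly` → `data` with the keys listed in `FIELDS`; the sample response is made up for illustration and is not real weather data.

```python
import pandas as pd

# Response keys assumed to correspond to the fields of interest
FIELDS = ["temperature", "uvIndex", "cloudCover", "precipIntensity", "windGust"]

def hourly_weather(response: dict) -> pd.DataFrame:
    """Extract the hourly fields of interest from one parsed JSON response."""
    df = pd.DataFrame(response["hourly"]["data"])
    # Timestamps are assumed to be UNIX seconds
    df["time"] = pd.to_datetime(df["time"], unit="s", utc=True)
    # reindex keeps a column (filled with NaN) even when a field is missing
    return df.set_index("time").reindex(columns=FIELDS)

# Hand-written sample response (illustrative values only)
sample_response = {"hourly": {"data": [
    {"time": 1567166400, "temperature": 61.2, "uvIndex": 0,
     "cloudCover": 0.1, "precipIntensity": 0.0, "windGust": 5.3},
    {"time": 1567170000, "temperature": 63.0, "uvIndex": 1,
     "cloudCover": 0.2, "precipIntensity": 0.0},  # windGust missing
]}}
weather = hourly_weather(sample_response)
print(weather)
```

Using `reindex` rather than plain column selection means an hour with a missing field shows up as NaN instead of raising an error, which makes gaps easy to spot before modeling.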
- There is a little bit of a seasonal pattern, but not as much as expected.
- Note the general increase after March 2020, which is likely Covid-19 related, though I can't be sure.
- There is a significant difference between weekdays and weekends.
- The target I am trying to predict is the hourly percent of capacity of the parking lot (i.e., 0-100%).
- Day of week: Converted into Is-Weekend binary category.
- Temperature
- UV Index
- Cloud Cover
- Precipitation Intensity
- Hour of day - turned into dummy variables.
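The feature list above could be assembled roughly as follows. This is a sketch, assuming an hourly DataFrame with a `DatetimeIndex` and hypothetical column names (`temperature`, `uv_index`, `cloud_cover`, `precip_intensity`); the actual names in the project may differ.

```python
import pandas as pd

WEATHER_COLS = ["temperature", "uv_index", "cloud_cover", "precip_intensity"]

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build the model features from an hourly, datetime-indexed frame."""
    X = df[WEATHER_COLS].copy()
    # Saturday (5) and Sunday (6) are flagged as weekend
    X["is_weekend"] = (df.index.dayofweek >= 5).astype(int)
    # One 0/1 column per hour of day present in the data
    hours = pd.Series(df.index.hour, index=df.index, name="hour")
    hour_dummies = pd.get_dummies(hours, prefix="hour").astype(int)
    return pd.concat([X, hour_dummies], axis=1)

# Two synthetic rows: a Friday morning and a Saturday morning
idx = pd.to_datetime(["2019-08-30 08:00", "2019-08-31 09:00"])
demo = pd.DataFrame(
    {"temperature": [60.0, 65.0], "uv_index": [2, 3],
     "cloud_cover": [0.1, 0.4], "precip_intensity": [0.0, 0.0]},
    index=idx,
)
features = build_features(demo)
print(features)
```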
- Use only pre-Covid19 data (before March 1, 2020)
- Use only hours 6am to 8pm
- 80/20 Train/Test Split
- Train and tune model on training data, then evaluate on test-set.
- Measure performance by R^2 and RMSE
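A minimal sketch of the filtering, splitting, and scoring steps is shown below. Several details are assumptions on my part: the index is timezone-naive local time, "6am to 8pm" includes the 8pm hour, and the 80/20 split is a random split via scikit-learn's `train_test_split` (a chronological split would be another option).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def select_modeling_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Keep pre-Covid-19 rows (before 2020-03-01) between 6am and 8pm."""
    pre_covid = df.index < pd.Timestamp("2020-03-01")
    daytime = (df.index.hour >= 6) & (df.index.hour <= 20)
    return df[pre_covid & daytime]

def evaluate(y_true, y_pred) -> dict:
    """Return R^2 and RMSE for a set of predictions."""
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    return {"r2": float(r2_score(y_true, y_pred)), "rmse": rmse}

# Synthetic hourly frame spanning the cutoff date (not real data)
index = pd.date_range("2020-02-28", "2020-03-02", freq="60min")
hourly = pd.DataFrame({"pct_capacity": np.arange(len(index)) % 100}, index=index)

kept = select_modeling_rows(hourly)
train, test = train_test_split(kept, test_size=0.2, random_state=0)
scores = evaluate([0, 50, 100], [10, 50, 90])
print(len(kept), len(train), len(test), scores)
```

RMSE is in the same units as the target (percentage points of capacity), which makes it easier to interpret than R^2 alone.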
- Test-set RMSE : 28.9
- It looks like the default model is overfitting the training data (compare the train and test R^2 below).
Performance:
- Train-set R^2 : 0.95
- Test-set R^2 : 0.54
- Test-set RMSE : 19.5
Performance:
- Train-set R^2 : 0.75
- Test-set R^2 : 0.64
- Test-set RMSE : 17.2
Best Parameters:
- n_estimators : 100
- max_depth : 10
- max_features : 'log2'
- min_samples_split : 10
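For illustration, the snippet below fits a scikit-learn `RandomForestRegressor` with the parameters listed above and shows one way such parameters could be searched for. The grid, the 3-fold cross-validation, and the scoring choice are assumptions for this sketch, and the data is synthetic, so the scores it produces say nothing about the real model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data: the target depends on the first feature only
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 4))
y = 100 * X[:, 0] + rng.normal(scale=5, size=300)

# Hypothetical search grid that contains the reported best values
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, 10],
    "max_features": ["sqrt", "log2"],
    "min_samples_split": [2, 10],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print("best on synthetic data:", search.best_params_)

# Model with the parameters reported above
tuned = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,
    max_features="log2",
    min_samples_split=10,
    random_state=0,
)
tuned.fit(X, y)
print("feature importances:", tuned.feature_importances_.round(3))
```

Limiting `max_depth` and raising `min_samples_split` both restrict how closely individual trees can fit the training data, which is consistent with the smaller gap between train and test R^2 after tuning.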
- Weather variables and day of week are the most important features.
- Temperature shows a generally positive dependence. Note the breaks around freezing and around 55 degrees. What happens at higher temps?
- Cloud cover shows a weaker negative trend, and there appears to be a larger negative shift at values > 0.5.
- Precipitation Intensity shows a slight negative trend, but a lot weaker than I expected.
- UV Index: Big break at a value of ~2.
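Dependence curves like the ones described above can be computed by hand: fix one feature at a grid value for every row, predict, and average. The sketch below does that on synthetic data; it is one common way to get a partial-dependence curve and is not necessarily how the plots in this project were made. The function name and the two-feature setup are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def partial_dependence_curve(model, X, feature_idx, grid):
    """Average prediction when one feature is forced to each grid value."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # overwrite the feature for all rows
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)

# Synthetic data: capacity rises with temperature, falls with cloud cover
rng = np.random.default_rng(0)
temperature = rng.uniform(0, 90, size=400)
cloud_cover = rng.uniform(0, 1, size=400)
X = np.column_stack([temperature, cloud_cover])
y = 0.8 * temperature - 10 * cloud_cover + rng.normal(scale=3, size=400)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
curve = partial_dependence_curve(model, X, feature_idx=0, grid=[20, 40, 60, 80])
print(curve.round(1))
```

Because the other features keep their observed values, the curve shows the average effect of the chosen feature. It can be misleading when features are strongly correlated, for example temperature and UV index.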
- A random forest model predicts hourly parking lot percent of capacity with a test-set R^2 of 0.64 and RMSE of 17.2.
- Weather is really important!
- Need more data to observe all seasons and weather conditions and to isolate Covid-19 effects.
- Test different models and add/engineer more features (snow storms, holidays, etc.).
- Apply to different parks.
- Also predict the number of visitors, the probability of the lot being full, or waiting times.
- Compare observed vs. forecasted weather?
- Thanks to Hunter Berge and Connor McCormick at Lot Spot for sharing their data.
- Thanks to the Galvanize team (Frank, Kayla, Mike, Travis) and capstone group for feedback and support.