A Capstone project for a Springboard Data Engineering Bootcamp operated by Washington University
- Initial Data Collection files
- data_collections.py
- A file containing extraction functions
- data_collection_nb.ipynb
- A Python notebook used to pull the datasets with the extraction functions
- data_collections.py
Contains the etl package with functions for extraction, transformation, and loading in their respective modules. Some of these functions have been limited to pull a smaller portion of the data for prototyping purposes.
ETL Python scripts that run extraction, transformation, and loading, passing data between one another via queues
Python notebooks that were used to prototype processes before writing the etl package
Contains etl log files
An example crontab file for running the etl scripts as a scheduled pipeline
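The queue-based hand-off between the extraction, transformation, and loading scripts can be sketched roughly as follows. This is a minimal illustration, not the code in the etl package: it assumes the standard-library `queue.Queue` with one thread per stage, and the function names, the `duration_s` field, and the end-of-stream sentinel are all hypothetical.

```python
import queue
import threading

# Hypothetical end-of-stream marker; the real scripts may signal completion differently.
_SENTINEL = object()


def extract(out_q, records):
    """Push each raw record onto the queue, then signal that extraction is done."""
    for record in records:
        out_q.put(record)
    out_q.put(_SENTINEL)


def transform(in_q, out_q):
    """Read raw records, add a derived field, and pass them downstream."""
    while True:
        record = in_q.get()
        if record is _SENTINEL:
            out_q.put(_SENTINEL)  # propagate the end-of-stream signal
            break
        # Illustrative transformation: trip duration in minutes.
        out_q.put({**record, "duration_min": record["duration_s"] / 60})


def load(in_q, sink):
    """Append transformed records to a sink (a list stands in for a database)."""
    while True:
        record = in_q.get()
        if record is _SENTINEL:
            break
        sink.append(record)


def run_pipeline(records):
    """Run the three stages concurrently, connected by two queues."""
    raw_q, clean_q = queue.Queue(), queue.Queue()
    sink = []
    threads = [
        threading.Thread(target=extract, args=(raw_q, records)),
        threading.Thread(target=transform, args=(raw_q, clean_q)),
        threading.Thread(target=load, args=(clean_q, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink
```

Because each stage only touches its input and output queue, a stage can be swapped out or limited to a smaller slice of data for prototyping without changing the others.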
A phenomenon I have heard of in NYC is that it can be faster to get somewhere by bike than by car. This is believable, but NYC is a large place, and for a visitor or new resident it may be difficult to tell when this holds. This project aims to let a visitor or new resident of NYC check whether their trip is likely to be faster by bike or by taxi, and what the weather would be like if they were to bike.
- Taxi Trip Level Data
- 2014 Green Taxi
- 2014 Yellow Taxi
- Taxi Region Data
- CitiBike Trip Data
- Historical Weather Data (Open Weather API)
- Provides weather data for a given latitude and longitude at a specified timestamp.
- Geocoding start & end points
- Google Maps API and Open Maps API for geocoding. Google Maps will be tested at a later date, since its trial period is 90 days and I would like to keep using the free credit.
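As an illustration of the historical weather lookup listed above, the sketch below builds a request URL for a latitude, longitude, and timestamp. It is not the project's extraction code. It assumes the OpenWeather One Call 3.0 "timemachine" endpoint and its `lat`, `lon`, `dt`, and `appid` query parameters; the project may use a different endpoint or API version. The function name and the sample key are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed endpoint (OpenWeather One Call 3.0 historical data); check it against
# the API plan actually used by the project.
HISTORICAL_WEATHER_URL = "https://api.openweathermap.org/data/3.0/onecall/timemachine"


def build_weather_request(lat, lon, when, api_key):
    """Return the request URL for the weather at (lat, lon) at the moment `when`.

    `when` must be timezone-aware so the Unix timestamp is unambiguous.
    """
    if when.tzinfo is None:
        raise ValueError("`when` must be a timezone-aware datetime")
    params = {
        "lat": f"{lat:.4f}",
        "lon": f"{lon:.4f}",
        "dt": int(when.timestamp()),  # Unix time in seconds (UTC)
        "appid": api_key,
    }
    return f"{HISTORICAL_WEATHER_URL}?{urlencode(params)}"


# Example: midday on 1 January 2014 near lower Manhattan.
url = build_weather_request(
    40.7128, -74.0060, datetime(2014, 1, 1, 12, tzinfo=timezone.utc), "YOUR_API_KEY"
)
```

Keeping URL construction separate from the HTTP call makes it possible to check the parameters without spending API quota.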