NYC Taxi Data Engineering Pipeline (Yellow Taxi Trip Records)

This project builds a data engineering pipeline for NYC Yellow Taxi trip records from 2022 and 2023. The pipeline extracts, transforms and loads (ETL) the data into a Snowflake database, then feeds a dashboard for visualisation. The goal is to consolidate, clean and store large volumes of trip data in Snowflake and to surface insights from that data through the dashboard.

If you find this project useful, kindly consider giving it a star ⭐ on GitHub.

Project Setup

  1. Clone the Repository:

    git clone https://github.com/nafisalawalidris/NYC_Taxi_Data_Pipeline.git
    cd NYC_Taxi_Data_Pipeline
  2. Create and Activate a Virtual Environment:

    python -m venv nyc_taxi_env
    .\nyc_taxi_env\Scripts\Activate  # On Windows
    source nyc_taxi_env/bin/activate  # On macOS/Linux
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Run the Scripts:

    • Follow the instructions in the scripts to extract, transform and load the data.
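
To give a feel for the transform stage, here is a minimal sketch of a cleaning step written with only the Python standard library. It is not taken from this repository's scripts: the `clean_trips` function, the filtering rules and the derived `trip_duration_min` field are illustrative assumptions, and the column names follow the public TLC Yellow Taxi schema. The sample data is inline CSV so the example stays self-contained.

```python
import csv
import io
from datetime import datetime

# Column names follow the public TLC Yellow Taxi schema. The project's own
# scripts may select or rename columns differently (assumption).
PICKUP = "tpep_pickup_datetime"
DROPOFF = "tpep_dropoff_datetime"
DISTANCE = "trip_distance"
FARE = "fare_amount"

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def clean_trips(rows):
    """Yield plausible trips with typed fields and a derived duration.

    Hypothetical helper: the filtering rules below are illustrative,
    not the repository's actual cleaning logic.
    """
    for row in rows:
        try:
            pickup = datetime.strptime(row[PICKUP], TIME_FORMAT)
            dropoff = datetime.strptime(row[DROPOFF], TIME_FORMAT)
            distance = float(row[DISTANCE])
            fare = float(row[FARE])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed fields
        # Drop trips that end before they start, cover no distance,
        # or carry a negative fare.
        if dropoff <= pickup or distance <= 0 or fare < 0:
            continue
        yield {
            PICKUP: pickup,
            DROPOFF: dropoff,
            DISTANCE: distance,
            FARE: fare,
            "trip_duration_min": (dropoff - pickup).total_seconds() / 60,
        }


# Made-up sample rows: one valid trip and three that should be rejected.
SAMPLE_CSV = """\
tpep_pickup_datetime,tpep_dropoff_datetime,trip_distance,fare_amount
2022-01-01 00:10:00,2022-01-01 00:25:00,3.2,14.5
2022-01-01 01:00:00,2022-01-01 00:50:00,1.0,8.0
2022-01-01 02:00:00,2022-01-01 02:05:00,0.0,3.0
2022-01-01 03:00:00,not-a-date,2.0,9.0
"""

cleaned = list(clean_trips(csv.DictReader(io.StringIO(SAMPLE_CSV))))
print(len(cleaned), "trip(s) kept")
```

Writing the cleaner as a generator keeps memory use flat for large files, since rows are validated one at a time and can be streamed on to the load stage.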

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License.

Contact

For any inquiries or suggestions, please contact Nafisa Lawal Idris.