FYDP-Team1/Data-Pipeline

Simple-Meal Recipe Data Pipeline

This repository contains Python scripts that scrape, clean, and process food recipe data. The processed data is then used to generate a SQL seed file for initializing a database.

Project Structure

The project is structured as follows:

The food-com-recipes/ directory contains the raw recipe data.

The data/ directory contains various CSV and YAML files that are used as inputs and outputs by the scripts.

The experiments/ directory contains experimental scripts that were used during the initial dataset exploration.

How to Use

  1. Run the scripts in the order of their numbering.
  2. The final output will be a SQL seed file (data/seed.sql) that can be used to initialize a database.
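The "run in order" step above can be sketched as a small driver script. This is only an illustration, not part of the repository: it assumes the pipeline scripts are `.py` files whose names begin with a number (for example a hypothetical `1_scrape.py`), and the helper names `numbered_scripts` and `run_pipeline` are made up for this example.

```python
import re
import subprocess
import sys
from pathlib import Path


def numbered_scripts(directory):
    """Return .py files whose names start with digits, in numeric order.

    Assumption: each pipeline step is a file such as "1_scrape.py".
    Sorting on the integer prefix keeps step 10 after step 2, which a
    plain alphabetical sort would not.
    """
    found = []
    for path in Path(directory).glob("*.py"):
        match = re.match(r"(\d+)", path.name)
        if match:
            found.append((int(match.group(1)), path.name, path))
    return [path for _, _, path in sorted(found)]


def run_pipeline(directory="."):
    """Run each numbered script with the current Python interpreter."""
    for script in numbered_scripts(directory):
        print(f"Running {script.name}")
        # check=True raises CalledProcessError, so the pipeline stops
        # at the first failing step instead of continuing on bad data.
        subprocess.run([sys.executable, str(script)], check=True)
```

Calling `run_pipeline()` from the repository root would then execute the steps one after another. Files without a numeric prefix are ignored.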

Dependencies

The scripts in this repository depend on several Python libraries, including Polars, BeautifulSoup, and Pint. The required libraries can be installed from the provided requirements.txt file.
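Before running the pipeline, it can help to confirm that the libraries are actually installed. The following is a minimal sketch using only the standard library. The names in `REQUIRED` are the usual PyPI distribution names for the three libraries mentioned above, which is an assumption: the repository's requirements.txt may list more packages or pin specific versions. The function name `missing_distributions` is hypothetical.

```python
from importlib import metadata

# Assumed PyPI distribution names for Polars, BeautifulSoup, and Pint.
# requirements.txt remains the authoritative list.
REQUIRED = ["polars", "beautifulsoup4", "pint"]


def missing_distributions(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

If `missing_distributions(REQUIRED)` returns a non-empty list, installing from requirements.txt should resolve it.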

About

Data Engineering for SimpleMeal
