This project processes and analyzes recipe data using Apache Spark. The application consists of two main tasks: preprocessing the raw data and performing analysis to extract insights about recipes containing beef and their cooking durations.
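The analysis described above can be sketched in plain Python. This is a minimal sketch, not the project's actual implementation: the field names (`ingredients`, `cookTime`, `prepTime`), the ISO 8601 duration format (e.g. `PT1H30M`), and the difficulty thresholds are assumptions about the input data, and the real job operates on Spark DataFrames rather than Python lists.

```python
import re

# NOTE: field names and the ISO 8601 duration format are assumptions;
# the actual schema is defined by the input JSON files.
def parse_iso_duration(s):
    """Convert an ISO 8601 duration like 'PT1H30M' to total minutes."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", s or "")
    if not m:
        return 0
    hours = int(m.group(1) or 0)
    minutes = int(m.group(2) or 0)
    return hours * 60 + minutes

def difficulty(total_minutes):
    """Bucket total cooking time into a label (thresholds are illustrative)."""
    if total_minutes > 60:
        return "hard"
    if total_minutes >= 30:
        return "medium"
    return "easy"

# Tiny in-memory stand-in for the JSON recipe records.
recipes = [
    {"ingredients": "beef, onion", "cookTime": "PT1H", "prepTime": "PT15M"},
    {"ingredients": "tofu, rice", "cookTime": "PT20M", "prepTime": "PT5M"},
]

# Keep only recipes that mention beef, then label each by total time.
beef = [r for r in recipes if "beef" in r["ingredients"].lower()]
labels = [
    difficulty(parse_iso_duration(r["cookTime"]) + parse_iso_duration(r["prepTime"]))
    for r in beef
]
```

In the actual Spark job the same logic would typically be expressed as a DataFrame filter plus a UDF or column expression, so it can scale beyond in-memory lists.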
- Python 3.8+
- Apache Spark
Note: A Java Development Kit (JDK) is required to work with Apache Spark.
- Clone the repository:

      git clone https://github.com/AnilkumarBorra/spark.git
      cd recipe_analysis
- Create a virtual environment and install dependencies:

      python3 -m venv spark-env
      source spark-env/bin/activate
      pip install -r requirements.txt
- Run the application locally:

      python src/main.py
- Run the application using Docker:

      docker build -t recipe-analysis .
      docker run recipe-analysis
- Ensure the input data (`recipes-000.json`, `recipes-001.json`, `recipes-002.json`) is located in the `input` folder.
- The processed data and analysis results will be written to the `output` folder.
Unit tests are located in the `tests` directory. To run the tests:

    pytest tests
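As an illustration of the pytest style used in that directory, a test module might look like the following. This is a hypothetical sketch: the module name, the `difficulty` helper, and its thresholds are assumptions for illustration, not names taken from this repository.

```python
# tests/test_difficulty.py -- illustrative sketch only; the real test
# modules and helper names in this repository may differ.

def difficulty(total_minutes):
    """Hypothetical bucketing of total cook+prep time (assumed thresholds)."""
    if total_minutes > 60:
        return "hard"
    if total_minutes >= 30:
        return "medium"
    return "easy"

def test_buckets():
    # pytest discovers functions named test_* and runs their assertions.
    assert difficulty(90) == "hard"
    assert difficulty(45) == "medium"
    assert difficulty(10) == "easy"
```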