The "Adventure Works - Spark" repository is a collection of code and resources for analyzing the Adventure Works dataset using Databricks, PySpark, Delta Lake, and Python. It provides examples and tools for ingesting, processing, and analyzing the data.

iBalajiShanmugam/adventure-spark

Adventure Works - Spark

This repository contains code and resources for ingesting, processing, and analyzing the Adventure Works dataset using Databricks, PySpark, Delta Lake, and Python.

Overview

The Adventure Works dataset is a sample dataset provided by Microsoft that represents a fictional bicycle company. This repository provides notebooks and scripts for ingesting, processing, and analyzing that dataset.

Data Ingestion

The data ingestion process involves loading the Adventure Works dataset into your Databricks environment. The repository provides notebooks and scripts to ingest data from various sources, such as CSV files, databases, or external APIs. You can find the relevant code and instructions in the data-ingestion directory.

Data Processing

Once the data is ingested, the next step is to process and transform it for analysis. The data processing notebooks and scripts in the data-processing directory demonstrate how to clean, transform, and aggregate the Adventure Works dataset using PySpark and Delta Lake.

Data Analysis

With the data ingested and processed, you can analyze it. The data-analysis directory contains notebooks that explore customer behavior, product performance, sales trends, and return patterns.

Customer Analysis

Product Analysis

Sales Analysis
