Performing big data analysis on a comprehensive dataset containing clickstream data from an online store specializing in clothing for pregnant women.

KamandShayegan/a-bigdata-clickstream-analysis

E-Shopping Clickstream - A Big Data Analysis

Introduction

In this project, we analyzed a clickstream dataset from an online store for maternity clothing. We started with a review of related work and a detailed description of the dataset, including its schema and data types. Using Matplotlib, we visualized the data with bar and pie charts to explore distributions and relationships. We then trained a Gradient Boosted Tree as a predictive model and evaluated it with several techniques; the results were promising. We also applied association rule mining to uncover significant patterns, ranking rules by support, confidence, and lift. Clustering analysis revealed natural groupings in the data. Taken together, these analyses provide a basis for developing further predictive models on practical features. Check out the complete report analysis here.
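To make the three association rule metrics concrete, here is a minimal sketch in plain Python. It is not the project's code: the `sessions` data, the item names, and the helper functions are hypothetical and exist only to show how support, confidence, and lift relate to one another for a rule "antecedent → consequent".

```python
# Toy illustration of support, confidence, and lift.
# The sessions below are made-up data, not taken from the project's dataset.

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the baseline support of the consequent.

    A value above 1 means the items co-occur more often than they
    would if they were independent.
    """
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))

# Hypothetical sessions: each set holds the categories viewed in one session.
sessions = [
    frozenset({"trousers", "blouses"}),
    frozenset({"trousers", "skirts"}),
    frozenset({"trousers", "blouses", "sale"}),
    frozenset({"skirts"}),
]

rule_support = support(sessions, {"trousers", "blouses"})
rule_confidence = confidence(sessions, {"trousers"}, {"blouses"})
rule_lift = lift(sessions, {"trousers"}, {"blouses"})

print(f"support={rule_support:.3f} "
      f"confidence={rule_confidence:.3f} lift={rule_lift:.3f}")
```

On a dataset of realistic size, these quantities would normally come from a library implementation rather than hand-written helpers; the sketch only shows what the numbers mean.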

Contributors

  • Kamand Sedaghat Shayegan (grade: 5/5)
  • Nastaran Taefi Aghdam

Building

  1. Python 3.x - Download and install from python.org.
  2. Java Development Kit (JDK) - Required for PySpark. Download and install from oracle.com.
  3. Apache Spark - Download and install from spark.apache.org.
  4. Jupyter Notebook - Install via pip.
  5. Required Python packages: pyspark, matplotlib, pandas - Install via pip.
  6. Run each project file.
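Before running the project files, it can help to confirm that the prerequisites listed above are visible to the Python interpreter. The following is a small sketch, not part of the repository; the helper names `missing_packages` and `java_available` are hypothetical, and the check only tests whether the packages can be found and whether a `java` executable is on the `PATH`.

```python
# Environment check for the prerequisites listed above.
# Helper names are illustrative; this script is not part of the repository.
import importlib.util
import shutil

REQUIRED_PACKAGES = ("pyspark", "matplotlib", "pandas")

def missing_packages(names=REQUIRED_PACKAGES):
    """Return the packages from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

def java_available():
    """Return True if a `java` executable is on the PATH (needed by PySpark)."""
    return shutil.which("java") is not None

missing = missing_packages()
if missing:
    print("Missing packages (install via pip):", ", ".join(missing))
else:
    print("All required Python packages were found.")

if not java_available():
    print("No `java` executable found on PATH; PySpark needs a JDK.")
```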

Final Note

This dataset is licensed under the GNU General Public License (GPL) Version 3, which permits copying, distribution, and modification, provided that derivative works are distributed under the same license. Only the text of the license document itself may not be changed.
