Cloud-Based Data Analytics Assignments

Overview

This repository contains solutions to the assignments for MIE1628 (Cloud-Based Data Analytics) at the University of Toronto. The assignments cover Hadoop MapReduce, Apache Spark, the Azure cloud platform, and machine learning with Azure ML.

Assignments

Assignment 1: Hadoop MapReduce

  • Topic: Implementation of Line Counting and K-Means Clustering using Hadoop MapReduce.
  • Files:
    • Assignment1_Solution.pdf
  • Key Concepts:
    • Line Count Program Implementation
    • K-Means Clustering on MapReduce
    • Canopy Selection for K-Means Optimization
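
As a rough illustration of how K-Means maps onto the MapReduce model, the sketch below runs one iteration in plain Python: the map step assigns each point to its nearest centroid, and the reduce step averages the points in each group. This is not the code from Assignment1_Solution.pdf. The function names (`map_assign`, `reduce_mean`, `kmeans_iteration`) and the sample points are hypothetical, and a real Hadoop job would distribute these steps across mappers and reducers.

```python
from collections import defaultdict
from math import dist


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One K-Means pass expressed as map -> shuffle -> reduce."""
    groups = defaultdict(list)  # plays the role of the shuffle phase
    for point in points:
        key, value = map_assign(point, centroids)
        groups[key].append(value)
    # A centroid with no assigned points keeps its previous position.
    return [
        reduce_mean(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = kmeans_iteration(points, centroids)
```

Repeating `kmeans_iteration` until the centroids stop moving gives the full algorithm. Canopy selection would only change how the initial `centroids` are chosen.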

Assignment 2: Apache Spark

  • Topic: Data processing and a recommendation system using PySpark and Spark SQL.
  • Files:
    • Assignment2_Solution.html
  • Key Concepts:
    • Counting Odd/Even Numbers from a Dataset
    • Salary Summation per Department
    • Word Count and Frequency Analysis using PySpark
    • Collaborative Filtering for Movie Recommendations
    • Model Evaluation using RMSE & MAE
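
For readers unfamiliar with the two evaluation metrics, here is a minimal plain-Python sketch of RMSE (root mean squared error) and MAE (mean absolute error) over a list of actual and predicted ratings. The ratings are made up for illustration, and the assignment's own notebook may use a library evaluator rather than hand-written functions like these.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def mae(actual, predicted):
    """Mean absolute error: the average size of the error."""
    absolute_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute_errors) / len(absolute_errors)


# Hypothetical ratings, not taken from the assignment's dataset.
actual_ratings = [4.0, 3.0, 5.0, 1.0]
predicted_ratings = [3.0, 3.0, 4.0, 3.0]

rmse_value = rmse(actual_ratings, predicted_ratings)
mae_value = mae(actual_ratings, predicted_ratings)
```

Because RMSE squares each error before averaging, the single two-point miss in this sample pulls RMSE above MAE.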

Assignment 3: Spark and Cloud Data Platform

  • Topic: Intrusion Detection and Data Analysis using PySpark on Databricks.
  • Files:
    • Assignment3_Solution.html
  • Key Concepts:
    • Extracting and Processing KDD Cup 99 Data
    • Feature Engineering & Exploratory Data Analysis
    • Machine Learning Model for Intrusion Detection
    • Cloud-based Data Processing
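
A common first step with KDD Cup 99 records is to reduce the many attack labels to a binary target (normal vs. attack). The sketch below shows that idea in plain Python, under two assumptions: each record is a comma-separated line whose last field is the label, and labels may carry a trailing period. The helper names and the shortened sample records are hypothetical, and the assignment itself may process the data differently, for example with PySpark DataFrames.

```python
def binarize_label(raw_label):
    """Return 0 for normal traffic and 1 for any attack type."""
    return 0 if raw_label.strip().rstrip(".") == "normal" else 1


def split_record(line):
    """Split one comma-separated record into (features, binary label)."""
    *features, label = line.strip().split(",")
    return features, binarize_label(label)


# Shortened, hypothetical records; real KDD Cup 99 rows have many more fields.
sample_lines = [
    "0,tcp,http,SF,normal.",
    "0,icmp,ecr_i,SF,smurf.",
]
parsed = [split_record(line) for line in sample_lines]
```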

Assignment 4: Azure Cloud Platform

  • Topic: Working with Azure Data Factory, Azure SQL Database, and ADLS Gen2.
  • Files:
    • Assignment4_Solution.html
  • Key Concepts:
    • Data Pipelines with Azure Data Factory
    • SQL Queries on Gender Jobs Data
    • Setting Up Bi-Directional Data Replication

Assignment 5: Azure Machine Learning

  • Topic: Machine Learning using Azure ML Studio and Stream Analytics.
  • Files:
    • Assignment5_Solution.ipynb
  • Key Concepts:
    • Stream Analytics with IoT Data Processing
    • Data Exploration and Preprocessing
    • Machine Learning Model Training and Evaluation
    • Automated ML and Hyperparameter Tuning

Setup and Usage

  1. Clone the repository:
    git clone https://github.com/manish-kotra/Azure-Projects.git
    cd Azure-Projects
  2. Open relevant files:
    • .pdf and .html files can be viewed in a browser or a PDF reader.
    • .ipynb files should be opened in Jupyter Notebook or Azure ML Studio.

Technologies Used

  • Big Data Processing: Hadoop, Apache Spark, PySpark
  • Cloud Platforms: Azure Data Factory, Azure SQL Database, ADLS Gen2
  • Machine Learning: Azure ML, Python, Scikit-Learn
  • Data Visualization & Analysis: Pandas, Matplotlib, SQL

License

This repository is for academic purposes only. Please do not plagiarize or distribute without permission.

Author

Manish Kumar - University of Toronto
