Wiki-Spark

Welcome to Wiki-Spark, which hosts a series of how-tos and tutorials on using PySpark.

Data Algorithms with Spark

Introduction to Spark

  1. Apache Spark in a Nutshell
  2. A Gentle Introduction to Apache Spark
  3. Learning Spark (book), 2nd Edition

Introduction to PySpark

  1. First Steps With PySpark and Big Data Processing
  2. PySpark Tutorial: Getting Started with PySpark
  3. Introduction to PySpark
  4. Beginners Guide to PySpark

Spark RDDs and DataFrames

  1. Spark RDDs Tutorial
  2. PySpark RDD Tutorial, Learn with Examples
  3. Spark DataFrames Tutorial
  4. Introduction to PySpark
  5. PySpark – Create DataFrame with Examples
  6. Conversion: RDD to DataFrame
  7. Conversion: DataFrame to RDD
  8. flatMap() for RDD: RDD.flatMap()
  9. flatMap() for DataFrame: explode()
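
The difference between `map()` and `flatMap()` comes up in several of these tutorials. Below is a minimal sketch that runs without Spark, assuming plain Python lists stand in for an RDD. The helpers `map_records` and `flat_map_records` are hypothetical names that model the two operations.

```python
from itertools import chain

# Hypothetical input: each string stands in for one line of a text file.
lines = ["hello spark", "hello pyspark world"]

def map_records(func, records):
    """Plain-Python model of RDD.map(): exactly one output per input record."""
    return [func(r) for r in records]

def flat_map_records(func, records):
    """Plain-Python model of RDD.flatMap(): each input may yield zero or more
    outputs, and all outputs are flattened into a single sequence."""
    return list(chain.from_iterable(func(r) for r in records))

mapped = map_records(lambda line: line.split(), lines)
flattened = flat_map_records(lambda line: line.split(), lines)

print(mapped)     # [['hello', 'spark'], ['hello', 'pyspark', 'world']]
print(flattened)  # ['hello', 'spark', 'hello', 'pyspark', 'world']
```

In PySpark, the same flattening on an RDD is `sc.parallelize(lines).flatMap(lambda line: line.split())`. On a DataFrame, `explode(split(col("line"), " "))` from `pyspark.sql.functions` turns each array element into its own row.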

Basic and Simple Algorithms

  1. How to do Word Count in PySpark
  2. Finding Anagrams
  3. Finding K-mers
  4. Duplicate Removal in PySpark RDDs
  5. Duplicate Removal in PySpark DataFrames
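
Word count, K-mers, anagrams, and duplicate removal share one shape: emit (key, value) pairs, then combine the values for each key. The sketch below models that shape in plain Python. `reduce_by_key` is a hypothetical stand-in for PySpark's `RDD.reduceByKey()`, and the sample inputs are made up for illustration.

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(func, pairs):
    """Plain-Python model of RDD.reduceByKey(): group (key, value) pairs
    by key and combine each group's values with a binary function."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}

def word_count(lines):
    # flatMap lines to words, map each word to (word, 1), then sum per key
    pairs = [(word, 1) for line in lines for word in line.split()]
    return reduce_by_key(lambda a, b: a + b, pairs)

def kmer_count(sequence, k):
    # emit every overlapping substring of length k, then count like words
    pairs = [(sequence[i:i + k], 1) for i in range(len(sequence) - k + 1)]
    return reduce_by_key(lambda a, b: a + b, pairs)

def anagram_groups(words):
    # key each word by its sorted letters; anagrams share the same key
    pairs = [("".join(sorted(w)), [w]) for w in words]
    grouped = reduce_by_key(lambda a, b: a + b, pairs)
    return {key: sorted(set(ws)) for key, ws in grouped.items() if len(set(ws)) > 1}

def distinct(records):
    # duplicate removal: key each record by itself and keep one value per key
    return sorted(reduce_by_key(lambda a, b: a, [(r, None) for r in records]))

print(word_count(["a b a", "b c"]))   # {'a': 2, 'b': 2, 'c': 1}
print(kmer_count("ATCGATC", 3))       # {'ATC': 2, 'TCG': 1, 'CGA': 1, 'GAT': 1}
print(anagram_groups(["listen", "silent", "enlist", "spark"]))  # {'eilnst': ['enlist', 'listen', 'silent']}
print(distinct([3, 1, 3, 2, 1]))      # [1, 2, 3]
```

In PySpark, word count becomes `rdd.flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. Duplicate removal is available directly as `RDD.distinct()` or `DataFrame.dropDuplicates()`.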

Data Design Patterns

  1. Summarization design patterns -- mapPartitions()
  2. Join Patterns -- inner, left, right
  3. Top-10 Design Patterns
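
The Summarization and Top-10 patterns both avoid moving every record across the network. Each partition is reduced locally, and only small partial results are combined. Here is a minimal plain-Python sketch, assuming nested lists stand in for partitions; the function names are hypothetical.

```python
import heapq
from functools import reduce

# Hypothetical data: each inner list stands in for one Spark partition.
partitions = [[4, 9, 1], [7, 3], [10, 2, 8, 5]]

def summarize_partition(numbers):
    """Summarization pattern: one pass over a partition emits a single
    (sum, count, min, max) tuple, like a function passed to RDD.mapPartitions().
    An empty partition yields (0, 0, inf, -inf), which is neutral for the merge."""
    total, count = 0, 0
    lowest, highest = float("inf"), float("-inf")
    for x in numbers:
        total += x
        count += 1
        lowest = min(lowest, x)
        highest = max(highest, x)
    yield (total, count, lowest, highest)

def merge_summaries(a, b):
    # Combine two partial summaries; the merge is associative and commutative,
    # so partial results can be combined in any grouping or order.
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))

def top_n(partitions, n):
    # Top-N pattern: keep a local top-N per partition, then take one
    # final top-N over the small set of local winners.
    local_winners = [x for part in partitions for x in heapq.nlargest(n, part)]
    return heapq.nlargest(n, local_winners)

partials = [s for part in partitions for s in summarize_partition(part)]
total, count, lowest, highest = reduce(merge_summaries, partials)
print(total, count, lowest, highest)  # 49 9 1 10
print(top_n(partitions, 3))           # [10, 9, 8]
```

With an RDD, the per-partition step would be `rdd.mapPartitions(summarize_partition)` followed by `.reduce(merge_summaries)`. The local top-N shortcut is correct because each of the global top N values must also be among the top N of its own partition.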

PySpark in Jupyter

  1. How to set up PySpark for your Jupyter notebook
  2. How to install PySpark and Jupyter Notebook in 3 Minutes

GraphFrames and Using Them in Jupyter

  1. GraphFrames Overview
  2. Introducing GraphFrames
  3. How to use GraphFrames from Jupyter and PySpark
  4. GraphFrames in Jupyter: a practical guide
  5. Install PySpark in Jupyter on Mac using Homebrew

Using UDF (User Defined Functions) in PySpark

  1. How to write and use UDFs in Spark
  2. How to Write Spark UDF in Python?
  3. PySpark UDF
  4. PySpark UDF (User Defined Function)

Lambda Expressions

  1. How to Use Python Lambda Functions
  2. How to use Lambda Expressions
  3. Lambda Expressions Tutorial
  4. Python Lambda Examples
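
The lambda expressions in these tutorials are the same small anonymous functions that get passed to Spark transformations. Here is a short plain-Python sketch with made-up data:

```python
from functools import reduce

numbers = [5, 3, 8, 1, 4]
pairs = [("spark", 3), ("hadoop", 1), ("flink", 2)]

squares = list(map(lambda x: x * x, numbers))        # one output per input
evens = list(filter(lambda x: x % 2 == 0, numbers))  # keep items where the lambda is True
total = reduce(lambda a, b: a + b, numbers)          # fold the list with a binary lambda
by_count = sorted(pairs, key=lambda kv: kv[1], reverse=True)  # sort (key, value) by value

print(squares)   # [25, 9, 64, 1, 16]
print(evens)     # [8, 4]
print(total)     # 21
print(by_count)  # [('spark', 3), ('flink', 2), ('hadoop', 1)]
```

The same lambdas plug into PySpark unchanged. Examples include `rdd.map(lambda x: x * x)`, `rdd.filter(lambda x: x % 2 == 0)`, `rdd.reduce(lambda a, b: a + b)`, and `rdd.sortBy(lambda kv: kv[1], ascending=False)`.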

Monoid: Design Principle

  1. What is a Monoid?
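
Under the monoid design principle, a reducer's combine function should be associative and have an identity element, so Spark can apply it in any grouping across partitions. The sketch below spot-checks these laws on a small sample, which is a check and not a proof; `is_monoid` is a hypothetical helper. It shows that averaging two numbers is not a monoid, while (sum, count) pairs under element-wise addition are. With pairs, the mean is computed only once, at the end.

```python
from functools import reduce
from itertools import product

def is_monoid(op, identity, samples):
    """Spot-check the monoid laws on a finite sample: associativity
    and a two-sided identity element."""
    associative = all(
        op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(samples, repeat=3)
    )
    has_identity = all(op(identity, a) == a and op(a, identity) == a for a in samples)
    return associative and has_identity

def add(a, b):
    return a + b

def mean_of_two(a, b):
    return (a + b) / 2

def add_pairs(a, b):
    # element-wise addition of (sum, count) pairs
    return (a[0] + b[0], a[1] + b[1])

print(is_monoid(add, 0, [0, 1, 2, 5, -3]))                              # True
print(is_monoid(mean_of_two, 0, [0, 1, 2, 5, -3]))                      # False: averaging is not associative
print(is_monoid(add_pairs, (0, 0), [(0, 0), (5, 1), (7, 2), (-3, 1)]))  # True

# Averaging done the monoid way: reduce (value, 1) pairs, then divide once.
values = [4, 8, 6]
s, c = reduce(add_pairs, [(v, 1) for v in values], (0, 0))
print(s / c)  # 6.0
```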

Misc

  1. PySpark repartition() vs coalesce()
  2. How to reduce the verbosity of Spark runtime output
  3. PySpark Broadcast Variables
  4. PySpark Accumulator with Example
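
A common use of broadcast variables is the map-side join. A small lookup table is shipped once to every executor, and each partition of the large dataset is enriched locally without a shuffle. Here is a plain-Python sketch of that idea; the data and names are hypothetical.

```python
# Hypothetical data: a small lookup table (the part you would broadcast)
# and a large dataset of (order_id, country_code, amount) split into partitions.
country_names = {"US": "United States", "FR": "France"}
orders_partitions = [
    [("o1", "US", 20.0), ("o2", "DE", 5.0)],
    [("o3", "FR", 12.5)],
]

def enrich_partition(orders, lookup):
    """Map-side join: every partition reads the same read-only lookup table,
    so the large dataset never has to be shuffled by key."""
    for order_id, code, amount in orders:
        # unmatched codes get None, mimicking a left outer join
        yield (order_id, code, lookup.get(code), amount)

enriched = [
    row for part in orders_partitions for row in enrich_partition(part, country_names)
]
for row in enriched:
    print(row)
```

In PySpark, the lookup would be wrapped as `bc = sc.broadcast(country_names)` and read inside `rdd.mapPartitions(lambda rows: enrich_partition(rows, bc.value))`.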
