mahmoudparsian/data-algorithms-with-spark: O'Reilly Book: Data Algorithms with Spark by Mahmoud Parsian

Data Algorithms with Spark by Mahmoud Parsian

"... This book will be a great resource for
both readers looking to implement existing
algorithms in a scalable fashion and readers
who are developing new, custom algorithms
using Spark. ..."

Dr. Matei Zaharia
Original Creator of Apache Spark

Foreword by Dr. Matei Zaharia (Original Creator of Apache Spark)

Author: Mahmoud Parsian

Goal of this book: Data Algorithms with Spark

Story of this book: Data Algorithms with Spark

Mahmoud Parsian's Author Page @Amazon
Mahmoud Parsian's Author Page @LinkedIn
This new O'Reilly book is the successor edition of Data Algorithms (published by O'Reilly)
This book uses PySpark (much simpler and more readable)
Published date: April 8, 2022
@OReillyMedia: Data Algorithms with Spark, by @mahmoudparsian
Author Contact: [ Email ] [ Mahmoud Parsian @LinkedIn ] [ Mahmoud Parsian @GitHub ]

GitHub Chapter Solutions

This GitHub repository hosts all source code and scripts for Data Algorithms with Spark.
Chapter solutions are provided in PySpark and Scala:
- PySpark solutions are provided by Mahmoud Parsian
- Scala solutions are provided by Deepak Kumar and Biman Mandal

Software:

All programs are tested with the following software:

Spark	Python	Scala	Java
Apache Spark 3.4.0	Python 3.10.5	Scala 2.13	Java 11

Table of Contents

Chapter	Title
Glossary	Glossary of Big Data, MapReduce, Spark
Chapter 1	Introduction to Data Algorithms
Chapter 2	Transformations in Action
Chapter 3	Mapper Transformations
Chapter 4	Reductions in Spark
Chapter 5	Partitioning Data
Chapter 6	Graph Algorithms
Chapter 7	Interacting with External Data Sources
Chapter 8	Ranking Algorithms
Chapter 9	Fundamental Data Design Patterns
Chapter 10	Common Data Design Patterns
Chapter 11	Join Design Patterns
Chapter 12	Feature Engineering in PySpark
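
Many of the chapters above (for example, Chapter 4, Reductions in Spark) revolve around the map-then-reduce-by-key pattern. The following is a minimal plain-Python sketch, not code from the book, that imitates what PySpark's `flatMap()`, `map()`, and `reduceByKey()` produce for a word count, so the logic can be run without a Spark installation. The helper names `flat_map`, `reduce_by_key`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def flat_map(records: Iterable[str], f: Callable[[str], Iterable[str]]) -> List[str]:
    # Like RDD.flatMap(): each input record yields zero or more output elements.
    return [item for record in records for item in f(record)]


def reduce_by_key(pairs: Iterable[Tuple[K, V]], f: Callable[[V, V], V]) -> Dict[K, V]:
    # Simulates the result of RDD.reduceByKey(). Spark itself merges values
    # per partition before shuffling instead of grouping them all first.
    grouped: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(f, values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    words = flat_map(lines, lambda line: line.lower().split())
    pairs = [(word, 1) for word in words]  # map(): word -> (word, 1)
    return reduce_by_key(pairs, lambda a, b: a + b)


if __name__ == "__main__":
    print(word_count(["Spark is fast", "spark is fun"]))
    # {'spark': 2, 'is': 2, 'fast': 1, 'fun': 1}
```

With PySpark installed, roughly the same pipeline can be written as `sc.textFile(path).flatMap(lambda line: line.lower().split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where `sc` is an existing `SparkContext` and `path` is a placeholder input path. Because `reduceByKey()` combines partial results within each partition before shuffling, the reducing function should be associative and commutative.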

Bonus Chapters

Bonus Chapter	Title / Description
Glossary	Glossary of Big Data, MapReduce, Spark
Word Count	Solutions for Word Count using RDDs and DataFrames
Anagrams	Find words that are anagrams of each other
Lambda Expressions	Using Lambda Expressions in PySpark programs
TF-IDF	Term Frequency - Inverse Document Frequency
K-mers	K-mers for DNA Sequences
Correlation	All vs. All Correlation
Mapping Partitions	`mapPartitions()` Complete Example
UDF	User-Defined Function Examples
DataFrames Transformations	Examples of creating and transforming DataFrames
DataFrames Tutorials	DataFrames tutorials: creating DataFrames from collections and CSV text files
Join Operations	Examples of joining RDDs and DataFrames
PySpark Tutorial 101	Examples of using PySpark RDDs and DataFrames
Physical Data Partitioning	Tutorial of Physical Data Partitioning
Monoids and Combiners	Monoid as a Design Principle
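
The Monoids and Combiners bonus chapter frames reductions in terms of monoids. A standard illustration, sketched here in plain Python as an assumption rather than the chapter's own code: averaging is not a monoid operation (the mean of per-partition means is generally wrong when partitions differ in size), but combining `(sum, count)` pairs by element-wise addition is associative and has the identity `(0, 0)`. Partial results from each partition can therefore be merged in any grouping, with the division done only at the end. The function names and the two-partition data below are hypothetical.

```python
from functools import reduce
from typing import Iterable, List, Tuple

SumCount = Tuple[float, int]
IDENTITY: SumCount = (0.0, 0)  # identity element: combining with it changes nothing


def combine(a: SumCount, b: SumCount) -> SumCount:
    # Associative binary operation: element-wise addition of (sum, count).
    return (a[0] + b[0], a[1] + b[1])


def partition_summary(values: Iterable[float]) -> SumCount:
    # Per-partition "combiner": fold local values into one (sum, count) pair.
    return reduce(combine, ((float(v), 1) for v in values), IDENTITY)


def monoid_mean(partitions: List[List[float]]) -> float:
    # Merge the per-partition summaries, then divide once at the very end.
    total, count = reduce(combine, (partition_summary(p) for p in partitions), IDENTITY)
    if count == 0:
        raise ValueError("mean of an empty dataset is undefined")
    return total / count


def mean_of_means(partitions: List[List[float]]) -> float:
    # Incorrect approach, shown for contrast: it ignores partition sizes.
    # Assumes every partition is non-empty.
    means = [sum(p) / len(p) for p in partitions]
    return sum(means) / len(means)


if __name__ == "__main__":
    # Hypothetical data split unevenly across two "partitions".
    partitions = [[1.0, 2.0, 3.0], [10.0]]
    print(monoid_mean(partitions))    # 4.0
    print(mean_of_means(partitions))  # 6.0
```

In PySpark, the same idea can be expressed with `RDD.aggregate((0.0, 0), seqOp, combOp)`, or with `combineByKey()` when averages are needed per key; in both cases the zero value plays the role of the monoid identity.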
