Awantik Das edited this page Mar 17, 2017 · 3 revisions

Introduction

  • What is Apache Spark?
  • Spark Jobs and APIs
  • Review of Resilient Distributed Datasets (RDDs), DataFrames, and Datasets
  • Review of Catalyst Optimizer and Project Tungsten
  • Review of the Spark 2.0 architecture

Resilient Distributed Datasets (RDDs)

  • Internal workings of an RDD
  • Creating RDDs
  • Global versus local scopes
  • Transformations
  • Actions