Docker Image Test Status: CircleCI

A Small Course on Big Data - GeoAnalysis using PySpark

Housekeeping

Who's Here?

I love staying in touch. Here's a link to a form where you can add your details so that I can stay in touch with you. I also love feedback, good and bad, because I love to get better at my job. As we go through this course, please keep in mind that I will ask you for feedback afterwards. You can keep it anonymous or choose to tell me who you are. See the feedback form here: Feedback Form

  • Who is using Spark in Production?
  • Who is doing Geospatial Analysis using Spark?
  • Who is a programmer?
  • Who is a Data Janitor... err I mean Scientist 😄
  • Who is a hedge fund manager? ... here's my number 181821113 (bank account number, that is!)
  • Who is doing something else that I have missed?

Introduction

This workshop will introduce you to Apache Spark via the exciting domain of Geospatial Analysis.

Setup

Dependencies:

See: docker/README.md

Data

If you use Docker, the data will be downloaded automatically into the work-flow folder. Otherwise, download the data yourself:

wget http://datax.academy/pydata-berlin-2016/06_Europe_Cities_Boundries_with_Labels_Population.geo.json
wget http://datax.academy/pydata-berlin-2016/pois.json
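
If wget is not available on your machine, the same two files can be fetched from Python. The following is a minimal sketch using only the standard library. The base URL and file names are copied from the wget commands above; the function names and the `data` target folder are hypothetical choices made for illustration, not part of the workshop code.

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Base URL and file names are taken from the wget commands above.
BASE_URL = "http://datax.academy/pydata-berlin-2016/"
DATA_FILES = [
    "06_Europe_Cities_Boundries_with_Labels_Population.geo.json",
    "pois.json",
]


def data_urls(base_url=BASE_URL, names=DATA_FILES):
    """Return the full download URL for each data file."""
    return [urljoin(base_url, name) for name in names]


def download_data(target_dir="data"):
    """Download each file into target_dir, skipping files already present.

    target_dir is a hypothetical location; adjust it to your own layout.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in data_urls():
        # Use the last URL segment as the local file name.
        destination = target / url.rsplit("/", 1)[-1]
        if not destination.exists():
            urlretrieve(url, str(destination))
        paths.append(destination)
    return paths
```

Calling `download_data()` returns the local paths of both files. It needs network access and assumes the server above is still reachable.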

Plan

Introduction to Apache Spark and PySpark

  • What is Apache Spark?
  • Why is it revolutionary?
  • How does it work?
  • What does it contain?

Workshop Scenario

  • Static Data Analysis
  • Machine Learning Applications of Geospatial Data
  • Real-Time Analysis