The purpose of this project is to exercise the entire pipeline: loading data, building a model, applying that model to streaming data, and storing the results back into a database system.
- Spark 2.0 allows PySpark to save MLlib Pipelines; with Spark 1.6 this is currently only possible from Scala (a save/load sketch follows this list).
- spark-cassandra-connector is not yet available for Spark 2.0.
- To shut down the StreamingContext, either send a SIGTERM so the shutdown hook fires, or simply restart your notebook (a shutdown sketch follows this list).
- features.py exists so that, in a production setup, you can import the same file from both your model-building script and your Spark Streaming script and apply identical feature transformations (a features.py sketch follows this list).
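
As a rough sketch of what pipeline persistence looks like from PySpark on Spark 2.0 (the stages, column names, paths, and `training_df` below are placeholders, not this project's actual model):

```python
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

# Placeholder stages; swap in whatever transformers/estimators the project actually uses.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashing_tf = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(labelCol="label", featuresCol="features")

pipeline = Pipeline(stages=[tokenizer, hashing_tf, lr])
model = pipeline.fit(training_df)  # training_df is an assumed training DataFrame

# New in Spark 2.0: pipeline persistence from PySpark.
model.write().overwrite().save("/tmp/pipeline_model")

# Later (e.g. in the streaming job), load the fitted pipeline back.
same_model = PipelineModel.load("/tmp/pipeline_model")
```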
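
A minimal sketch of stopping the StreamingContext gracefully on SIGTERM, assuming a socket stream on localhost:9999 as a stand-in for the real source:

```python
import signal
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="streaming-shutdown-sketch")
ssc = StreamingContext(sc, 5)  # 5-second batches

# Placeholder stream so the context has an output operation to run.
lines = ssc.socketTextStream("localhost", 9999)
lines.pprint()

# Stop the streaming context gracefully when SIGTERM arrives.
def handle_sigterm(signum, frame):
    ssc.stop(stopSparkContext=True, stopGraceFully=True)

signal.signal(signal.SIGTERM, handle_sigterm)

ssc.start()
ssc.awaitTermination()
```

Alternatively, Spark's built-in shutdown hook can be made graceful by setting `spark.streaming.stopGracefullyOnShutdown=true` in the Spark configuration.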
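
One way features.py could be structured so that both jobs share the same transformations; the column names and logic here are illustrative assumptions, not the project's actual features:

```python
# features.py -- shared feature engineering, importable from both jobs.
from pyspark.sql import functions as F

def add_features(df):
    """Apply the same transformations in training and in streaming."""
    return (df
            .withColumn("text_lower", F.lower(F.col("text")))   # assumed raw column "text"
            .withColumn("char_count", F.length(F.col("text"))))
```

Both the model-building script and the streaming script would then `from features import add_features` and call it on their DataFrames; on the streaming side this would typically happen inside `foreachRDD` after converting each micro-batch to a DataFrame.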