Spark Python app: spark-data-processing

This Python application performs all of its calculations in a single run, using Spark on Hadoop.

The calculations read their input from a fixed HDFS location: hdfs://hadoop-master:9000/dataProcessing/input/sample.csv
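
As a rough sketch of how that input could be read with PySpark: the names INPUT_PATH, load_input and main are hypothetical and not taken from test.py, and the header row and schema inference are assumptions, since the layout of sample.csv is not described here.

```python
# The path comes from the description above; everything else is illustrative.
INPUT_PATH = "hdfs://hadoop-master:9000/dataProcessing/input/sample.csv"


def load_input(spark, path=INPUT_PATH):
    """Read the input CSV into a DataFrame.

    Assumes the file has a header row. Schema inference costs an extra
    pass over the data, so an explicit schema may be preferable.
    """
    return spark.read.csv(path, header=True, inferSchema=True)


def main():
    # Imported here so load_input can be exercised without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark-data-processing").getOrCreate()
    try:
        df = load_input(spark)
        # Output from print() goes to the driver's stdout.
        print(f"rows: {df.count()}")
    finally:
        spark.stop()
```

A script submitted with spark-submit would call main() at the end. The row count is only a placeholder for the real calculations.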

The script is run by submitting it as a Spark job, redirecting stdout to a file so the results of the calculations are kept:

$: spark-submit sparkDataProcessing/test.py > spark-run.log
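
If the job needs to be launched from Python rather than from a shell, the same redirection might look like the sketch below. run_and_log is a hypothetical helper, not part of this repository. As with the `>` redirect, only stdout is captured. Under Spark's default logging configuration, log messages typically go to stderr and so would not end up in the file.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd, write its stdout to log_path, and return the exit code."""
    with open(log_path, "w") as log:
        # stderr is left untouched, mirroring a plain `>` redirect.
        result = subprocess.run(cmd, stdout=log)
    return result.returncode


# Example call (assumes spark-submit is on PATH):
# run_and_log(["spark-submit", "sparkDataProcessing/test.py"], "spark-run.log")
```

The exit code is returned rather than raised so the caller can decide whether a failed job should stop a larger pipeline.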
