PySpark + development version of Mongo Hadoop.
You can change the value of SPARK_VERSION to build against a newer Spark release. To get a newer version of Mongo-Hadoop, update the ENV values MONGO_HADOOP_VERSION and MONGO_HADOOP_COMMIT.
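As a rough sketch of how these values could be updated from the command line rather than by hand, the helper below rewrites a single ENV line in place. It assumes the variables are declared in the Dockerfile one per line, as either `ENV NAME value` or `ENV NAME=value`; the function name `set_env` is hypothetical and not part of this repository, and the version you pass in is up to you.

```shell
#!/bin/sh
# set_env NAME VALUE FILE
# Hypothetical helper: rewrites the line "ENV NAME ..." in FILE to
# "ENV NAME=VALUE". Assumes one variable per ENV line and that VALUE
# contains no "|" or "&" characters (they are special to sed here).
set_env() {
    name=$1
    value=$2
    file=$3
    # -i.bak edits in place and keeps a backup as FILE.bak;
    # this form works with both GNU and BSD sed.
    sed -i.bak -E "s|^ENV[[:space:]]+${name}[[:space:]=].*|ENV ${name}=${value}|" "$file"
}
```

For example, `set_env SPARK_VERSION "$NEW_VERSION" Dockerfile` would update the Spark version before running the build, where `NEW_VERSION` is a shell variable you set yourself.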
sudo docker build -t zero323/mongo-spark --build-arg IP={YOUR-IP} .
sudo docker run -t -i --net=host --env SPARK_LOCAL_IP=$DOCKER_HOSTNAME zero323/mongo-spark /bin/bash
For details see: Getting Spark, Python, and MongoDB to work together