The original Cloudera image still ships Spark 1.6.1.
This image upgrades it to 2.2 using unofficial workarounds, so I CAN NOT promise 100% compatibility with the original CDH.
OSX: https://docs.docker.com/docker-for-mac/install/
Others: https://docs.docker.com/engine/installation/
Docker Compose: https://docs.docker.com/compose/install/#install-compose
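Before cloning, it can help to confirm that both tools are actually on your PATH. The following is a minimal sketch, not part of this repository; `have_cmd` is a hypothetical helper name chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report each prerequisite without aborting, so every gap is shown at once.
for tool in docker docker-compose; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

If either tool is reported as missing, install it from the links above before continuing.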
git clone https://github.com/bryanyang0528/docker-cdh-spark
cd docker-cdh-spark
docker build -t docker-cdh-spark:latest .
docker run -p 8080:8080 --hostname=quickstart.cloudera --privileged=true -ti --rm docker-cdh-spark:latest /bin/bash
docker-compose up -d
docker-compose exec docker-cdh-spark /bin/bash
ipython notebook --ip 0.0.0.0 --port 8080 --allow-root --NotebookApp.token=''
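Once you have a shell inside the container, you may want to check that the Spark upgrade is in place. This sketch assumes `spark-submit` is on the container's PATH and that its `--version` banner contains the text `version X.Y.Z`; `parse_spark_version` is a hypothetical helper, not something shipped with this image.

```shell
#!/bin/sh
# Hypothetical helper: print the first "version X.Y.Z" number found on stdin.
parse_spark_version() {
    sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

if command -v spark-submit >/dev/null 2>&1; then
    # spark-submit writes its banner to stderr, so merge it into stdout.
    found=$(spark-submit --version 2>&1 | parse_spark_version)
    case "$found" in
        2.*) echo "Spark $found: upgrade is in place" ;;
        *)   echo "Spark ${found:-unknown}: expected a 2.x release" ;;
    esac
else
    echo "spark-submit not on PATH; run this inside the container"
fi
```

A result starting with 1.6 would suggest you are running the stock Cloudera image rather than this one.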
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!pip install pyspark