GitHub - David-Durst/spark-jobserver: REST job server for Apache Spark

This is a fork of Spark Job Server designed for Amazon EC2. It contains scripts which launch an EC2 cluster, deploy an appropriately configured instance of the job server, and run a sample application.

Setting Up The EC2 Cluster

Sign up for an Amazon AWS account.
Assign your access key ID and secret access key to the bash variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
- I recommend doing this by placing the following export statements in your .bashrc file.
- export AWS_ACCESS_KEY_ID=accesskeyId
- export AWS_SECRET_ACCESS_KEY=secretAccessKey
Copy job-server/config/user-ec2-settings.sh.template to job-server/config/user-ec2-settings.sh and configure it. In particular, set KEY_PAIR to the name of your EC2 key pair and SSH_KEY to the location of the pair's private key.
- I recommend using an SSH key that does not require entering a password on every use. Otherwise, you will need to enter the password many times.
Run bin/ec2_deploy.sh to start the EC2 cluster. Go to the URL printed at the end of the script to view the Spark Job Server frontend. Change the port in that URL from 8090 to 8080 to view the Spark Standalone Cluster frontend.
Run bin/ec2_example.sh to set up the example. Go to the URL printed at the end of the script to view the example.
Run bin/ec2_destroy.sh to shut down the EC2 cluster.
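
Before launching the cluster, it can help to confirm that the credentials from the steps above are actually exported. The following is a minimal sketch, not part of this repository: the function names check_aws_credentials and spark_ui_url are hypothetical, and the second helper only illustrates the 8090-to-8080 port swap described above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not shipped with this fork.

# Succeeds only when both AWS credential variables are set and non-empty.
check_aws_credentials() {
  cac_status=0
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
    echo "missing: AWS_ACCESS_KEY_ID" >&2
    cac_status=1
  fi
  if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "missing: AWS_SECRET_ACCESS_KEY" >&2
    cac_status=1
  fi
  return "$cac_status"
}

# Turns the job server URL (port 8090) into the Spark Standalone
# Cluster frontend URL (port 8080) by replacing the first ":8090".
spark_ui_url() {
  printf '%s\n' "$1" | sed 's/:8090/:8080/'
}
```

One possible use is to run check_aws_credentials && bin/ec2_deploy.sh, then pass the URL printed by the deploy script to spark_ui_url.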

Using The Example

Start a Spark Context by pressing the "Start Context" button.
Load data by pressing the "Resample" button. The matrix of scatterplots and category selection dropdown will only appear after loading data from the server.
- The first press of "Resample" after starting a new context takes approximately 30-35 minutes. The cluster spends about 20 minutes pulling data from an S3 bucket and the rest of the time running the k-means clustering algorithm.
- Subsequent presses refresh the data in the scatterplots. These presses take about 10 seconds because the data is reloaded from memory using a NamedRDD.
After performing the data analysis, shut down the context by pressing the "Stop Context" button.
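
Contexts can also be managed from the command line through the Spark Job Server REST API, which exposes POST /contexts/<name>, GET /contexts, and DELETE /contexts/<name>. The sketch below assumes the example's buttons map onto these endpoints, which this README does not confirm. JOBSERVER_URL, the function names, and the context name used with them are hypothetical placeholders; substitute the host printed by bin/ec2_deploy.sh.

```shell
#!/bin/sh
# Assumption: the job server listens on port 8090 at this base URL.
JOBSERVER_URL="${JOBSERVER_URL:-http://localhost:8090}"

# Builds the REST URL for a named context (trailing slash on the base is dropped).
context_url() {
  printf '%s/contexts/%s\n' "${JOBSERVER_URL%/}" "$1"
}

# POST /contexts/<name>: create a context (curl sends POST when -d is given).
start_context() {
  curl -s -d "" "$(context_url "$1")"
}

# GET /contexts: list the running contexts.
list_contexts() {
  curl -s "${JOBSERVER_URL%/}/contexts"
}

# DELETE /contexts/<name>: stop a context and release its resources.
stop_context() {
  curl -s -X DELETE "$(context_url "$1")"
}
```

For example, start_context example-context followed later by stop_context example-context would bracket an analysis session, where example-context is an arbitrary name chosen for illustration.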

