Build and Package
In order for EclairJS Client to talk to Apache Spark, it needs an instance of Apache Toree running, and Toree must be able to connect to your Spark master.
Prerequisites
- Java 8 update 70 or higher
Instructions
- Download Apache Spark 2.0.0 built with Hadoop 2.7 and extract it from the archive.
- Install Jupyter (pip install jupyter, for example) and the Jupyter Kernel Gateway (pip install jupyter-kernel-gateway).
- Download and build Apache Toree:
$ git clone https://github.com/apache/incubator-toree
$ cd incubator-toree
$ git checkout e8ecd0623c65ad104045b1797fb27f69b8dfc23f
$ make dist
This will create a dist directory containing dist/toree/bin/run.sh.
- Download the EclairJS Server JAR file from Maven (http://repo2.maven.org/maven2/org/eclairjs/eclairjs-nashorn/${ECLAIRJS_VERSION}/eclairjs-nashorn-${ECLAIRJS_VERSION}-jar-with-dependencies.jar), replacing ${ECLAIRJS_VERSION} with the version you are using.
- Download kernel.json and replace the following:
  - /usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh with the location of your installed Apache Toree (for example, /usr/local/incubator-toree/dist/toree/bin/run.sh on OS X; see the location in step 3).
  - "SPARK_HOME" should point to the extracted Apache Spark directory (spark-2.0.0-bin-hadoop2.7).
  - /opt/nashorn/lib/eclairjs.jar should point at the JAR file downloaded in step 4.
  If you run into memory issues (such as out of memory errors), you can raise the memory limit in the kernel.json file by adding --driver-memory 8g to SPARK_OPT.
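The substitutions above can also be scripted. The following is a minimal sketch, not part of EclairJS: `patchKernelJson` and `jsonEscape` are hypothetical helpers, and the code assumes that the downloaded kernel.json contains the two placeholder paths as literal strings and follows the standard Jupyter kernel spec layout with an `env` object.

```javascript
// Placeholder paths that ship in the downloaded kernel.json (from the steps above).
const TOREE_PLACEHOLDER =
  '/usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh';
const JAR_PLACEHOLDER = '/opt/nashorn/lib/eclairjs.jar';

// Escape a path so it can be spliced into JSON text (matters for backslashes).
function jsonEscape(s) {
  return JSON.stringify(s).slice(1, -1);
}

// Hypothetical helper: takes the text of kernel.json plus your local paths
// and returns the patched text.
function patchKernelJson(text, paths) {
  // Replace every literal occurrence of the two placeholder paths.
  const replaced = text
    .split(TOREE_PLACEHOLDER).join(jsonEscape(paths.toreeRunSh))
    .split(JAR_PLACEHOLDER).join(jsonEscape(paths.eclairJar));

  // Assumption: environment variables live in an "env" object,
  // as in the Jupyter kernel spec format.
  const spec = JSON.parse(replaced);
  spec.env = spec.env || {};
  spec.env.SPARK_HOME = paths.sparkHome;
  return JSON.stringify(spec, null, 2);
}

// Example (the JAR and Spark locations are illustrative):
// const fs = require('fs');
// const patched = patchKernelJson(fs.readFileSync('kernel.json', 'utf8'), {
//   toreeRunSh: '/usr/local/incubator-toree/dist/toree/bin/run.sh',
//   eclairJar: '/path/to/eclairjs-nashorn-jar-with-dependencies.jar',
//   sparkHome: '/path/to/spark-2.0.0-bin-hadoop2.7'
// });
// fs.writeFileSync('kernel.json', patched);
```

Parsing the result as JSON before writing it back also catches a malformed file early, before Jupyter tries to load it.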
- Figure out your Jupyter data directory by running:
$ jupyter --data-dir
/Users/youruser/Library/Jupyter
Copy kernel.json to kernels/eclair/ in the directory you got above.
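If you prefer to script the copy, a minimal sketch follows. `installKernelSpec` is a hypothetical helper, not an EclairJS or Jupyter API; it takes the data directory printed by the command above as its first argument and assumes Node.js 10.12 or later for the `recursive` option of `fs.mkdirSync`.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: copies kernel.json into <dataDir>/kernels/eclair/,
// creating the directory if needed, and returns the destination path.
function installKernelSpec(dataDir, kernelJsonPath) {
  const targetDir = path.join(dataDir, 'kernels', 'eclair');
  fs.mkdirSync(targetDir, { recursive: true });
  const target = path.join(targetDir, 'kernel.json');
  fs.copyFileSync(kernelJsonPath, target);
  return target;
}

// Example (use the data directory reported on your own machine):
// installKernelSpec('/Users/youruser/Library/Jupyter', './kernel.json');
```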
- Start Jupyter:
$ jupyter notebook --no-browser
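Once Jupyter is running, you can check that it picked up the new kernel by querying the server's `/api/kernelspecs` REST endpoint. The sketch below makes several assumptions: the server listens on 127.0.0.1:8888 (as in the sample output later in this document), it requires no authentication token, and the kernel is registered under the directory name `eclair`. `hasKernelSpec` and `fetchKernelSpecs` are hypothetical helper names.

```javascript
const http = require('http');

// Returns true if the parsed /api/kernelspecs response lists the named kernel.
function hasKernelSpec(response, name) {
  return Boolean(response && response.kernelspecs &&
    Object.prototype.hasOwnProperty.call(response.kernelspecs, name));
}

// Fetches the kernel spec list from a running Jupyter server.
// Assumptions: baseUrl is reachable and needs no auth token.
function fetchKernelSpecs(baseUrl) {
  return new Promise(function (resolve, reject) {
    http.get(baseUrl + '/api/kernelspecs', function (res) {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        if (res.statusCode !== 200) {
          reject(new Error('Unexpected HTTP status ' + res.statusCode));
          return;
        }
        try { resolve(JSON.parse(body)); } catch (err) { reject(err); }
      });
    }).on('error', reject);
  });
}

// Example:
// fetchKernelSpecs('http://127.0.0.1:8888').then(function (specs) {
//   console.log('eclair kernel installed:', hasKernelSpec(specs, 'eclair'));
// }).catch(function (err) {
//   console.log('Could not query Jupyter:', err.message);
// });
```

If the server rejects the request, it is most likely configured to require a token, which this sketch does not handle.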
- Test EclairJS Client
To make sure everything is working, create a simple EclairJS Client example.
Create a file called package.json:
{
  "name": "eclairjs-test",
  "version": "0.1.0",
  "dependencies": {
    "eclairjs": "*"
  }
}
And a file called test.js:
var eclairjs = require('eclairjs');
// Create a client session and a Spark context in local mode.
var spark = new eclairjs();
var sc = new spark.SparkContext("local[*]", "Simple Text");
// Distribute a small array, then collect it back; collect() returns a promise.
var data = sc.parallelize([1,2,3,4,5,6,7,8,9,0]);
data.collect().then(function(val) {
  console.log("Success:", val);
  // Stop the context before exiting.
  sc.stop().then(process.exit);
}).catch(function(err) {
  console.log("Error:", err);
  sc.stop().then(process.exit);
});
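Both branches of test.js end with the same `sc.stop().then(process.exit)` call, so the process exit status does not tell you which branch ran. One possible refactoring is sketched below; `runAndStop` is a hypothetical helper, not part of the EclairJS API, and it relies only on `sc.stop()` returning a promise, as in the example above.

```javascript
// Hypothetical helper: runs job(sc), always stops the context afterwards,
// and resolves to an exit code (0 on success, 1 on failure).
function runAndStop(sc, job) {
  return Promise.resolve()
    .then(function () { return job(sc); })
    .then(function (val) {
      console.log('Success:', val);
      return 0;
    }, function (err) {
      console.log('Error:', err);
      return 1;
    })
    .then(function (code) {
      // Stop the context whatever happened, then hand back the exit code.
      return sc.stop().then(function () { return code; });
    });
}

// Example with the objects from test.js:
// runAndStop(sc, function (sc) {
//   return sc.parallelize([1, 2, 3]).collect();
// }).then(process.exit);
```

With this shape, a shell script or CI job calling `node test.js` can rely on a non-zero exit status when the Spark job fails.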
Install the dependencies:
$ npm install
Now we are ready to actually run the example:
$ node --harmony test.js
Starting WebSocket: ws://127.0.0.1:8888/api/kernels/436e67e6-2605-4085-9c5d-ba43d828a038
got kernel
Success: [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 ]
To run the test suite:
$ npm run integration-test