
Working With Lambda Functions

Doron Rosenberg edited this page Nov 3, 2016 · 2 revisions

Several Spark APIs take in lambda (also called anonymous) functions as arguments, which will get executed on the Spark worker nodes.

Passing variables to Lambdas

Let's say we want to filter out a specific word from an RDD, and the word can't be hardcoded in the Lambda function. For this case, EclairJS adds an extra argument called bindArgs: an array of values that are appended to the arguments of the Lambda function:

var wordToFilterOut = 'foo';

var filteredRDD = rdd.filter(function(word, wordToFilterOut) {
  return word.trim() !== wordToFilterOut;
}, [wordToFilterOut]);
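Since bindArgs values are simply appended to the Lambda's parameter list, the predicate itself is plain JavaScript and can be checked locally before shipping it to the workers. The following is a sketch, not EclairJS API: `keepWord` and the local `Array.filter` stand-in are hypothetical names used only to illustrate how the bound values line up with the extra parameters.

```javascript
// Hypothetical sketch: the extra Lambda parameters arrive in the same
// order as the entries of the bindArgs array.
var wordToFilterOut = 'foo';
var minLength = 3;

// The worker-side predicate, written as a named function so it can be
// exercised locally.
function keepWord(word, boundWord, boundMin) {
  return word.length >= boundMin && word.trim() !== boundWord;
}

// On a real RDD this would be:
//   rdd.filter(keepWord, [wordToFilterOut, minLength]);
// Locally we can mimic the worker-side call with a plain array:
var words = ['foo', 'bar', 'hi', 'sparkling'];
var kept = words.filter(function (w) {
  return keepWord(w, wordToFilterOut, minLength);
});
// kept is ['bar', 'sparkling']
```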

Creating Spark class instances in Lambdas

As of version 0.4 of EclairJS, if a Lambda function needs to generate a Spark class instance, the Spark class needs to be passed in via the bindArgs array.

For example, in RDD.mapToPair, to generate a pair we need to create a Tuple2 like this:

var eclairjs = require('eclairjs');

var spark = new eclairjs();

...

var pairRDD = rdd.mapToPair(function(word, Tuple2) {
  return new Tuple2(word.toLowerCase(), 1);
}, [spark.Tuple2]);
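The pair-building Lambda is again plain JavaScript, so its logic can be checked locally. In this sketch, the local `Tuple2` constructor is a hypothetical stand-in for `spark.Tuple2` (its fields and accessors are assumptions, used only to exercise the Lambda outside the cluster):

```javascript
// Hypothetical stand-in for spark.Tuple2, for local testing only.
function Tuple2(first, second) {
  this._1 = first;
  this._2 = second;
}

// The same Lambda body that would be passed to rdd.mapToPair; the bound
// class arrives as the extra parameter.
function toPair(word, Tuple2) {
  return new Tuple2(word.toLowerCase(), 1);
}

// On the cluster: rdd.mapToPair(toPair, [spark.Tuple2]);
// Locally:
var pair = toPair('Hello', Tuple2);
// pair._1 is 'hello', pair._2 is 1
```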

This is done for performance reasons: instead of shipping every Spark class to the workers, only the classes that are actually needed are sent, reducing both the amount of data transferred and the amount of data loaded into memory on each worker.