Onyx Task Bundle for Implementing Data Processing Tasks in R
onyx-r provides an Onyx task bundle for running data processing tasks in R.
A typical use case is running R models (created via statistical or machine learning algorithms) in Onyx job workflows, at scale:
- A data scientist exports a model as an RData file.
- An Onyx developer configures an onyx-r task to load the model at job submit time and use it to create predictions when batches of Onyx segments arrive at the task.
Each Onyx peer runs an Rserve instance, and each virtual peer holds a connection to its local Rserve instance. onyx-r tasks are configured at job submit time through pure Clojure data in the Onyx catalog. They are implemented as pure R functions that take an Onyx segment as input and return a modified Onyx segment as output. For this to work seamlessly, onyx-r automatically translates between Clojure and R data structures. Each onyx-r task must be configured with the name of the R segment-processing function to call.
When an onyx-r task is prepared for execution on a virtual peer through Onyx lifecycles, the task can be provided with R code to source, R data (in RData format, exported from R via save) to load, and Clojure values to assign to R variables. These configuration options are also supplied by the user at job submit time through the Onyx catalog.
First, install Rserve on each Onyx peer as described at: https://www.rforge.net/Rserve/doc.html
onyx-r is available on Clojars. Add this dependency to the :dependencies vector of your Leiningen project.clj:
[sourcewerk/onyx-r "0.1.0-SNAPSHOT"]
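For orientation, here is a minimal sketch of where that vector goes in a project.clj. The project name my-onyx-app and the Clojure version are placeholders chosen for illustration, not values taken from onyx-r. A real project would also list Onyx itself and any other libraries it needs.

```clojure
;; project.clj (sketch): only the onyx-r coordinate comes from this README;
;; the project name and the Clojure version are illustrative assumptions.
(defproject my-onyx-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 ;; add your Onyx dependency here as well
                 [sourcewerk/onyx-r "0.1.0-SNAPSHOT"]])
```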
Start a local Rserve server as documented at: https://www.rforge.net/Rserve/doc.html#start
Then run lein test to run all tests for onyx-r.
The following Clojure code block shows how to configure an onyx-r task through add-task:
    (add-task
      my-base-job
      (onyx-r.tasks.r/r-function
        :rfun   ; name of the Onyx task
        "rfun"  ; name of the R function to call
        {;; R code to source when the task is prepared for execution on a virtual peer
         :source ["rfun <- function(segment) list(segment = segment, assigned = c(bar, baz), loaded = testData)"]
         ;; RData to load when the task is prepared for execution on a virtual peer
         :load [(onyx-r.util/slurp-bytes "testData.RData")]
         ;; R variables to assign when the task is prepared for execution on a virtual peer
         :assign {:bar 42
                  :baz "Hallo, Onyx!"}}
        batch-settings))
onyx-r.util/slurp-bytes loads (RData) files into a byte array, as expected by onyx-r's :load parameter.
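To make the byte-array requirement concrete, the following sketch shows one way a file can be read fully into a byte array in Clojure. It is not onyx-r's actual implementation. The namespace example.bytes and the function name slurp-bytes-sketch are hypothetical and exist only for this illustration.

```clojure
(ns example.bytes
  (:require [clojure.java.io :as io])
  (:import [java.io ByteArrayOutputStream]))

;; Hypothetical helper, not part of onyx-r: reads everything that
;; clojure.java.io/input-stream can open (file path, File, URL) into a byte array.
(defn slurp-bytes-sketch
  [source]
  (with-open [in  (io/input-stream source)
              out (ByteArrayOutputStream.)]
    (io/copy in out)       ; stream the whole input into the in-memory buffer
    (.toByteArray out)))   ; return the buffered content as a byte[]
```

A byte array produced this way could presumably be placed in the :load vector just like the result of onyx-r.util/slurp-bytes, but in practice the helper that ships with the library is the safer choice.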
The supplied demo jobs show how to use onyx-r's features in context.
Copyright © 2016 sourcewerk GmbH
Distributed under the Eclipse Public License, the same as Clojure and Onyx.
Commercial support is available through sourcewerk GmbH:
Email: [email protected]