Common extensions to the Scalding MapReduce DSL.
Scalding-Commons includes Scalding Sources for use with the dfs-datastores project.
This library provides a VersionedKeyValSource that allows Scalding to write out key-value pairs of any type into a binary SequenceFile. Serialization is handled through twitter-util's Codec trait.
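The sketch below shows what a key codec and a value codec do: each maps a value to bytes and back again, so a typed key-value pair can be stored in a binary file. `ByteCodec`, `Utf8StringCodec`, `BigEndianIntCodec`, `encodePair` and `decodePair` are hypothetical names used only for illustration. They are not twitter-util's Codec API, and the library's own StringCodec and IntCodec may use a different byte layout.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical codec trait for illustration only; NOT twitter-util's Codec.
// It captures the same idea: a reversible mapping between a value and bytes.
trait ByteCodec[T] {
  def encode(t: T): Array[Byte]
  def decode(bytes: Array[Byte]): T
}

// UTF-8 encoding for String keys.
object Utf8StringCodec extends ByteCodec[String] {
  def encode(s: String): Array[Byte] = s.getBytes(StandardCharsets.UTF_8)
  def decode(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)
}

// Fixed-width, big-endian 4-byte encoding for Int values.
object BigEndianIntCodec extends ByteCodec[Int] {
  def encode(i: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(i).array()
  def decode(bytes: Array[Byte]): Int = ByteBuffer.wrap(bytes).getInt()
}

// Turn a typed pair into the (keyBytes, valueBytes) form a binary file stores.
def encodePair[K, V](k: K, v: V)(implicit kc: ByteCodec[K], vc: ByteCodec[V]): (Array[Byte], Array[Byte]) =
  (kc.encode(k), vc.encode(v))

// Recover the typed pair from its serialized bytes.
def decodePair[K, V](bytes: (Array[Byte], Array[Byte]))(implicit kc: ByteCodec[K], vc: ByteCodec[V]): (K, V) =
  (kc.decode(bytes._1), vc.decode(bytes._2))

// As in the README, implicits in scope select the key and value codecs.
implicit val keyCodec: ByteCodec[String] = Utf8StringCodec
implicit val valCodec: ByteCodec[Int] = BigEndianIntCodec

val encoded = encodePair("clicks", 42)
val decoded = decodePair[String, Int](encoded)
```

Because the codecs are resolved implicitly, changing how keys or values are serialized only requires bringing a different implicit into scope.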
VersionedKeyValSource allows multiple writes to the same path, as each write creates a new version. Optionally, given a Monoid on the value type, VersionedKeyValSource allows for versioned incremental updates of a key-value database.
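The versioning and Monoid-merge behaviour can be modelled with a small in-memory sketch. It assumes version numbers come from a simple counter and uses a hand-rolled `Monoid` trait so the example is self-contained. `InMemoryVersionedStore` is hypothetical and says nothing about how dfs-datastores actually lays out versions on disk.

```scala
// Stand-in Monoid trait so the sketch compiles on its own; Scalding's own
// Monoid type is not used here.
trait Monoid[V] {
  def zero: V
  def plus(a: V, b: V): V
}

// Integer addition with 0 as the identity element.
implicit val intSumMonoid: Monoid[Int] = new Monoid[Int] {
  def zero: Int = 0
  def plus(a: Int, b: Int): Int = a + b
}

// Hypothetical in-memory model: every write appends a new immutable
// snapshot under a new, increasing version number.
final class InMemoryVersionedStore[K, V] {
  private var versions: Vector[(Long, Map[K, V])] = Vector.empty

  private def nextVersion: Long =
    if (versions.isEmpty) 1L else versions.last._1 + 1L

  def latest: Option[(Long, Map[K, V])] = versions.lastOption

  // Plain write: the new version contains only the pairs just written.
  def write(pairs: Map[K, V]): Long = {
    val v = nextVersion
    versions = versions :+ (v -> pairs)
    v
  }

  // Incremental write: merge the new pairs into the latest snapshot using
  // the Monoid, then store the merged result as a new version.
  def writeIncremental(pairs: Map[K, V])(implicit m: Monoid[V]): Long = {
    val base = latest.map(_._2).getOrElse(Map.empty[K, V])
    val merged = pairs.foldLeft(base) { case (acc, (k, v)) =>
      acc.updated(k, m.plus(acc.getOrElse(k, m.zero), v))
    }
    write(merged)
  }
}

val store = new InMemoryVersionedStore[String, Int]
store.write(Map("a" -> 1, "b" -> 2))
store.writeIncremental(Map("a" -> 10, "c" -> 5))
```

After the incremental write, the latest version holds `a -> 11`, `b -> 2` and `c -> 5`, while the first version is left untouched. Keeping old snapshots immutable is what makes it safe to write repeatedly to the same path.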
import com.twitter.scalding.source.VersionedKeyValSource
import VersionedKeyValSource._
// ## Sink Example
// These codecs control key and value serialization.
implicit val keyCodec = new StringCodec
implicit val valCodec = new IntCodec
val versionedSource = VersionedKeyValSource[String,Int]("path")
// creates a new version on each write
someScaldingFlow.write(versionedSource)
// because Scalding provides an implicit Monoid[Int],
// writeIncremental sums each new integer into the
// existing value for its key on every write:
someScaldingFlow.writeIncremental(versionedSource)
// ## Source Examples
//
// This Source produces the most recent set of kv pairs from the VersionedStore
// located at "path":
VersionedKeyValSource[String,Int]("path")
// This source produces version 12345:
VersionedKeyValSource[String,Int]("path", Some(12345))
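The source examples above follow a simple selection rule: with no version argument the most recent version is read, and `Some(12345)` pins one specific version. The sketch below models that rule, assuming versions are plain `Long` numbers. `Snapshot` and `selectVersion` are hypothetical names, and returning `None` for a missing version is a choice made for this sketch; the real source may instead fail when the requested version does not exist.

```scala
// Hypothetical snapshot of a versioned key-value store.
final case class Snapshot[K, V](version: Long, data: Map[K, V])

// None selects the newest snapshot; Some(v) selects exactly version v.
def selectVersion[K, V](snapshots: Seq[Snapshot[K, V]], requested: Option[Long]): Option[Snapshot[K, V]] =
  requested match {
    case None    => if (snapshots.isEmpty) None else Some(snapshots.maxBy(_.version))
    case Some(v) => snapshots.find(_.version == v)
  }

val snapshots = Seq(
  Snapshot(12345L, Map("a" -> 1)),
  Snapshot(12400L, Map("a" -> 3, "b" -> 7))
)

val newest = selectVersion(snapshots, None)          // most recent version
val pinned = selectVersion(snapshots, Some(12345L))  // a specific version
val missing = selectVersion(snapshots, Some(99L))    // version not present
```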
Current version is 0.0.3 (groupid="com.twitter", artifact="scalding-commons_2.9.2").
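Assuming an sbt build (the README does not name a build tool), the coordinates above translate into a single dependency line. A plain `%` is used because the Scala version suffix is already part of the artifact name.

```scala
// build.sbt fragment; assumes the artifact is available from a configured resolver
libraryDependencies += "com.twitter" % "scalding-commons_2.9.2" % "0.0.3"
```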
- Oscar Boykin http://twitter.com/posco
- Mike Jahr http://twitter.com/mjahr
- Sam Ritchie http://twitter.com/sritchie
Copyright 2012 Twitter, Inc.
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0