Skip to content

Utility that writes the results of a SQL statement to HDFS.

License

Notifications You must be signed in to change notification settings

rothrock/pipefish

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Have you ever wanted to send your MySQL query results directly to a file
in HDFS? Well, now you can!

Pipefish sends the result of your SQL statement to a tab-delimited file in hdfs. 

It doesn't create any intermediate or temporary files. 
It just reads the rows from the MySQL server and writes them as tab-delimited
fields to a file in HDFS that you specify.
At the end of the row-set, pipefish flushes and closes the hdfs file and
closes the MySQL connection.

Usage:

$ pf --help
args:
 --defaults_file='/path/to/my.cnf'
 --user='user'
 --host='host'
 --db='db name'
 --password='password'
 --hdfs_path='/path/to/file'
 --sql='sql statement'
 [--overwrite]
user, host, db, and password settings override those specified in defaults_file.

Supply --overwrite to truncate the file instead of appending to it.


B U I L D   I N S T R U C T I O N S
===================================

Linux
=========

Set JAVA_HOME.

Install Java architecture independent libraries. For example, openjdk-6-jre-lib.

Install MySQL development libs.

Install the libhdfs development bits from Cloudera.

  If your dev environment uses 'parcels' installed by Cloudera Manager,
  make sure your host configured as an HDFS gateway.
  That will get you the HDFS headers and libs that you'll need.

  -- Otherwise --

  Configure your package manager by downloading and installing the "1-click install" package.
  http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_cdh5_install.html?scroll=concept_gp2_q32_24_unique_1

  And then:

  $ sudo apt-get install libhdfs0-dev (Ubuntu)
  $ sudo yum install hadoop-libhdfs-devel (Linux)

Build and install pipefish.

  $ make
  $ sudo make install

  `make install' puts it in /usr/local/bin.


On a Mac
========

Install Java

Install MySQL development libs.

Build the HDFS libs from a Cloudera tarball distribution.

  Get your gcc toolchain installed along with cmake, ant, and maven.

  Download and upack the CDH5 tarball. For example:
  http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.1.0.tar.gz

  Build just the hdfs lib that we need:

  $ cd hadoop-2.3.0-cdh5.1.0/src/hadoop-hdfs-project/hadoop-hdfs
  $ sudo mvn package -Pdist,native -DskipTests

  Built OK? Put the headers and libs in a place that GCC can find them:

  $ sudo cp `find target/native -name libhdfs\.*` /usr/lib
  $ sudo cp `find target -name hdfs.h` /usr/include

Build and install it.

  $ make
  $ sudo make install

  `make install' puts it in /usr/local/bin.

About

Utility that writes the results of a SQL statement to HDFS.

Resources

License

Stars

Watchers

Forks

Packages

No packages published