-
Notifications
You must be signed in to change notification settings - Fork 0
Utility that writes the results of a SQL statement to HDFS.
License
rothrock/pipefish
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Have you ever wanted to send your MySQL query results directly to a file in HDFS? Well, now you can! Pipefish sends the result of your SQL statement to a tab-delimited file in hdfs. It doesn't create any intermediate or temporary files. It just reads the rows from the MySQL server and writes them as tab-delimited fields to a file in HDFS that you specify. At the end of the row-set, pipefish flushes and closes the hdfs file and closes the MySQL connection. Usage: $ pf --help args: --defaults_file='/path/to/my.cnf' --user='user' --host='host' --db='db name' --password='password' --hdfs_path='/path/to/file' --sql='sql statement' [--overwrite] user, host, db, and password settings override those specified in defaults_file. Supply --overwrite to truncate the file instead of appending to it. B U I L D I N S T R U C T I O N S =================================== Linux ========= Set JAVA_HOME. Install Java architecture independent libraries. For example, openjdk-6-jre-lib. Install MySQL development libs. Install the libhdfs development bits from Cloudera. If your dev environment uses 'parcels' installed by Cloudera Manager, make sure your host configured as an HDFS gateway. That will get you the HDFS headers and libs that you'll need. -- Otherwise -- Configure your package manager by downloading and installing the "1-click install" package. http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/cdh5ig_cdh5_install.html?scroll=concept_gp2_q32_24_unique_1 And then: $ sudo apt-get install libhdfs0-dev (Ubuntu) $ sudo yum install hadoop-libhdfs-devel (Linux) Build and install pipefish. $ make $ sudo make install `make install' puts it in /usr/local/bin. On a Mac ======== Install Java Install MySQL development libs. Build the HDFS libs from a Cloudera tarball distribution. Get your gcc toolchain installed along with cmake, ant, and maven. Download and upack the CDH5 tarball. For example: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.1.0.tar.gz Build just the hdfs lib that we need: $ cd hadoop-2.3.0-cdh5.1.0/src/hadoop-hdfs-project/hadoop-hdfs $ sudo mvn package -Pdist,native -DskipTests Built OK? Put the headers and libs in a place that GCC can find them: $ sudo cp `find target/native -name libhdfs\.*` /usr/lib $ sudo cp `find target -name hdfs.h` /usr/include Build and install it. $ make $ sudo make install `make install' puts it in /usr/local/bin.
About
Utility that writes the results of a SQL statement to HDFS.
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published