Skip to content

Data Extraction Guide

Malinda Curtis edited this page Jan 29, 2016 · 2 revisions

##Data Extraction Guide

###Summary The first step to exporting your data requires you to identify and gain access to the source system or database that houses the data you will be exporting and sending to Socrata. This guide describes three options for exporting your LIVES data to a file programmatically which is a prerequisite to publishing the data automatically.

###Export Options ####Option 1: Use your existing tools If you have the resources to configure a process to generate exports of the data to a file in CSV or any other machine-readable format, then you are all set. Ideally the export tool or feature you are using to export should not require using a user interface. If it requires a user interface then the data publishing process will not be fully automatable and will require manual intervention for each data update, which must be performed at least once per month.

If you do not have an existing tool or feature to perform the export we have included guides for two recommended Extract-Transform-Load (ETL) tools that enable exporting your data to a CSV file programatically. Both tools do not require any programming and include capabilities not just to extract data but also to transform and publishing it automatically.

####Option 2: Pentaho Kettle Kettle is a completely free and open source tool from Pentaho that can be used to extract data from lots of different source systems and file formats (databases, Excel files, and many more formats/systems). If needed, it also allows you to perform transformations on your data and merge multiple data sources together. You can then easily output your data as a CSV file ready to publish to Socrata.

#####Installation Prerequisites

  • Runs on Windows, Mac, or Linux
  • Kettle requires the Sun Java Runtime Environment (JRE) version 1.5 (also called 5.0 in some naming schemes) or newer. You can obtain a JRE for free from http://java.sun.com
Windows Mac

Download the latest version from http://community.pentaho.com/projects/data-integration. Note that the download will start automatically and it is not necessary to click on anything on the page that you are redirected to. Once it has completed downloading, go to your Downloads folder and extract the application from the zip file. The data-integration folder can then be moved into any directory on your computer (see images below).

More information on installing Kettle can be found at http://wiki.pentaho.com/display/EAI/01.+Installing+Kettle.

#####Launching Kettle Open the data-integration folder and depending on your operating system click on either the Spoon.bat file in Windows or Data Integration.app file on Mac. For Linux execute spoon.sh on Unix-like operating systems. More detailed documentation on launching Kettle here.

Windows Mac

#####Configuring data extraction Once Kettle opens, you will be able to create a new transformation by selecting File > New > Transformation. You will then be able to connect to your data source by expanding the Input tab under the Design Window. You will want to use the Table Input if you are reading from a database. Other options for Inputs include Microsoft Access Input and Excel Input. All Kettle input types are listed here (only those labeled with ‘Input’ as the category).

#####Saving data to a file Expand the Output tab under the Design Window and drag a CSV onto the main window and click on it.

First, you will need to change the extension to CSV and provide a name for the Filename

Then click on the Content tab and change the Separator to a comma.

The final step is to go to the Fields tab and click on the Get Fields button for it populate all of your data fields. You can right-click on any of the fields to remove any that you do not wish to send to Socrata. (An alternate way to choose and rename fields is to add a Select Values transformer before the creating the CSV file)

####Option 3: Safe FME FME (Desktop version) is another ETL tool similar to Pentaho that will enable extracting and transforming data from an even greater number of source systems and file formats. However it does a require purchasing a license after 30 day trial period has expired (FME desktop pricing).

#####Installation You can get a free 30-day trial license at http://www.safe.com/solutions/for-applications/socrata/trial and then download the latest version of FME Desktop for your operating system at http://www.safe.com/support/support-resources/fme-downloads. Double click on the downloaded file and follow the installation wizard instructions. You may need to check the Spam folder in your e-mail program to retrieve the Safe FME license.

#####Launching FME FME will be found under programs/applications. Launch it by double-clicking on FME Workbench.

Windows Mac

#####Configuring data extraction and export to file Once FME Workbench opens, you will be able to create a new transformation by selecting File > New > Workspace. Choose the default Generate workspace option to get started.

First you will select a Reader that can access your raw data. FME supports hundreds of data sources, http://www.safe.com/fme/format-search.

Next you will select CSV for your Writer format and then click on OK button.

FME will then add the Reader and Writer to the workflow where you are able to add Transformers in case there are any changes you need to make to the data before sending Socrata your CSV file. Once you are ready to run the workflow, just click on the green Play button.

Clone this wiki locally