Generating a sample conversion using only a subset of data

timrdf edited this page Jan 28, 2011 · 27 revisions

When developing enhancement parameters, it is helpful to see the effect of each one as it is added. This iterative process can be sped up by converting only a portion of a large CSV. Since a sample subset is already created as part of the conversion,

~/Desktop/source/fludb-org/animal-surveillance/version/2010-Nov-30
bash-3.2$ l automatic/a*
-rw-r--r--  1 lebot  staff      18904 Dec 16 17:33 automatic/avian.txt.csv.raw.void.ttl
-rw-r--r--  1 lebot  staff  158321259 Dec 16 17:33 automatic/avian.txt.csv.raw.ttl
-rw-r--r--  1 lebot  staff      44692 Dec 16 17:32 **automatic/avian.txt.csv.raw.sample.ttl**
-rw-r--r--  1 lebot  staff        776 Dec 16 17:31 automatic/avian.txt.csv.raw.params.ttl

all that we need to do is turn off the "full" conversion using the CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY environment variable.

First, check to see what its current value is:

bash-3.2$ cr-vars.sh 
--
CSV2RDF4LOD_HOME                                         ~/Desktop/csv2rdf4lod-automation
CSV2RDF4LOD_BASE_URI                                     http://logd.tw.rpi.edu
CSV2RDF4LOD_BASE_URI_OVERRIDE                            (not required, $CSV2RDF4LOD_BASE_URI will be used.)
--
CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS                  (will default to: 2)
CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY                  false
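
cr-vars.sh lists every setting. When only this one variable is of interest, the shell itself can report it. The following is a minimal sketch using plain parameter expansion; the "(not set)" placeholder text is an assumption made for illustration, and this check shows only what is exported in the current shell, not any defaults that the conversion scripts may apply.

```shell
# Print the variable's value, or a placeholder when it is not set in this shell.
# "(not set)" is an illustrative placeholder, not something cr-vars.sh prints.
current="${CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY:-(not set)}"
echo "CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY = $current"
```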

Then turn on the "subset only" feature:

bash-3.2$ export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY="true"

With this set, only the sample subset (the .sample.ttl file) is converted; the full conversion is skipped.
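
Once the enhancement parameters are settled, the variable presumably needs to go back to its earlier value so that the full conversion runs again. The sketch below shows one way to save and restore the setting. The variable name `previous` is hypothetical, and the fallback to "false" is an assumption based on the cr-vars.sh output shown above. The step that reruns the conversion is left as a comment because it depends on the dataset.

```shell
# Remember the current setting so it can be restored later.
# Falling back to "false" when the variable is unset is an assumption.
previous="${CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY:-false}"

# Convert only the sample subset while iterating on enhancement parameters.
export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY="true"
echo "subset only: $CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY"

# ... rerun the conversion here and inspect the sample output ...

# Restore the earlier value so the next run performs the full conversion again.
export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY="$previous"
echo "subset only: $CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY"
```

Running cr-vars.sh again afterwards is a quick way to confirm that the value is back to what it was.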

For a description of the difference between samples and examples, see Examples versus Samples.
