Generating a sample conversion using only a subset of data

timrdf edited this page Jan 28, 2011 · 27 revisions

When developing enhancement parameters, it helps to see the effect of each parameter as it is added. This iterative process goes faster when only a portion of a large CSV is converted. Because the conversion already creates a sample subset, all we need to do is turn off the "full" conversion using the CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY environment variable.

First, check to see what its current value is:

bash-3.2$ cr-vars.sh 
--
CSV2RDF4LOD_HOME                                         /Users/lebot/afrl/information_management/m4rker/domain_instances/tw-data-gov/csv2rdf4lod
CSV2RDF4LOD_BASE_URI                                     http://logd.tw.rpi.edu
CSV2RDF4LOD_BASE_URI_OVERRIDE                            (not required, $CSV2RDF4LOD_BASE_URI will be used.)
--
CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS                  (will default to: 2)
CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY                  false

How?

Set the variable in the shell that runs the conversion:

CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY="yes"
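A plain assignment like the one above is only visible to the current shell. As a minimal sketch, assuming the conversion scripts are launched as child processes from the same session, the variable would need to be exported so they inherit it. The value "yes" is taken from this page; whether other values such as "true" are also accepted is not confirmed here.

```shell
# Assumption: conversion scripts run as child processes of this shell,
# so the variable has to be exported rather than merely assigned.
export CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY="yes"

# Confirm the value in the current environment.
# (Re-running cr-vars.sh, as shown earlier on this page, should report it too.)
echo "CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY=${CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY}"
```

The export lasts only for the current shell session; a new terminal starts with whatever value the environment setup provides.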

.sample.ttl

TODO: example vs. sample. One is explicitly annotated in the enhancement params; the other is just the first N rows. Are the two terms used consistently? Check the Java params, the conversion: params, and the env vars.