
CSV2RDF4LOD environment variables

timrdf edited this page Oct 18, 2011 · 41 revisions

csv2rdf4lod-automation is a set of shell scripts that support the retrieval, organization, conversion, and publishing of tabular data. csv2rdf4lod-automation invokes csv2rdf4lod, a Java jar implementing the conversion vocabulary, which specifies declarative enhancements that can be applied to tabular literals to create well-structured, highly-connected RDF representations.

When csv2rdf4lod-automation runs in a unix shell, its scripts consult a set of CSV2RDF4LOD_ environment variables to decide which processing steps to perform, which to skip, and how to perform them.
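As a rough illustration of that pattern, the sketch below shows how a script could consult one of these variables and fall back to a default when it is unset. The helper name `cr_value_or_default` is hypothetical and is not part of csv2rdf4lod-automation; the variable name and its default of `true` are taken from the `cr-vars.sh` listing on this page.

```bash
#!/bin/bash
# Hypothetical helper (NOT part of csv2rdf4lod-automation): print the value of
# the environment variable named by $1, or the fallback $2 when that variable
# is unset or empty. The name is expanded with eval, so pass trusted names only.
cr_value_or_default() {
    eval "cr_value=\${$1:-}"
    printf '%s\n' "${cr_value:-$2}"
}

# The fallback "true" mirrors the default that cr-vars.sh reports for
# CSV2RDF4LOD_PUBLISH; the messages are illustrative only.
if [ "$(cr_value_or_default CSV2RDF4LOD_PUBLISH true)" = "true" ]; then
    echo "publishing is enabled"
else
    echo "publishing is disabled"
fi
```

The actual scripts may test these variables differently; the point is only that an unset variable is treated as its documented default rather than as an error.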

If you are adopting csv2rdf4lod-automation as part of a team project that is spread across machines via version control, further considerations for distributed environments apply.

Documentation for each environment variable

The most authoritative documentation for each of these environment variables is in the comments of $CSV2RDF4LOD_HOME/bin/setup.sh. Do not edit setup.sh itself: it is the template from which your own my-csv2rdf4lod-source-me.sh is created when you [install csv2rdf4lod-automation](Installing csv2rdf4lod automation). Edit my-csv2rdf4lod-source-me.sh instead to suit your system.

Again, the values in $CSV2RDF4LOD_HOME/bin/setup.sh DO NOT INFLUENCE the automation -- only those in my-csv2rdf4lod-source-me.sh do. Edit my-csv2rdf4lod-source-me.sh and not $CSV2RDF4LOD_HOME/bin/setup.sh.

$CSV2RDF4LOD_HOME/install.sh uses $CSV2RDF4LOD_HOME/bin/setup.sh to create your my-csv2rdf4lod-source-me.sh for your system.

In short: edit my-csv2rdf4lod-source-me.sh, and read the documentation in $CSV2RDF4LOD_HOME/bin/setup.sh.
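A minimal sketch of that workflow follows. It assumes that my-csv2rdf4lod-source-me.sh consists of `export` statements and is meant to be sourced into the current shell, as its name suggests; the real generated file may look different. To keep the example runnable on its own, a temporary stand-in file is used in place of the real one, and the two values are copied from the `cr-vars.sh` listing on this page.

```bash
#!/bin/bash
# Stand-in for my-csv2rdf4lod-source-me.sh (ASSUMPTION: the real file is a
# list of export statements). A temp file keeps this sketch self-contained.
demo_source_me="$(mktemp)"
cat > "$demo_source_me" <<'EOF'
export CSV2RDF4LOD_BASE_URI="http://logd.tw.rpi.edu"
export CSV2RDF4LOD_PUBLISH_NT="false"
EOF

# Source the file (do not execute it) so the exports land in the current
# shell, where the automation scripts started from it can see them.
. "$demo_source_me"

echo "CSV2RDF4LOD_BASE_URI=$CSV2RDF4LOD_BASE_URI"
echo "CSV2RDF4LOD_PUBLISH_NT=$CSV2RDF4LOD_PUBLISH_NT"

rm -f "$demo_source_me"
```

Executing the file instead of sourcing it would set the variables only in a child process, so the calling shell would not see them.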

Invoking cr-vars.sh will show all variables and either their current value or a comment about what the value will default to:

bash-3.2$ cr-vars.sh
--
CSV2RDF4LOD_HOME                                         /Users/timrdf/Desktop/csv2rdf4lod-automation
CSV2RDF4LOD_BASE_URI                                     http://logd.tw.rpi.edu
CSV2RDF4LOD_BASE_URI_OVERRIDE                            (not required, $CSV2RDF4LOD_BASE_URI will be used.)
--
CSV2RDF4LOD_CONVERT_MACHINE_URI                          http://tw.rpi.edu/web/inside/machine/lebot_macbook#
CSV2RDF4LOD_CONVERT_PERSON_URI                           http://tw.rpi.edu/instances/TimLebo
--
CSV2RDF4LOD_CONVERT_NUMBER_EXAMPLE_ROWS                  (will default to: 2)
CSV2RDF4LOD_CONVERT_EXAMPLE_SUBSET_ONLY                  false
CSV2RDF4LOD_CONVERT_DUMP_FILE_EXTENSIONS                 ttl.tgz,nt
CSV2RDF4LOD_CONVERT_PROVENANCE_GRANULAR                  (will default to: false)
--
CSV2RDF4LOD_PUBLISH                                      (will default to: true)
CSV2RDF4LOD_PUBLISH_DELAY_UNTIL_ENHANCED                 true
CSV2RDF4LOD_PUBLISH_COMPRESS                             (will default to: false)
CSV2RDF4LOD_PUBLISH_OUR_SOURCE_ID                        (will not archive conversion metadata into versioned dataset.)
CSV2RDF4LOD_PUBLISH_OUR_DATASET_ID                       (will not archive conversion metadata into versioned dataset.)
CSV2RDF4LOD_PUBLISH_TTL                                  true
CSV2RDF4LOD_PUBLISH_TTL_LAYERS                           true
CSV2RDF4LOD_PUBLISH_NT                                   false
CSV2RDF4LOD_PUBLISH_RDFXML                               false
--
CSV2RDF4LOD_PUBLISH_SUBSET_VOID                          true
CSV2RDF4LOD_PUBLISH_SUBSET_VOID_NAMED_GRAPH              (will default to: auto)
CSV2RDF4LOD_PUBLISH_SUBSET_SAMEAS                        true
CSV2RDF4LOD_PUBLISH_SUBSET_SAMEAS_NAMED_GRAPH            (will default to: auto)
CSV2RDF4LOD_PUBLISH_SUBSET_SAMPLES                       false
--
CSV2RDF4LOD_PUBLISH_CONVERSION_PARAMS_NAMED_GRAPH        (will default to: auto)
--
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION                  false
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WWW_ROOT         (will default to: VVV/publish/lod-mat/)
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_WRITE_FREQUENCY  (will default to: 1,000,000)
CSV2RDF4LOD_PUBLISH_LOD_MATERIALIZATION_REPORT_FREQUENCY (will default to: 1,000)
CSV2RDF4LOD_CONCURRENCY                                  2
--
CSV2RDF4LOD_PUBLISH_TDB                                  false
CSV2RDF4LOD_PUBLISH_TDB_DIR                              (will default to: VVV/publish/tdb/)
CSV2RDF4LOD_PUBLISH_TDB_INDIV                            false
--
CSV2RDF4LOD_PUBLISH_4STORE                               false
CSV2RDF4LOD_PUBLISH_4STORE_KB                            (will default to: csv2rdf4lod -- /var/lib/4store/csv2rdf4lod)
--
CSV2RDF4LOD_PUBLISH_VIRTUOSO                             (will default to: false)
--
CSV2RDF4LOD_PUBLISH_SPARQL_ENDPOINT                      (will default to: none)
CSV2RDF4LOD_PUBLISH_SPARQL_RESULTS_DIRECTORY             (will default to: none)
--
see documentation for variables in:
/Users/timrdf/Desktop/csv2rdf4lod-automation/bin/setup.sh

CSV2RDF4LOD not set
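For a quick check that does not depend on `cr-vars.sh`, the variables that are actually set in the current shell can be listed with standard tools. This is only a sketch: the function name `cr_list_set_vars` is hypothetical, and unlike `cr-vars.sh` it shows neither unset variables nor their defaults.

```bash
#!/bin/bash
# Hypothetical helper (NOT part of csv2rdf4lod-automation): list every
# exported CSV2RDF4LOD_ variable as NAME=value, sorted by name.
cr_list_set_vars() {
    env | grep '^CSV2RDF4LOD_' | sort
}

cr_list_set_vars
```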

Variables also discussed on the wiki

Some variables are discussed beyond the authoritative comments in $CSV2RDF4LOD_HOME/bin/setup.sh:

See also

Misc.

http://code.google.com/p/data-gov-wiki/issues/detail?id=47
