Script: pvload.sh

What is first

vload - a "provenance-free" shell script wrapper to Virtuoso's isql-vt.
Naming sparql service description's sd:NamedGraph - so that we can name a SPARQL endpoint's named graph.
Named graphs that know where they came from - discusses provenance modeling of named graphs.

What we will cover

This page describes how to use pvload.sh to capture the provenance of loading named graphs into a SPARQL triple store.

Let's get to it!

Usage

$ pvload.sh --help
usage: pvload.sh [--help] [-n] url [-ng named_graph]
  -n  : dry run - do not download or load into named graph.
  url : the URL to retrieve and load into a named graph.
  -ng : the named graph to place 'url'. (if not provided, -ng == 'url').

  (Setting envvar CSV2RDF4LOD_CONVERT_DEBUG_LEVEL=finest will leave temporary files after invocation.)

Environment variables that matter

CSV2RDF4LOD_BASE_URI is used to create URIs for instances of provenance.
CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT is the forward-facing URL for the SPARQL endpoint, e.g. http://opendap.tw.rpi.edu/sparql.
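
As a minimal sketch of preparing these two variables before calling pvload.sh: the values below are examples only (the base URI is assumed from the provenance graph names shown later on this page), and check_pvload_env is a hypothetical helper, not part of csv2rdf4lod.

```shell
# Example values only; replace them with your own deployment's settings.
export CSV2RDF4LOD_BASE_URI="http://provenanceweb.org"   # assumed example value
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT="http://opendap.tw.rpi.edu/sparql"

# check_pvload_env (hypothetical helper): warn about any variable that is
# unset or empty, and return non-zero if at least one is missing.
check_pvload_env() {
  missing=0
  for name in CSV2RDF4LOD_BASE_URI CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT; do
    eval "value=\${$name:-}"   # look up the variable whose name is in $name
    if [ -z "$value" ]; then
      echo "WARNING: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_pvload_env && echo "environment looks ready for pvload.sh"
```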

Loading a URL into a graph with the same name

$ pvload.sh http://provenanceweb.org/source/same.ttl
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
                   --> (Named Graph) http://provenanceweb.org/source/same.ttl
                   --> (PROV Graph)  http://provenanceweb.org/source/same.ttl

Loading a URL into a graph with a different name

Let's load one triple into the graph named http://example.org/pvload-test-2:

$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-2
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
                   --> (Named Graph) http://example.org/pvload-test-2
                   --> (PROV Graph)  http://example.org/pvload-test-2

When this is run on opendap.tw.rpi.edu, a summary of the named graph is available at http://opendap.tw.rpi.edu/graph/http/example.org/pvload-test. The graph that we loaded contained only 1 triple and the named graph ends up with 128, so pvload.sh added 127 triples of provenance.
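
One way to check triple counts like these yourself is to send a counting query to the endpoint. The sketch below is an illustration, not something pvload.sh provides: count_query and count_triples are hypothetical helper names, the query uses the standard SPARQL 1.1 aggregate form, and the curl call assumes the endpoint accepts GET requests and can return CSV results.

```shell
# count_query GRAPH: print a SPARQL 1.1 query that counts the triples in GRAPH.
count_query() {
  printf 'SELECT (COUNT(*) AS ?count) WHERE { GRAPH <%s> { ?s ?p ?o } }\n' "$1"
}

# count_triples ENDPOINT GRAPH: send the query with curl and ask for CSV results.
count_triples() {
  curl -s -G "$1" -H 'Accept: text/csv' \
       --data-urlencode "query=$(count_query "$2")"
}

# Show the query that would be sent for the graph loaded above.
count_query http://example.org/pvload-test-2

# To run it against a live endpoint (requires network access):
#   count_triples "$CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT" http://example.org/pvload-test-2
```

Subtracting the number of triples in the source file from the count reported for the named graph gives the number of provenance triples that the load added.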

Loading the provenance of the load into a separate named graph, specific to the graph loaded

$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-3 --separate-provenance
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
                   --> (Named Graph) http://example.org/pvload-test-3
                   --> (PROV Graph)  http://provenanceweb.org/graph-prov/example.org/pvload-test-3

results in one triple from:

select distinct count(*)
where { 
  graph <http://example.org/pvload-test-3> {?s ?p ?o}
}

and 129 triples from:

select distinct count(*)
where { 
  graph <http://provenanceweb.org/graph-prov/example.org/pvload-test-3> {?s ?p ?o}
}
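
The default provenance graph name in the output above appears to follow a simple pattern: a base URI, then /graph-prov/, then the named graph without its scheme. The sketch below only mimics that observed pattern; prov_graph_for is a hypothetical helper, and it is an assumption that pvload.sh derives the name this way (or that the base is CSV2RDF4LOD_BASE_URI).

```shell
# prov_graph_for BASE GRAPH (hypothetical helper): rebuild the provenance
# graph name seen in the example output:
#   BASE + "/graph-prov/" + GRAPH with its "scheme://" prefix removed.
prov_graph_for() {
  base=${1%/}                          # drop one trailing slash, if present
  echo "$base/graph-prov/${2#*://}"    # strip everything up to and including "://"
}

prov_graph_for http://provenanceweb.org http://example.org/pvload-test-3
# prints http://provenanceweb.org/graph-prov/example.org/pvload-test-3
```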

Loading the provenance of the load into a separate named graph, with a different name

Adding the --into <prov_graph> argument lets you control which graph to put the provenance into.

$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-4 --separate-provenance --into http://example.org/put-my-provenance-here
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
                   --> (Named Graph) http://example.org/pvload-test-4
                   --> (PROV Graph)  http://example.org/put-my-provenance-here

results in one triple from:

select distinct count(*)
where { 
  graph <http://example.org/pvload-test-4> {?s ?p ?o}
}

and 129 triples from:

select distinct count(*)
where { 
  graph <http://example.org/put-my-provenance-here> {?s ?p ?o}
}

Loading the provenance of the load into a separate, shared, named graph

If you don't want to name the separate provenance graph yourself, pass the keyword "one" to --into. The provenance is then placed in a single shared graph at the path /graph-prov.

$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-5 --separate-provenance --into one
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
                   --> (Named Graph) http://example.org/pvload-test-5
                   --> (PROV Graph)  http://provenanceweb.org/graph-prov

results in one triple from:

select distinct count(*)
where { 
  graph <http://example.org/pvload-test-5> {?s ?p ?o}
}

and 129 triples from:

select distinct count(*)
where { 
  graph <http://provenanceweb.org/graph-prov> {?s ?p ?o}
}
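
A shared provenance graph is most useful when several graphs are loaded in one session. The sketch below only prints the pvload.sh commands for a list of "URL named-graph" pairs so that they can be reviewed before being run; print_loads is a hypothetical helper, and the pair shown is the one from the example above.

```shell
# print_loads (hypothetical helper): read "URL GRAPH" pairs from stdin and
# print, without running, one pvload.sh command per pair. Every command
# sends its provenance to the shared graph via "--into one".
print_loads() {
  while read -r url graph; do
    echo "pvload.sh $url -ng $graph --separate-provenance --into one"
  done
}

print_loads <<'EOF'
http://provenanceweb.org/source/same.ttl http://example.org/pvload-test-5
EOF
```

Piping the printed lines to sh would execute them; keeping the print step separate makes it easy to inspect the commands first.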

What is next

Script: cache-queries.sh can be used to capture the provenance of querying a SPARQL endpoint.
