Script: pvload.sh
- vload - a "provenance-free" shell script wrapper to Virtuoso's isql-vt.
- Naming sparql service description's sd:NamedGraph explains how to name a SPARQL endpoint's named graph.
- Named graphs that know where they came from discusses provenance modeling of named graphs.
This page describes how to use pvload.sh to capture the provenance of loading named graphs into a SPARQL triple store.
$ pvload.sh --help
usage: pvload.sh [--help] [-n] url [-ng named_graph]
-n : dry run - do not download or load into named graph.
url : the URL to retrieve and load into a named graph.
-ng : the named graph to place 'url'. (if not provided, -ng == 'url').
(Setting envvar CSV2RDF4LOD_CONVERT_DEBUG_LEVEL=finest will leave temporary files after invocation.)
- CSV2RDF4LOD_BASE_URI is used to create URIs for instances of provenance.
- CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT is the forward-facing URL for the SPARQL endpoint, e.g. http://opendap.tw.rpi.edu/sparql.
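The two environment variables above might be set before a dry run as in the following minimal sketch. The base URI value is an assumption inferred from the example output on this page, and the endpoint is the example value given above; substitute your own deployment's values.

```shell
# Assumed values for illustration; replace with your own deployment's URIs.
export CSV2RDF4LOD_BASE_URI="http://provenanceweb.org"
export CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT="http://opendap.tw.rpi.edu/sparql"

# Dry run (-n): do not download or load anything into the named graph.
if command -v pvload.sh >/dev/null 2>&1; then
    pvload.sh -n http://provenanceweb.org/source/same.ttl
else
    echo "pvload.sh is not on PATH; skipping dry run" >&2
fi
```

Starting with -n is a cheap way to confirm the variables are picked up before anything is written to the triple store.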
$ pvload.sh http://provenanceweb.org/source/same.ttl
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
--> (Named Graph) http://provenanceweb.org/source/same.ttl
--> (PROV Graph) http://provenanceweb.org/source/same.ttl
Let's load one triple into the graph named http://example.org/pvload-test-2:
$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-2
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
--> (Named Graph) http://example.org/pvload-test-2
--> (PROV Graph) http://example.org/pvload-test-2
When this is done on opendap.tw.rpi.edu, a summary of the named graph can be found at http://opendap.tw.rpi.edu/graph/http/example.org/pvload-test. Because the graph that we loaded only had 1 triple, and the named graph ends up with 128, pvload.sh added 127 triples of provenance.
Adding the --separate-provenance argument places the provenance in its own graph instead of in the named graph itself.
$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-3 --separate-provenance
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
--> (Named Graph) http://example.org/pvload-test-3
--> (PROV Graph) http://provenanceweb.org/graph-prov/example.org/pvload-test-3
results in one triple from:
select distinct count(*)
where {
graph <http://example.org/pvload-test-3> {?s ?p ?o}
}
and 129 triples from:
select distinct count(*)
where {
graph <http://provenanceweb.org/graph-prov/example.org/pvload-test-3> {?s ?p ?o}
}
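The default PROV graph name in the output above appears to be the base URI, followed by /graph-prov/, followed by the named graph with its scheme removed. The sketch below mimics that pattern. It is inferred from this one example, not taken from pvload.sh's source, and default_prov_graph is a hypothetical helper name.

```shell
# Hypothetical helper: reproduce the default PROV graph name seen in the
# example output. The naming rule is an inference, not pvload.sh's own code.
default_prov_graph() {
    base_uri="${CSV2RDF4LOD_BASE_URI:-http://provenanceweb.org}"
    named_graph="$1"
    # Drop the scheme prefix (e.g. "http://") from the named graph.
    path="${named_graph#*://}"
    # Strip any trailing slash from the base URI before joining.
    printf '%s/graph-prov/%s\n' "${base_uri%/}" "$path"
}

default_prov_graph http://example.org/pvload-test-3
```

With CSV2RDF4LOD_BASE_URI unset or set to http://provenanceweb.org, this prints http://provenanceweb.org/graph-prov/example.org/pvload-test-3, matching the (PROV Graph) line in the example.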
Adding the --into <prov_graph> argument lets you control which graph to put the provenance into.
$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-4 --separate-provenance --into http://example.org/put-my-provenance-here
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
--> (Named Graph) http://example.org/pvload-test-4
--> (PROV Graph) http://example.org/put-my-provenance-here
results in one triple from:
select distinct count(*)
where {
graph <http://example.org/pvload-test-4> {?s ?p ?o}
}
and 129 triples from:
select distinct count(*)
where {
graph <http://example.org/put-my-provenance-here> {?s ?p ?o}
}
If you don't want to specify the name of the separate provenance graph, pass the keyword one to --into and the path /graph-prov will be used for the provenance graph.
$ pvload.sh http://provenanceweb.org/source/same.ttl -ng http://example.org/pvload-test-5 --separate-provenance --into one
INFO: pvload.sh: (URL) http://provenanceweb.org/source/same.ttl
--> (Named Graph) http://example.org/pvload-test-5
--> (PROV Graph) http://provenanceweb.org/graph-prov
results in one triple from:
select distinct count(*)
where {
graph <http://example.org/pvload-test-5> {?s ?p ?o}
}
and 129 triples from:
select distinct count(*)
where {
graph <http://provenanceweb.org/graph-prov> {?s ?p ?o}
}
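The count queries on this page differ only in the graph name, so they can be generated by a small helper. The following is a sketch under assumptions: count_query and count_triples are hypothetical names, and the endpoint is assumed to accept the query as a GET parameter named query and to return CSV on request.

```shell
# Hypothetical helper: print the triple-count query for one named graph,
# in the same form as the queries shown on this page.
count_query() {
    printf 'select distinct count(*)\nwhere {\n  graph <%s> {?s ?p ?o}\n}\n' "$1"
}

# Hypothetical helper: send the query to the endpoint with curl.
# Assumes CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT holds the endpoint URL.
count_triples() {
    if [ -z "${CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT:-}" ]; then
        echo "CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT is not set" >&2
        return 1
    fi
    curl -sS -G "$CSV2RDF4LOD_PUBLISH_VIRTUOSO_SPARQL_ENDPOINT" \
         --data-urlencode "query=$(count_query "$1")" \
         -H 'Accept: text/csv'
}

# Print the query only; no network access is needed for this step.
count_query http://example.org/pvload-test-5
```

Calling count_triples once for the named graph and once for the provenance graph would let you compare the two counts after each load.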
- Script: cache-queries.sh can be used to capture the provenance of querying a SPARQL endpoint.