Playing around with RDF and SPARQL

  • Exported data from the graph using the new "export-rdf" command.
  • Fixed array properties by inserting a blank node in place of the missing values.
  • Imported the data into OpenRDF Workbench.
  • Without an OWLIM backend there doesn't seem to be a way to get inference working. Ideally, we would add something like the following and OWL inference would create lots of virtual triples for us (at least as far as I understand it, which isn't very far):
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tp: <http://tinkerpop.com/pgm/ontology#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix vertex: <http://tinkerpop.com/pgm/vertex/> .
@prefix property: <http://tinkerpop.com/pgm/property/> .
@prefix relation: <http://tinkerpop.com/pgm/relation/> .
@prefix ehri: <http://data.ehri-project.eu/> .

ehri:DocumentaryUnit a owl:Class ;
    rdfs:subClassOf tp:Vertex ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty property:__ISA__ ;
        owl:hasValue "documentaryUnit"
    ] .

ehri:Repository a owl:Class ;
    rdfs:subClassOf tp:Vertex ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty property:__ISA__ ;
        owl:hasValue "repository"
    ] .

But this doesn't seem to work... ahem. If anyone knows what I'm doing wrong, let me know.

Vladimir: the above says that whenever something is a Repository, it's also a Vertex and satisfies that awful Restriction. It doesn't say that a Vertex satisfying the Restriction should be inferred to be a Repository.
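
To state the reverse direction as well, the class would have to be declared equivalent to the restriction. A sketch (using the prefixes above; it also needs a reasoner that actually handles owl:hasValue, e.g. under OWL 2 RL, which plain OpenRDF Workbench doesn't provide):

ehri:Repository a owl:Class ;
    rdfs:subClassOf tp:Vertex ;
    owl:equivalentClass [
        a owl:Restriction ;
        owl:onProperty property:__ISA__ ;
        owl:hasValue "repository"
    ] .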

Doing it with CONSTRUCT

As it is, we have to construct these triples manually. The only way I could find to do this in OpenRDF Workbench was to run a CONSTRUCT query, download the results (as Turtle, or whatever), and then use the add command to import them into the Workbench (there's probably a better way of doing this). The CONSTRUCT query was:

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX relation:<http://tinkerpop.com/pgm/relation/>
PREFIX vertex:<http://tinkerpop.com/pgm/vertex/>
PREFIX tp:<http://tinkerpop.com/pgm/ontology#>
PREFIX ehri:<http://data.ehri-project.eu/>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX property:<http://tinkerpop.com/pgm/property/>

CONSTRUCT {
    ?d a ehri:DocumentaryUnit .
    ?r a ehri:Repository .
} WHERE {
    ?d a tp:Vertex ;
       property:__ISA__ "documentaryUnit" .

    ?r a tp:Vertex ;
       property:__ISA__ "repository" .
}

Vladimir: A problem with this CONSTRUCT is that it does the Cartesian product of two independent sets of triples. If you have 10 documentaryUnits and 5 repositories, it'll have to handle 5*10=50 rows, which is a lot of unnecessary work. Each DocumentaryUnit triple will be generated 5 times and each Repository triple will be generated 10 times. (Since you can't insert duplicate triples in a repo, that extra work is masked.)

Vladimir: Or consider what will happen if you have 0 repositories: then no DocumentaryUnit triples will be generated at all!
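
One way around both problems is to keep the two patterns independent with a UNION; a sketch, using the same PREFIX declarations as above:

CONSTRUCT {
    ?v a ?type .
} WHERE {
    {
        ?v a tp:Vertex ;
           property:__ISA__ "documentaryUnit" .
        BIND(ehri:DocumentaryUnit AS ?type)
    } UNION {
        ?v a tp:Vertex ;
           property:__ISA__ "repository" .
        BIND(ehri:Repository AS ?type)
    }
}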

Querying

This allows us to run queries like the following, which fetches the English-language name of the repository holding documentary unit "us-005521-ms-361":

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX relation:<http://tinkerpop.com/pgm/relation/>
PREFIX vertex:<http://tinkerpop.com/pgm/vertex/>
PREFIX tp:<http://tinkerpop.com/pgm/ontology#>
PREFIX ehri:<http://data.ehri-project.eu/>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX property:<http://tinkerpop.com/pgm/property/>

SELECT DISTINCT ?name WHERE {
   ?doc a ehri:DocumentaryUnit ;
      property:__ID__ "us-005521-ms-361" ;
      relation:heldBy ?repo .
   ?desc relation:describes ?repo ;
         property:languageCode "eng" ;
         property:name ?name .
}

Doing it with SPARQL UPDATE

We can easily INSERT the needed triples into the repo using SPARQL UPDATE. Use the /update endpoint (not the /query or /sparql endpoint); note that it's usually login-protected.

PREFIX tp:<http://tinkerpop.com/pgm/ontology#>
PREFIX property:<http://tinkerpop.com/pgm/property/>
PREFIX ehri:<http://data.ehri-project.eu/>

INSERT {?d a ehri:DocumentaryUnit}
  WHERE {?d a tp:Vertex ; property:__ISA__ "documentaryUnit"};
INSERT {?r a ehri:Repository}
  WHERE {?r a tp:Vertex ; property:__ISA__ "repository"}
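
The flip side is maintenance: these are ordinary asserted triples, so if the underlying __ISA__ data changes later they go stale and must be cleaned up by hand. A sketch of one such clean-up (same prefixes as above, repeated per type):

DELETE {?d a ehri:DocumentaryUnit}
  WHERE {?d a ehri:DocumentaryUnit .
         FILTER NOT EXISTS {?d a tp:Vertex ; property:__ISA__ "documentaryUnit"}}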

Doing it with Ontotext GraphDB Rules

Vladimir: Ontotext GraphDB (formerly OWLIM) uses a simple rule language that supports this kind of inferencing.

Rules
{
Id:DocumentaryUnit
  d <rdf:type> <tp:Vertex>
  d <property:__ISA__> "documentaryUnit"
  --------------------------------------
  d <rdf:type> <ehri:DocumentaryUnit>

Id:Repository
  d <rdf:type> <tp:Vertex>
  d <property:__ISA__> "repository"
  --------------------------------------
  d <rdf:type> <ehri:Repository>
}

All reasoning supported by GraphDB (e.g. RDFS, OWL-Horst, OWL 2 QL, OWL 2 RL) is implemented with such rules, but you can also use custom rule sets (.pie files). The benefit of using inferencing is incremental assert and retract: the repo takes care of inferring or retracting all consequences when basic triples are inserted or deleted, no matter in which order that happens.
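
For example, with the rule set above loaded, removing a base triple also retracts whatever was inferred from it (vertex:42 is a made-up identifier; prefixes as above):

# After this, the inferred "vertex:42 a ehri:DocumentaryUnit" disappears
# too, unless some other rule still derives it.
DELETE DATA { vertex:42 property:__ISA__ "documentaryUnit" }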

It would be better to abstract a bit and use a single rule for all such inferencing:

Rules
{
Id:__ISA__to_type
  x <rdf:type> <tp:Vertex>
  x <property:__ISA__> isa
  t <ehri:correspondsToISA> isa
  --------------------------------------
  x <rdf:type> t
}

The above is a "parametric rule" that will fire if we have these ontology (T-Box) triples in the repository:

ehri:DocumentaryUnit ehri:correspondsToISA "documentaryUnit".
ehri:Repository      ehri:correspondsToISA "repository".
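
Supporting a new vertex type then takes one extra T-Box triple instead of a new rule, e.g. (class name and ISA value made up for illustration):

ehri:HistoricalAgent ehri:correspondsToISA "historicalAgent" .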

Not doing it at all

Vladimir: IMHO the best way to do this kind of equilibristics is not to do it at all.

  • Use Ontotext GraphDB natively in EHRI to ensure high performance on large amounts of RDF
  • The GraphDB team is working on a Blueprints implementation, so we can provide a Blueprints API to the rest of the system, if needed