Playing around with RDF and SPARQL

  • Exported data from the graph using the new "export-rdf" command.
  • Fixed array properties by inserting a blank node in place of the missing values.
  • Imported the data into OpenRDF Workbench.
  • Without an OWLIM backend there doesn't seem to be a way to get inference working. Ideally, we would add something like the following and OWL inference would create lots of virtual triples for us (at least as far as I understand it, which isn't very far):
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tp: <http://tinkerpop.com/pgm/ontology#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix vertex: <http://tinkerpop.com/pgm/vertex/> .
@prefix property: <http://tinkerpop.com/pgm/property/> .
@prefix relation: <http://tinkerpop.com/pgm/relation/> .
@prefix ehri: <http://data.ehri-project.eu/> .

ehri:DocumentaryUnit a owl:Class ;
    rdfs:subClassOf tp:Vertex ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty property:__ISA__ ;
        owl:hasValue "documentaryUnit"
    ] .

ehri:Repository a owl:Class ;
    rdfs:subClassOf tp:Vertex ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty property:__ISA__ ;
        owl:hasValue "repository"
    ] .

But this doesn't seem to work... ahem. If anyone knows what I'm doing wrong, let me know.

Vladimir: the above says that whenever something is a Repository, it's also a Vertex and satisfies that awful Restriction. It doesn't say that a Vertex satisfying the Restriction should be inferred to be a Repository.
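
To state the reverse direction as well, the class would have to be declared equivalent to the restriction. A sketch (using the prefixes above; it also needs a reasoner that actually handles owl:hasValue, e.g. under OWL 2 RL, which plain OpenRDF Workbench doesn't provide):

ehri:Repository a owl:Class ;
    rdfs:subClassOf tp:Vertex ;
    owl:equivalentClass [
        a owl:Restriction ;
        owl:onProperty property:__ISA__ ;
        owl:hasValue "repository"
    ] .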

Doing it with CONSTRUCT

As it is, we have to construct these triples manually. The only way I could find to do this in OpenRDF Workbench was to run a CONSTRUCT query, download the results (as Turtle, or whatever), and then use the add command to import them into the Workbench (there's probably a better way of doing this). The CONSTRUCT query was:

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX relation:<http://tinkerpop.com/pgm/relation/>
PREFIX vertex:<http://tinkerpop.com/pgm/vertex/>
PREFIX tp:<http://tinkerpop.com/pgm/ontology#>
PREFIX ehri:<http://data.ehri-project.eu/>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX property:<http://tinkerpop.com/pgm/property/>

CONSTRUCT {
    ?d a ehri:DocumentaryUnit .
    ?r a ehri:Repository .
} WHERE {
    ?d a tp:Vertex ;
       property:__ISA__ "documentaryUnit" .

    ?r a tp:Vertex ;
       property:__ISA__ "repository" .
}

Vladimir: A problem with this CONSTRUCT is that it does the Cartesian product of two independent sets of triples. If you have 10 documentaryUnits and 5 repositories, it'll have to handle 5*10=50 rows, which is a lot of unnecessary work. Each DocumentaryUnit triple will be generated 5 times and each Repository triple will be generated 10 times. (Since you can't insert duplicate triples in a repo, that extra work is masked.)

Vladimir: Or consider what will happen if you have 0 repositories: then no DocumentaryUnit triples will be generated at all!
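
One way around both problems is to keep the two patterns independent with a UNION; a sketch, using the same PREFIX declarations as above:

CONSTRUCT {
    ?v a ?type .
} WHERE {
    {
        ?v a tp:Vertex ;
           property:__ISA__ "documentaryUnit" .
        BIND(ehri:DocumentaryUnit AS ?type)
    } UNION {
        ?v a tp:Vertex ;
           property:__ISA__ "repository" .
        BIND(ehri:Repository AS ?type)
    }
}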

Querying

This allows us to run queries like the following, which fetches the English-language name of the repository holding documentary unit "us-005521-ms-361":

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX relation:<http://tinkerpop.com/pgm/relation/>
PREFIX vertex:<http://tinkerpop.com/pgm/vertex/>
PREFIX tp:<http://tinkerpop.com/pgm/ontology#>
PREFIX ehri:<http://data.ehri-project.eu/>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX property:<http://tinkerpop.com/pgm/property/>

SELECT DISTINCT ?name WHERE {
   ?doc a ehri:DocumentaryUnit ;
      property:__ID__ "us-005521-ms-361" ;
      relation:heldBy ?repo .
   ?desc relation:describes ?repo ;
         property:languageCode "eng" ;
         property:name ?name .
}

Doing it with SPARQL UPDATE

We can easily INSERT the needed triples into the repo using SPARQL UPDATE. Use the /update endpoint (not the /query or /sparql endpoint); note that it's usually login-protected.

PREFIX tp:<http://tinkerpop.com/pgm/ontology#>
PREFIX property:<http://tinkerpop.com/pgm/property/>
PREFIX ehri:<http://data.ehri-project.eu/>

INSERT {?d a ehri:DocumentaryUnit}
  WHERE {?d a tp:Vertex ; property:__ISA__ "documentaryUnit"};
INSERT {?r a ehri:Repository}
  WHERE {?r a tp:Vertex ; property:__ISA__ "repository"}
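
The flip side is maintenance: these are ordinary asserted triples, so if the underlying __ISA__ data changes later they go stale and must be cleaned up by hand. A sketch of one such clean-up (same prefixes as above, repeated per type):

DELETE {?d a ehri:DocumentaryUnit}
  WHERE {?d a ehri:DocumentaryUnit .
         FILTER NOT EXISTS {?d a tp:Vertex ; property:__ISA__ "documentaryUnit"}}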

Doing it with Ontotext GraphDB Rules

Vladimir: Ontotext GraphDB (formerly OWLIM) uses a simple rule language that supports this kind of inferencing.

Rules
{
Id:DocumentaryUnit
  d <rdf:type> <tp:Vertex>
  d <property:__ISA__> "documentaryUnit"
  --------------------------------------
  d <rdf:type> <ehri:DocumentaryUnit>

Id:Repository
  d <rdf:type> <tp:Vertex>
  d <property:__ISA__> "repository"
  --------------------------------------
  d <rdf:type> <ehri:Repository>
}

All reasoning supported by GraphDB (e.g. RDFS, OWL-Horst, OWL 2 QL, OWL 2 RL) is implemented with such rules, but you can also use custom rule sets (.pie files). The benefit of using inferencing is incremental assert and retract: the repo takes care of inferring or retracting all consequences when basic triples are inserted or deleted, no matter in which order that happens.
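
For example, with the rule set above loaded, removing a base triple also retracts whatever was inferred from it (vertex:42 is a made-up identifier; prefixes as above):

# After this, the inferred "vertex:42 a ehri:DocumentaryUnit" disappears
# too, unless some other rule still derives it.
DELETE DATA { vertex:42 property:__ISA__ "documentaryUnit" }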

It would be better to abstract a bit and use a single rule for all such inferencing:

Rules
{
Id:__ISA__to_type
  x <rdf:type> <tp:Vertex>
  x <property:__ISA__> isa
  t <ehri:correspondsToISA> isa
  --------------------------------------
  x <rdf:type> t
}

The above is a "parametric rule" that will fire if we have these ontology (T-Box) triples in the repository:

ehri:DocumentaryUnit ehri:correspondsToISA "documentaryUnit".
ehri:Repository      ehri:correspondsToISA "repository".
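
Supporting a new vertex type then takes one extra T-Box triple instead of a new rule, e.g. (class name and ISA value made up for illustration):

ehri:HistoricalAgent ehri:correspondsToISA "historicalAgent" .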

Not doing it at all

Vladimir: IMHO the best way to do this kind of equilibristics is not to do it at all.

  • Use Ontotext GraphDB natively in EHRI to ensure high performance on large amounts of RDF
  • The GraphDB team is working on a Blueprints implementation, so we can provide a Blueprints API to the rest of the system, if needed