RDF Sparql tests
- Exported data from graph using new "export-rdf" command.
- Fixed array properties by inserting a blank node instead of the missing stuff
- Imported data into OpenRDF Workbench
- Without an OWLIM backend there doesn't seem to be a way to get inference working. Ideally, we would add something like the following and OWL inference would create lots of virtual triples for us (at least as far as I understand it, which isn't very far):
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tp: <http://tinkerpop.com/pgm/ontology#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix vertex: <http://tinkerpop.com/pgm/vertex/> .
@prefix property: <http://tinkerpop.com/pgm/property/> .
@prefix relation: <http://tinkerpop.com/pgm/relation/> .
@prefix ehri: <http://data.ehri-project.eu/> .
ehri:DocumentaryUnit a owl:Class ;
    rdfs:subClassOf tp:Vertex ;
    rdfs:subClassOf [
        owl:Restriction [
            owl:onProperty property:__ISA__ ;
            owl:hasValue "documentaryUnit"
        ]
    ] .
ehri:Repository a owl:Class ;
    rdfs:subClassOf tp:Vertex ;
    rdfs:subClassOf [
        owl:Restriction [
            owl:onProperty property:__ISA__ ;
            owl:hasValue "repository"
        ]
    ] .
But this doesn't seem to work... ahem. If anyone knows what I'm doing wrong let me know.
Vladimir: the above says that whenever something is a Repository, it is also a Vertex and satisfies that awful Restriction. It doesn't say the converse: that a Vertex satisfying the Restriction should be inferred to be a Repository.
As it is, we have to construct these triples manually. The only way I could find to do this in OpenRDF Workbench was to run a CONSTRUCT query, download the results (as Turtle, or whatever) and then use the add command to import them into the Workbench (there's probably a better way of doing this). The CONSTRUCT query was:
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX relation:<http://tinkerpop.com/pgm/relation/>
PREFIX vertex:<http://tinkerpop.com/pgm/vertex/>
PREFIX tp:<http://tinkerpop.com/pgm/ontology#>
PREFIX ehri:<http://data.ehri-project.eu/>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX property:<http://tinkerpop.com/pgm/property/>
CONSTRUCT {
    ?d a ehri:DocumentaryUnit .
    ?r a ehri:Repository .
} WHERE {
    ?d a tp:Vertex ;
       property:__ISA__ "documentaryUnit" .
    ?r a tp:Vertex ;
       property:__ISA__ "repository" .
}
Vladimir: A problem with this CONSTRUCT is that it does the Cartesian product of two independent sets of triples. If you have 10 documentaryUnits and 5 repositories, it'll have to handle 5*10=50 rows, which is a lot of unnecessary work. Each DocumentaryUnit triple will be generated 5 times and each Repository triple will be generated 10 times. (Since you can't insert duplicate triples in a repo, that extra work is masked.)
Vladimir: Or consider what will happen if you have 0 repositories: then no DocumentaryUnit triples will be generated at all!
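The two points above can be checked with a small simulation. This is only a sketch in plain Python, not a SPARQL engine: it mimics how a single WHERE clause joins every `?d` binding with every `?r` binding, and the vertex identifiers and the helper name `construct_rows` are made up for illustration.

```python
from itertools import product


def construct_rows(doc_units, repositories):
    """Mimic the single-WHERE CONSTRUCT: each ?d binding is paired with each ?r binding."""
    rows = list(product(doc_units, repositories))
    triples = set()  # a set, because a repository stores each triple only once
    for d, r in rows:
        triples.add((d, "rdf:type", "ehri:DocumentaryUnit"))
        triples.add((r, "rdf:type", "ehri:Repository"))
    return rows, triples


# Hypothetical vertex identifiers, used only for this illustration.
docs = [f"vertex:d{i}" for i in range(10)]
repos = [f"vertex:r{i}" for i in range(5)]

rows, triples = construct_rows(docs, repos)
print(len(rows), len(triples))  # 50 solution rows collapse to 15 distinct triples

rows_empty, triples_empty = construct_rows(docs, [])
print(len(rows_empty), len(triples_empty))  # no repositories: 0 rows, hence 0 triples
```

With 10 documentary units and 5 repositories the join produces 50 rows for only 15 useful triples, and with an empty repository list it produces nothing at all, even though 10 documentary units exist.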
This allows us to do queries like the following (fetch the English repository name of documentary unit "us-005521-ms-361"):
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX relation:<http://tinkerpop.com/pgm/relation/>
PREFIX vertex:<http://tinkerpop.com/pgm/vertex/>
PREFIX tp:<http://tinkerpop.com/pgm/ontology#>
PREFIX ehri:<http://data.ehri-project.eu/>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX property:<http://tinkerpop.com/pgm/property/>
SELECT DISTINCT ?name WHERE {
    ?doc a ehri:DocumentaryUnit ;
         property:__ID__ "us-005521-ms-361" ;
         relation:heldBy ?repo .
    ?desc relation:describes ?repo ;
          property:languageCode "eng" ;
          property:name ?name .
}
We can easily INSERT the needed triples into the repo using SPARQL UPDATE. Use the /update endpoint (not the /query or /sparql endpoint), which is usually login-protected. Using two separate INSERT operations also avoids the Cartesian product described above:
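PREFIX tp:<http://tinkerpop.com/pgm/ontology#>
PREFIX ehri:<http://data.ehri-project.eu/>
PREFIX property:<http://tinkerpop.com/pgm/property/>
INSERT {?d a ehri:DocumentaryUnit}
WHERE {?d a tp:Vertex ; property:__ISA__ "documentaryUnit"};
INSERT {?r a ehri:Repository}
WHERE {?r a tp:Vertex ; property:__ISA__ "repository"}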
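For scripting this step, the update can be sent as an HTTP POST with the `application/sparql-update` content type, as defined by the SPARQL 1.1 Protocol. The following is a minimal sketch using only the Python standard library. The endpoint URL and the helper name `build_update_request` are assumptions made for illustration (the real update URL depends on the server), the prefix declarations are copied from the earlier queries on this page, and authentication is not shown.

```python
import urllib.request

# Hypothetical endpoint: substitute the update URL of your own repository.
UPDATE_ENDPOINT = "http://localhost:8080/example-repo/update"

UPDATE = """\
PREFIX tp: <http://tinkerpop.com/pgm/ontology#>
PREFIX ehri: <http://data.ehri-project.eu/>
PREFIX property: <http://tinkerpop.com/pgm/property/>
INSERT { ?d a ehri:DocumentaryUnit }
WHERE  { ?d a tp:Vertex ; property:__ISA__ "documentaryUnit" } ;
INSERT { ?r a ehri:Repository }
WHERE  { ?r a tp:Vertex ; property:__ISA__ "repository" }
"""


def build_update_request(endpoint: str, update: str) -> urllib.request.Request:
    """Build a POST request that carries the update text directly in the body."""
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


request = build_update_request(UPDATE_ENDPOINT, UPDATE)

# Sending is left commented out because it needs a live server and credentials:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to inspect the method, headers and body before touching a login-protected endpoint.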
Vladimir: Ontotext GraphDB (formerly OWLIM) uses a simple rule language that allows this kind of inferencing.
Rules
{
Id:DocumentaryUnit
    d <rdf:type> <tp:Vertex>
    d <property:__ISA__> "documentaryUnit"
    --------------------------------------
    d <rdf:type> <ehri:DocumentaryUnit>

Id:Repository
    d <rdf:type> <tp:Vertex>
    d <property:__ISA__> "repository"
    --------------------------------------
    d <rdf:type> <ehri:Repository>
}
All reasoning supported by GraphDB (e.g. RDFS, OWL-Horst, OWL QL, OWL RL) is implemented with such rules, but you can also use custom rule sets (.PIE files). The benefit of using inferencing is incremental assert and retract: the repo takes care of inferring or retracting all consequences when basic triples are inserted or deleted, no matter in which order they are inserted.
It would be better to abstract a bit and use a single rule for all such inferencing:
Rules
{
Id:__ISA__to_type
    x <rdf:type> <tp:Vertex>
    x <property:__ISA__> isa
    t <ehri:correspondsToISA> isa
    --------------------------------------
    x <rdf:type> t
}
The above is a "parametric rule" that will be fired if we have these ontology (T-Box) triples in the repository:
ehri:DocumentaryUnit ehri:correspondsToISA "documentaryUnit".
ehri:Repository ehri:correspondsToISA "repository".
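To make the effect of the parametric rule concrete, here is a rough sketch in plain Python of what one application of it derives. It is not GraphDB's rule engine: triples are modelled as tuples of prefixed-name strings, and the function name `apply_isa_rule` and the vertex identifiers are hypothetical.

```python
def apply_isa_rule(triples):
    """Apply the __ISA__to_type rule once and return only the newly inferred triples."""
    # Premise 1: x <rdf:type> <tp:Vertex>
    vertices = {s for s, p, o in triples if p == "rdf:type" and o == "tp:Vertex"}

    # Premise 3 (T-Box): t <ehri:correspondsToISA> isa, indexed by the isa literal
    classes_by_isa = {}
    for s, p, o in triples:
        if p == "ehri:correspondsToISA":
            classes_by_isa.setdefault(o, set()).add(s)

    inferred = set()
    for s, p, o in triples:
        # Premise 2: x <property:__ISA__> isa, where x is a known vertex
        if p == "property:__ISA__" and s in vertices:
            for cls in classes_by_isa.get(o, ()):
                inferred.add((s, "rdf:type", cls))  # conclusion: x <rdf:type> t
    return inferred - set(triples)


triples = {
    ("ehri:DocumentaryUnit", "ehri:correspondsToISA", "documentaryUnit"),
    ("ehri:Repository", "ehri:correspondsToISA", "repository"),
    ("vertex:1", "rdf:type", "tp:Vertex"),
    ("vertex:1", "property:__ISA__", "documentaryUnit"),
    ("vertex:2", "rdf:type", "tp:Vertex"),
    ("vertex:2", "property:__ISA__", "repository"),
    ("vertex:3", "property:__ISA__", "repository"),  # not typed as tp:Vertex
}

inferred = apply_isa_rule(triples)
for triple in sorted(inferred):
    print(triple)
```

Supporting a new type then only requires one more `ehri:correspondsToISA` triple rather than a new rule. In this sketch `vertex:3` receives no type because the first premise is not satisfied for it.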
Vladimir: IMHO the best way to do this kind of equilibristics is not to do it at all:
- Use Ontotext GraphDB natively in EHRI to ensure high performance on large amounts of RDF
- The GraphDB team is working on a Blueprints implementation, so we can provide a Blueprints API to the rest of the system, if needed