Guidelines or suggestions for data reconciliation (updated from time to time; collecting advice from everyone)
- Clarification of namespaces:
wd: http://www.wikidata.org/entity/
not wd: https://www.wikidata.org/wiki/
wdt: http://www.wikidata.org/prop/direct/
not wdt: https://www.wikidata.org/wiki/Property:
- Do not mix the two forms.
- Be cautious about ambiguous terms:
For example, the "recording" entity in TheSession is not "recorded music" (Q49017950) but actually an "album" (Q482994), whereas in MusicBrainz "recorded music" and "album" coexist and are distinct.
For instance, a type can be stated either as
<entity> rdf:type <entity> .
or as
<entity> wdt:P31 <entity> .
In the future, we may add semantics like
rdf:type owl:equivalentProperty wdt:P31 .
It can only be done manually.
Some properties, such as name (wdt:P2561) and title (wdt:P1476), are essentially similar to rdfs:label. Prefer rdfs:label, because it makes LLM2SPARQL easier.
Since the introduction of RDFS, properties can be broadly divided into two types:
(1) Object property: the value is another item that has a URI; for example, see day of week (P2894).
(2) Data property: the value is not a URI but an rdfs:Literal...
Keeping the type of each property consistent is recommended, partly because a clear distinction between object properties and data properties improves the accuracy of LLM2SPARQL. For example, if you ask ChatGPT a question, it usually treats a given property as either an object property or a data property. To disambiguate, you will probably have to use isIRI(?x). Consider the specific question "Find in TheSession performers who are Canadians. And find the recordings they performed in TheSession".
The expected SPARQL can be:
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT DISTINCT ?recording ?performer
WHERE {
  GRAPH <http://sample/thesession/reconciled> {
    ?recording a wd:Q482994 ;
               wdt:P175 ?performer .
    FILTER isIRI(?performer)  # Without the FILTER, it will report "Virtuoso S1TAT Error Query did not complete due to ANYTIME timeout."
  }
  SERVICE <https://query.wikidata.org/sparql> {
    ?performer wdt:P27 wd:Q16 .
  }
}
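The role of the FILTER can be sketched in Python. This is only an illustration, assuming the reconciled graph holds a mix of IRI-valued and literal-valued performers; the `IRI` and `Literal` classes and all sample values are hypothetical and not taken from the actual data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """An RDF term that names a resource."""
    value: str


@dataclass(frozen=True)
class Literal:
    """An RDF term that carries a plain value, such as a name string."""
    value: str


def is_iri(term) -> bool:
    # Like SPARQL isIRI(): test the kind of term, not what its text looks like.
    return isinstance(term, IRI)


# Hypothetical wdt:P175 bindings from the local graph (all values are made up).
bindings = [
    {"recording": IRI("http://example.org/recording/1"),
     "performer": IRI("http://example.org/performer/reconciled")},
    {"recording": IRI("http://example.org/recording/2"),
     "performer": Literal("An unreconciled performer name")},
]

# Only IRI-valued performers are passed on to the federated SERVICE lookup.
to_lookup = [b for b in bindings if is_iri(b["performer"])]
print(len(to_lookup))
```

A literal that merely looks like a URL is still a literal, which is why the check is on the term kind rather than on the string.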
We can also refer to https://www.wikidata.org/wiki/Wikidata:WikiProject_Music for many recommended properties for LinkedMusic, especially the subject -> property -> object context in which each one is used.
The recommended schemas/ontologies, in descending order of priority, are: ...
All entity URIs should follow the formats denoted by the namespace prefixes wd and wdt:
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
Be very careful that the scheme is "http", not "https", and that for wd the path is /entity/, not /wiki/
...
Otherwise, the reconciled URI will not be recognized by the Wikidata SPARQL endpoint.
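As a minimal sketch of how the http/https and /entity/ versus /wiki/ pitfalls might be caught automatically, the following Python helper rewrites page-style Wikidata URLs into the wd: or wdt: form. The function name `to_canonical` and its `as_predicate` parameter are hypothetical and exist only for this illustration.

```python
import re

WD_ENTITY = "http://www.wikidata.org/entity/"
WDT_DIRECT = "http://www.wikidata.org/prop/direct/"

# Accepts page-style and entity-style Wikidata URLs, with http or https.
_WIKIDATA_URL = re.compile(
    r"^https?://www\.wikidata\.org/"
    r"(?:wiki/(?:Property:)?|entity/|prop/direct/)"
    r"([QP]\d+)$"
)


def to_canonical(url: str, as_predicate: bool = False) -> str:
    """Return the wd: form of a Wikidata URL, or the wdt: form when as_predicate is True."""
    match = _WIKIDATA_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a recognized Wikidata URL: {url!r}")
    local_id = match.group(1)
    if as_predicate:
        # Only properties (P...) can be used with the wdt: prefix.
        if not local_id.startswith("P"):
            raise ValueError(f"{local_id} is not a property ID")
        return WDT_DIRECT + local_id
    return WD_ENTITY + local_id


print(to_canonical("https://www.wikidata.org/wiki/Q482994"))
print(to_canonical("https://www.wikidata.org/wiki/Property:P175", as_predicate=True))
```

Running such a check over a reconciled column before export would flag any URI that the Wikidata SPARQL endpoint would not match.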
Especially for entities that you have to reconcile manually in OpenRefine, you should keep a spreadsheet that records the mapped Wikidata entities.
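One possible shape for such a record is a plain CSV file. The sketch below is only a suggestion: the column names (`source_value`, `wikidata_uri`, `wikidata_label`, `reconciled_by`) and both helper functions are hypothetical, and the sample row reuses the TheSession "recording" to album (Q482994) mapping mentioned earlier.

```python
import csv
import io

# Hypothetical column layout for the mapping spreadsheet.
FIELDNAMES = ["source_value", "wikidata_uri", "wikidata_label", "reconciled_by"]


def write_mapping_log(rows, out):
    """Write mapping rows (a list of dicts) as CSV to a text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def load_mapping_log(src):
    """Read the CSV back into a {source_value: wikidata_uri} lookup."""
    return {row["source_value"]: row["wikidata_uri"] for row in csv.DictReader(src)}


rows = [
    {
        "source_value": "recording",
        "wikidata_uri": "http://www.wikidata.org/entity/Q482994",
        "wikidata_label": "album",
        "reconciled_by": "manual",
    },
]

buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_mapping_log(rows, buffer)
buffer.seek(0)
print(load_mapping_log(buffer))
```

Keeping the log in a simple text format makes it easy to diff and to reapply the same mappings when a dataset is re-imported.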
6. For terms that are not easy to map to an exact property or type, we provide two fallback methods:
2.1 If necessary, use a hash (#, the document fragment delimiter), such as https://musicbrainz.org/doc/Event#Cancelled in MusicBrainz.
2.2 If necessary, use a fake URL.
Please refer to https://github.com/DDMAL/linkedmusic-datalake/issues/107
- We should also use owl:sameAs, because with the reasoning function activated (in Virtuoso) it can supplement latent data. For example:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
INSERT DATA {
  GRAPH <urn:reason.example> {
    <http://InstanceA_local> <http://property> <http://InstanceB_local> .
    <http://InstanceA_local> owl:sameAs <http://InstanceA_wiki> .
    <http://InstanceB_local> owl:sameAs <http://InstanceB_wiki> .  # This reasoning condition doesn't take effect; to be investigated in the future.
  }
}
After inserting the data above, you can check which properties http://InstanceA_wiki has by running a query with the reasoning function activated:
DEFINE input:same-as "yes"
SELECT DISTINCT ?p ?o
FROM <urn:reason.example>
WHERE {
  <http://InstanceA_wiki> ?p ?o .
}
The result can be:
p | o |
---|---|
http://www.w3.org/2002/07/owl#sameAs | http://InstanceA_wiki |
http://property | http://InstanceB_local |
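To make the effect of the same-as expansion easier to follow, here is a small Python sketch that reproduces the two rows in the table above. It is an illustration only, not Virtuoso's actual algorithm: it assumes that only the subject position is expanded through owl:sameAs, which matches the observed result (the object stays http://InstanceB_local). The helper names are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

triples = [
    ("http://InstanceA_local", "http://property", "http://InstanceB_local"),
    ("http://InstanceA_local", OWL_SAME_AS, "http://InstanceA_wiki"),
    ("http://InstanceB_local", OWL_SAME_AS, "http://InstanceB_wiki"),
]


def build_same_as_finder(triples):
    """Group IRIs linked by owl:sameAs using a simple union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)
    return find


def describe(subject, triples):
    """Return (p, o) pairs of every triple whose subject is sameAs-equivalent to subject."""
    find = build_same_as_finder(triples)
    root = find(subject)
    return {(p, o) for s, p, o in triples if find(s) == root}


for p, o in sorted(describe("http://InstanceA_wiki", triples)):
    print(p, o)
```

Because http://InstanceA_local and http://InstanceA_wiki fall into the same equivalence class, the triples stored under the local IRI become visible when the Wikidata-style IRI is queried.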