Your Own Extraction
To write an extraction module for a new XML language, copy one of the existing ones. Then read and understand the documentation of the relevant features of generic/generic.xsl.
Krextor offers two different approaches to translating XML to RDF:
- a simple declarative mapping
  - of XML elements to RDF resources, and
  - of XML child elements or attributes to RDF properties
- a more complex approach that gives you full control but requires implementing one XSLT template per translation rule. Krextor makes implementing these templates easier by providing a large supply of convenience templates and functions.
Both approaches can be combined; in fact most existing extraction modules do so. They use declarative mappings wherever possible, and resort to custom templates wherever the mapping is more complex.
We first document the two different approaches and then give a simple example below.
See the XBEL extraction module for an example of simple mappings.
Mappings are defined as follows; each variable is defined in generic/generic.xsl and documented here:
- from XML elements to RDF resources: krextor:resources variable
- from XML child elements or attributes to literal-valued RDF properties (known as “datatype properties” in OWL): krextor:literal-properties variable
- from XML child elements or attributes to URI-valued RDF properties (known as “object properties” in OWL): krextor:uri-properties variable
Once defined, the mappings need to be activated. You need XSLT templates (in krextor:main mode) that match each occurrence of an XML element or attribute to which one of these mappings should be applied. Usually it suffices to have one template per type of mapping, i.e. one that matches all XML constructs that should be mapped to resources, one that matches all XML constructs to be mapped to literal-valued properties, etc.
The following named templates, which are defined in generic/generic.xsl and documented here, give full control over the translation:
- create-resource: Creates an RDF resource of some type from this element, and possibly creates related triples having this resource as subject or object. Resources can be identified by auto-generated URIs, by custom URIs (see below), or by blank node IDs. Then, matching extraction templates are applied to the child elements. A call to create-resource defines a scope in which the created resource is the default subject of any other triple created using these templates, unless another resource is created from some child element.
- add-literal-property: Adds a literal-valued property to the resource in whose create-resource scope this template was called.
- add-uri-property: Adds a URI-valued property to the resource R in whose create-resource scope this template was called. R can be either the subject (default) or the object (“inverse mode”) of the resulting triple.
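The scoping behaviour of these templates can be pictured with a small Python model. This is not Krextor code: the class `ScopeModel`, its methods, and the example identifiers (`ex:bart`, `ex:doc`, `ex:milhouse`) are hypothetical, and real Krextor modules are written in XSLT. The model only shows how a stack of resources makes the innermost created resource the default subject, and how inverse mode swaps subject and object.

```python
from contextlib import contextmanager


class ScopeModel:
    """Toy model of create-resource scoping; not part of Krextor."""

    def __init__(self):
        self.triples = []  # collected (subject, predicate, object) tuples
        self._scope = []   # stack of resources; the top is the default subject

    @contextmanager
    def create_resource(self, uri, rdf_type=None):
        # Like create-resource: optionally type the resource, then open a scope.
        if rdf_type is not None:
            self.triples.append((uri, "rdf:type", rdf_type))
        self._scope.append(uri)
        try:
            yield uri
        finally:
            self._scope.pop()  # leaving the scope restores the enclosing subject

    def add_literal_property(self, prop, value):
        self.triples.append((self._scope[-1], prop, value))

    def add_uri_property(self, prop, obj, inverse=False):
        # Default: the scope resource is the subject; inverse mode makes it the object.
        subj = self._scope[-1]
        self.triples.append((obj, prop, subj) if inverse else (subj, prop, obj))


# Hypothetical usage with made-up identifiers:
m = ScopeModel()
with m.create_resource("ex:bart", "foaf:Person"):
    m.add_literal_property("foaf:name", "Bart Simpson")
    with m.create_resource("ex:doc", "foaf:Document"):
        # Inside the nested scope ex:doc is the default subject; inverse mode
        # therefore yields the triple ex:bart foaf:made ex:doc.
        m.add_uri_property("foaf:made", "ex:bart", inverse=True)
    m.add_uri_property("foaf:knows", "ex:milhouse")  # back in ex:bart's scope
```

The context manager mirrors the idea that a scope ends exactly where the create-resource call ends, so the enclosing resource becomes the default subject again.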
These templates make typical RDF extraction tasks much easier than directly creating output triples using the output-triple function in the respective output module. See the source code for additional documentation.
As a rule of thumb, we recommend that every XML element or attribute that corresponds to one RDF resource or property be matched by one template in the krextor:main mode that calls one of the templates above. This ensures the most effective default behaviour without requiring you to write much code yourself. More specifically:
- If an element E corresponds to a resource, write a template in the krextor:main mode that matches E and calls create-resource.
- If an element E contains text that corresponds to the value of a property of a resource represented by a parent or ancestor element, write a template in the krextor:main mode that matches E and calls add-literal-property or add-uri-property.
- If an attribute A contains text that corresponds to the value of a property of a resource represented by its parent element E, write a template in the krextor:main mode that matches A and calls add-literal-property or add-uri-property. create-resource, as called from a template matching E, ensures that templates are applied to attributes, too – which is not generally the case in XSLT.
- If an element E corresponds to a resource that is related to a resource represented by a parent or ancestor element P via a property P–prop–E (e.g. a part-of relationship), match E as in the first case, but pass the parameter related-via-properties with value ‘prop’. This parameter is designed as a (comma-separated) sequence because sometimes there can be more than one such property, e.g. one logical relation and one document structure relation.
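To make these cases concrete, here is a Python sketch that dispatches on element and attribute names, roughly as one krextor:main template per construct would. It is a model, not Krextor's API: it assumes a hypothetical `<group>` element containing `<person>` elements, related to the group via foaf:member (the parent-to-child relation of the last case), a hypothetical `BASE` URI, and a made-up URI scheme built from `id` attributes.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/network"  # hypothetical base URI


def resource_uri(elem):
    # Hypothetical URI scheme: base URI plus the element's id as fragment.
    return f"{BASE}#{elem.get('id')}"


def walk(elem, subject, triples):
    # Plays the role of applying templates in krextor:main mode.
    handler = HANDLERS.get(elem.tag)
    if handler is not None:
        handler(elem, subject, triples)


def create_resource(elem, subject, triples, rdf_type, related_via=()):
    uri = resource_uri(elem)
    triples.append((uri, "rdf:type", rdf_type))
    if subject is not None:
        for prop in related_via:  # last case: parent --prop--> new resource
            triples.append((subject, prop, uri))
    for name, value in elem.attrib.items():  # attributes are visited too
        attr_handler = ATTR_HANDLERS.get((elem.tag, name))
        if attr_handler is not None:
            attr_handler(value, uri, triples)
    for child in elem:  # the new resource is the default subject below
        walk(child, uri, triples)


def add_friends(value, subject, triples):
    # Attribute -> URI-valued property; this model splits the list on whitespace.
    for friend in value.split():
        triples.append((subject, "foaf:knows", friend))


HANDLERS = {
    "group": lambda e, s, t: create_resource(e, s, t, "foaf:Group"),
    "person": lambda e, s, t: create_resource(e, s, t, "foaf:Person", ("foaf:member",)),
    # Child element -> literal-valued property of the enclosing resource.
    "name": lambda e, s, t: t.append((s, "foaf:name", (e.text or "").strip())),
}
ATTR_HANDLERS = {("person", "friends"): add_friends}

doc = ET.fromstring("""
<group id="bart-and-friends">
  <person id="bart" friends="http://van-houten.name/milhouse http://moe.org/#me">
    <name>Bart Simpson</name>
  </person>
</group>
""".strip())
triples = []
walk(doc, None, triples)
```

Each handler corresponds to one "write a template that matches E" rule, which is why adding a construct here means adding one dictionary entry plus, at most, one small function.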
A template that traverses XInclude links is provided in generic/generic.xsl and is active by default. Templates are applied to nodes in XIncluded documents in the krextor:included mode. As a rule of thumb, do not recursively create resources from XIncluded elements; instead, process the included documents separately and merge the resulting RDF graphs in your application outside of Krextor. We recommend generating only part-of relationships from the including document to the resource represented by the root element of the included document, this time using add-uri-property with its default behaviour.
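The recommended handling of XIncluded documents can be sketched in Python as follows: instead of descending into an included document, the including document gets one part-of link per xi:include. This is only an illustration; the choice of dcterms:hasPart, the hypothetical `DOC_URI` used as subject, and the assumption that the included root resource is identified by the resolved href are all made up here. In Krextor, the object URI depends on how URIs are generated for the included document.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
DOC_URI = "http://example.org/book/main.xml"  # hypothetical URI of the including document


def part_of_links(root, doc_uri, prop="dcterms:hasPart"):
    """One link per xi:include; the included documents themselves are not read."""
    links = []
    for inc in root.iter(XI_INCLUDE):
        # Assumption: the included root resource is identified by the resolved href.
        included = urljoin(doc_uri, inc.get("href"))
        links.append((doc_uri, prop, included))
    return links


source = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="chapter1.xml"/>
  <xi:include href="appendix/a.xml"/>
</book>"""
links = part_of_links(ET.fromstring(source), DOC_URI)
```

The included files would then be extracted in separate runs, and the graphs merged afterwards.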
Consider the following XML format for data about social networks:
```xml
<person friends="http://van-houten.name/milhouse http://moe.org/#me">
  <name>Bart Simpson</name>
</person>
```
The Krextor XSLT snippets below extract the following RDF from it, using the FOAF ontology (in Turtle serialization):
```turtle
# see below about URIs
<some-uri> a foaf:Person ;
    foaf:name "Bart Simpson" ;
    foaf:knows <http://van-houten.name/milhouse>, <http://moe.org/#me> .
```
The listings below assume that namespace prefixes and XML entities have been set up correctly.
```xml
<!-- declares mappings of XML elements to RDF resources -->
<xsl:variable name="krextor:resources">
  <person type="&foaf;Person"/>
  <!-- further element→resource mappings follow -->
</xsl:variable>

<!-- activates these mappings -->
<xsl:template match="person" mode="krextor:main">
  <!-- further mapped elements would be given as match="element1|element2" -->
  <xsl:apply-templates select="." mode="krextor:create-resource"/>
</xsl:template>
```
```xml
<!-- declares mappings of XML elements and attributes to literal-valued RDF properties -->
<xsl:variable name="krextor:literal-properties">
  <name property="&foaf;name"/>
  <!-- ... -->
</xsl:variable>

<!-- activates these mappings -->
<xsl:template match="person/name (: |... :)" mode="krextor:main">
  <xsl:apply-templates select="." mode="krextor:add-literal-property"/>
</xsl:template>
```
```xml
<!-- declares mappings of XML attributes to URI-valued RDF properties -->
<xsl:variable name="krextor:uri-properties">
  <friends property="&foaf;knows" object-is-list="true" krextor:attribute="yes"/>
  <!-- ... -->
</xsl:variable>

<!-- activates these mappings -->
<xsl:template match="person/@friends (: |... :)" mode="krextor:main">
  <xsl:apply-templates select="." mode="krextor:add-uri-property"/>
  <!-- ... -->
</xsl:template>
```
Observe how little effort it takes to map an additional XML construct: once the necessary variable and template declarations are in place, adding a mapping requires very little XSLT knowledge.
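At its core, the declarative approach is table lookup. The following Python sketch is a model, not how Krextor is implemented: it mirrors the three mapping variables as dictionaries (the names `RESOURCES`, `LITERAL_PROPERTIES`, `URI_PROPERTIES` and the fixed URI `some-uri` are assumptions) and reproduces the Turtle triples shown earlier. It simplifies Krextor's behaviour: the activation templates are folded into the tree walk, and element versus attribute mappings are distinguished only by where a name occurs.

```python
import xml.etree.ElementTree as ET

# Toy counterparts of krextor:resources, krextor:literal-properties and
# krextor:uri-properties (the dictionary names are assumptions).
RESOURCES = {"person": "foaf:Person"}
LITERAL_PROPERTIES = {"name": "foaf:name"}
URI_PROPERTIES = {"friends": ("foaf:knows", True)}  # (property, object-is-list)


def extract(elem, subject, triples, uri_for):
    if elem.tag in RESOURCES:
        # Element -> resource; its attributes are looked up in the property tables.
        subject = uri_for(elem)
        triples.append((subject, "a", RESOURCES[elem.tag]))
        for attr, value in elem.attrib.items():
            if attr in URI_PROPERTIES:
                prop, is_list = URI_PROPERTIES[attr]
                for obj in (value.split() if is_list else [value]):
                    triples.append((subject, prop, obj))
            elif attr in LITERAL_PROPERTIES:
                triples.append((subject, LITERAL_PROPERTIES[attr], value))
    elif elem.tag in LITERAL_PROPERTIES and subject is not None:
        # Child element -> literal-valued property of the enclosing resource.
        triples.append((subject, LITERAL_PROPERTIES[elem.tag], (elem.text or "").strip()))
    for child in elem:
        extract(child, subject, triples, uri_for)


def run(xml_text):
    triples = []
    extract(ET.fromstring(xml_text), None, triples, uri_for=lambda e: "some-uri")
    return triples


bart = run('<person friends="http://van-houten.name/milhouse http://moe.org/#me">'
           '<name>Bart Simpson</name></person>')

# Mapping one more construct (a hypothetical <nick> element) is one table entry:
LITERAL_PROPERTIES["nick"] = "foaf:nick"
bart_with_nick = run("<person><name>Bart Simpson</name><nick>El Barto</nick></person>")
```

The last two lines illustrate the point about low effort: the traversal code stays unchanged, and only the table grows.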
For comparison, the same extraction with custom templates that call the named templates directly:

```xml
<!-- map person elements to foaf:Person resources -->
<xsl:template match="person" mode="krextor:main">
  <xsl:call-template name="krextor:create-resource">
    <xsl:with-param name="type" select="'&foaf;Person'"/>
  </xsl:call-template>
</xsl:template>

<!-- map person/name elements to foaf:name properties -->
<xsl:template match="person/name" mode="krextor:main">
  <xsl:call-template name="krextor:add-literal-property">
    <xsl:with-param name="property" select="'&foaf;name'"/>
  </xsl:call-template>
</xsl:template>

<!-- map person/@friends attributes to foaf:knows properties -->
<xsl:template match="person/@friends" mode="krextor:main">
  <xsl:call-template name="krextor:add-uri-property">
    <xsl:with-param name="property" select="'&foaf;knows'"/>
    <xsl:with-param name="object-is-list" select="true()"/>
  </xsl:call-template>
</xsl:template>
```
Now consider that inside these templates you have access to the full power of XSLT and XPath – but that you have to write one such template per additional mapping, and obviously you need some knowledge of XSLT and XPath.
Krextor offers several ways of generating URIs for resources by default; see the documentation of generic/generic.xsl on how to choose among them. If you want to implement your own, give it a name N and implement a template matching the element krextor-genuri:N, where krextor-genuri is the namespace prefix for the URI http://kwarc.info/projects/krextor/genuri. This template has to accept two parameters, a node and the current base URI, and it has to return zero or one string (type xs:string?). In the most common case, you only want to implement your own way of generating a fragment URI, which is appended to the current base URI; then the krextor:fragment-uri-or-null function may be helpful.
See the XMath extraction module for a simple example.
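As a rough Python analogue of such a URI generation method (not XSLT, and not Krextor's API): a generator receives a node and the current base URI and returns either a URI string or None, mirroring the xs:string? return type. The registry `GENERATORS`, the decorator `genuri`, the method name `name-slug`, and the helper `fragment_uri_or_null` are hypothetical; the helper merely stands in for krextor:fragment-uri-or-null, and its "#fragment" behaviour is an assumption.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional

UriGenerator = Callable[[ET.Element, str], Optional[str]]
GENERATORS: Dict[str, UriGenerator] = {}  # hypothetical registry keyed by method name N


def genuri(name: str):
    """Register a URI generation method under a name (stands in for krextor-genuri:N)."""
    def register(fn: UriGenerator) -> UriGenerator:
        GENERATORS[name] = fn
        return fn
    return register


def fragment_uri_or_null(fragment: Optional[str], base_uri: str) -> Optional[str]:
    # Assumed behaviour: append "#fragment" to the base URI, or give None if empty.
    return f"{base_uri}#{fragment}" if fragment else None


@genuri("name-slug")  # hypothetical method name
def name_slug(node: ET.Element, base_uri: str) -> Optional[str]:
    name = node.findtext("name")
    if not name:
        return None  # zero strings: this method cannot name the node
    slug = re.sub(r"[^a-z0-9]+", "-", name.strip().lower()).strip("-")
    return fragment_uri_or_null(slug, base_uri)


person = ET.fromstring("<person><name>Bart Simpson</name></person>")
uri = GENERATORS["name-slug"](person, "http://example.org/people")
```

Here `uri` becomes `http://example.org/people#bart-simpson`, while a `<person/>` without a name yields None.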