Documentation
Krextor, the KWARC RDF Extractor, is an extensible XSLT-based framework for extracting RDF from XML. It supports multiple input languages as well as multiple output RDF notations. Krextor provides convenience templates that try to do “the right thing”™ in many common cases, so as to reduce the need for manually writing repetitive code. The Publications provide further background on the design, requirements, and use cases behind Krextor.
The extracted RDF graph will in most cases be an outline of the semantic structure of an XML document, abstracting from the concrete syntax. It can be used for more easily exchanging or interlinking knowledge contained in XML documents on the semantic web. There are many tools that support querying RDF, using languages like SPARQL. If the extracted RDF is backed by an expressive ontology, a reasoner can be used to infer additional knowledge from it.
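As a rough illustration of what "an outline of the semantic structure" can look like, the sketch below walks a small XML document and emits subject-predicate-object triples. It is written in Python rather than XSLT and is not Krextor's code: the sample XML, the `http://example.org/` base URI, the `hasPart` property, and the `extract_triples` function are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; not a real OMDoc or OpenMath document.
SAMPLE = """
<library id="lib">
  <book id="b1"><title>RDF Basics</title></book>
  <book id="b2"><title>XSLT Basics</title></book>
</library>
"""

BASE = "http://example.org/"  # assumed base URI for resources and properties
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples outlining the document.

    Elements with an id become resources; child elements without an id
    become literal-valued properties of the enclosing resource.
    """
    root = ET.fromstring(xml_text.strip())
    triples = []

    def visit(element, parent_uri):
        ident = element.get("id")
        if ident is None:
            text = (element.text or "").strip()
            if parent_uri is not None and text:
                triples.append((parent_uri, BASE + element.tag, text))
            return
        uri = BASE + ident
        # The element name, capitalized, serves as the (made-up) class.
        triples.append((uri, RDF_TYPE, BASE + element.tag.capitalize()))
        if parent_uri is not None:
            triples.append((parent_uri, BASE + "hasPart", uri))
        for child in element:
            visit(child, uri)

    visit(root, None)
    return triples


if __name__ == "__main__":
    for triple in extract_triples(SAMPLE):
        print(triple)
```

The concrete syntax (element nesting, attribute names) disappears from the result; what remains is the typed resources and the links between them, which is the kind of abstraction the paragraph above describes.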
Krextor comes with a number of extraction and output modules. Support for additional formats is easy to add. Please let us know if you have written an extraction or output module, test case, or documentation that you would like us to include in the Krextor default distribution.
The following input formats are already supported. Others are easy to add. Just copy an existing extraction module to get started.
- omdoc (largely stable): OMDoc ([source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/extract/omdoc.xsl))
  - in terms of this ontology (ontology sources)
- ocd (stable): OpenMath Content Dictionaries ([source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/extract/ocd.xsl))
  - in terms of this ontology
  - feature overview
- xbel (stable but incomplete): XBEL (XML Bookmark Exchange Language) ([source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/extract/xbel.xsl))
  - in terms of the Shared Desktop Ontologies
  - suitable for use with the Nepomuk/KDE semantic desktop
- xhtml-rdfa (experimental): XHTML+RDFa ([source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/extract/xhtml-rdfa.xsl))
- omdoc-owl (experimental): OMDoc, interpreted as OWL ontologies ([source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/extract/omdoc-owl.xsl))
- hcalendar (incomplete, experimental): the hCalendar microformat ([source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/extract/hcalendar.xsl))
- YourOwnExtraction
The following output formats are supported:
- sequence of triples:
  - rxr: RXR (Regular XML RDF), schema ([source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/output/rxr.xsl))
  - ntriples: N-Triples
- grouped triples (first by common subject, then by common predicate; currently implemented as post-processing of RXR):
  - turtle: Turtle ([source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/output/turtle.xsl))
  - rdf-xml: RDF/XML ([source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/output/rdf-xml.xsl))
- rdfa: RDFa (for inclusion into XSLTs that render XML to XHTML+RDFa; [source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/output/util/rdfa.xsl))
- java: Java callback for every triple ([source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/output/java.xsl))
- none: no output; for testing ([source](https://github.com/EIS-Bonn/krextor/blob/master/src/xslt/output/none.xsl))
- YourOwnOutput
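The "grouped triples" arrangement mentioned in the list above (first by common subject, then by common predicate) can be sketched as follows. This is a minimal Python illustration of the grouping idea only, not Krextor's XSLT post-processing of RXR; the function names `group_triples` and `to_turtle_like` are hypothetical, and the terms are assumed to be already serialized (IRIs in angle brackets, literals in quotes).

```python
def group_triples(triples):
    """Group a flat sequence of (s, p, o) triples by subject, then predicate.

    Insertion order is preserved, so the first occurrence of a subject or
    predicate determines its position in the output.
    """
    grouped = {}
    for s, p, o in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
    return grouped


def to_turtle_like(grouped):
    """Render the grouped structure using Turtle's ';' and ',' shorthand."""
    blocks = []
    for s, predicates in grouped.items():
        parts = [
            "    {} {}".format(p, ", ".join(objects))
            for p, objects in predicates.items()
        ]
        blocks.append(s + "\n" + " ;\n".join(parts) + " .")
    return "\n".join(blocks)


if __name__ == "__main__":
    flat = [
        ("<s1>", "<p1>", "<o1>"),
        ("<s1>", "<p1>", "<o2>"),
        ("<s1>", "<p2>", '"x"'),
        ("<s2>", "<p1>", "<o1>"),
    ]
    print(to_turtle_like(group_triples(flat)))
```

A sequence-of-triples format such as N-Triples would instead write each of the four input triples on its own line; the grouped form states each subject once and each predicate once per subject.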
See Usage
Source documentation (generated using XSLTdoc):
- [latest version (trunk)](https://github.com/EIS-Bonn/krextor/blob/master/doc/xsltdoc/index.html)
- A report by Tim Lebo on how he got started with Krextor and hacked it to fit into his application, covering:
  - a brief review of the documentation provided here, focusing on those aspects that were relevant to him
  - a ready-to-copy XSLT created from the “Simpsons” example on the YourOwnExtraction page
  - having extraction modules outside of the [extract](https://github.com/EIS-Bonn/krextor/tree/master/src/xslt/extract) directory (see also #109)
  - an extraction module for the NEMSIS XML language for tabular data