GATE XML Mapping
This is a step-by-step tutorial on how to map GATE XML to NIF 2.0. It is not yet complete and is still a work in progress.
<?xml version="1.0" encoding="UTF-8" ?>
NIF 2.0 assumes that the text is in Unicode Normalization Form C (NFC); if necessary, NFD, NFKC, or NFKD can be used instead.
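Normalization matters because character offsets are computed over the text, and the same visible string can have a different number of code points in different normalization forms. A minimal sketch using Python's standard `unicodedata` module follows; the helper name `to_nfc` is illustrative and not part of NIF or GATE.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return the text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


# "e" followed by a combining acute accent: two code points.
decomposed = "e\u0301"
# NFC composes the pair into the single code point U+00E9.
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # prints "2 1"
```

Normalizing before the offsets are computed keeps the `#char=` fragments consistent with the string stored in the context.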
<GateDocument>
<!-- The document’s features-->
<GateDocumentFeatures>
<Feature>
<Name className="java.lang.String">MimeType</Name>
<Value className="java.lang.String">text/plain</Value>
</Feature>
<Feature>
<Name className="java.lang.String">gate.SourceURL</Name>
<Value className="java.lang.String">file:/G:/tmp/example.txt</Value>
</Feature>
</GateDocumentFeatures>
<!-- The document content area with serialized nodes -->
<TextWithNodes>
<Node id="0"/>
A TEENAGER
<Node id="11"/>yesterday<Node id="20"/>
accused his parents of cruelty by feeding him a daily diet of chips which
sent his weight ballooning to 22st at the age of l2
<Node id="146"/>.<Node id="147"/>
</TextWithNodes>
A document in NIF is called a "Context", and you have to assign a URI that represents the content of the document. If you use NIF only for internal processing and querying, the URI does not matter and you could choose: file:/G:/tmp/example.txt#char=0,147. If you would like to make the URIs retrievable on the Web, you are required to use http URIs; ideally you also implement content negotiation, as explained on this wiki page, and omit the .txt part of the URI.
<file:/G:/tmp/example#char=0,147> rdf:type nif:RFC5147String , nif:Context ;
nif:beginIndex "0" ;
nif:endIndex "147" ;
nif:sourceUrl <file:/G:/tmp/example.txt> ;
nif:isString "A TEENAGER yesterday accused his parents of cruelty by feeding him a daily diet of chips which sent his weight ballooning to 22st at the age of l2." .
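The step from TextWithNodes to the context description can be sketched in Python with the standard `xml.etree.ElementTree` module. This is an illustration under stated assumptions, not a reference implementation: the function names `text_and_offsets` and `context_turtle` are hypothetical, and the line breaks shown in the listing are assumed to stand for single spaces so that the node ids line up with character offsets.

```python
import xml.etree.ElementTree as ET

# Compact form of the TextWithNodes element from the example document.
# Assumption: line breaks in the listing stand for single spaces.
GATE_XML = (
    '<TextWithNodes><Node id="0"/>A TEENAGER <Node id="11"/>yesterday'
    '<Node id="20"/> accused his parents of cruelty by feeding him a daily '
    'diet of chips which sent his weight ballooning to 22st at the age of l2'
    '<Node id="146"/>.<Node id="147"/></TextWithNodes>'
)


def text_and_offsets(text_with_nodes):
    """Return the plain text and a {node id: character offset} map."""
    parts = [text_with_nodes.text or ""]
    position = len(parts[0])
    offsets = {}
    for node in text_with_nodes.findall("Node"):
        # A node sits at the position reached so far; the text that
        # follows it is stored in the element's tail.
        offsets[node.get("id")] = position
        tail = node.tail or ""
        parts.append(tail)
        position += len(tail)
    return "".join(parts), offsets


def context_turtle(base_uri, source_url, text):
    """Render the nif:Context description as Turtle (prefixes omitted)."""
    end = len(text)
    # Escape characters that would break a short Turtle string literal.
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return (
        f"<{base_uri}#char=0,{end}> rdf:type nif:RFC5147String , nif:Context ;\n"
        f'    nif:beginIndex "0" ;\n'
        f'    nif:endIndex "{end}" ;\n'
        f"    nif:sourceUrl <{source_url}> ;\n"
        f'    nif:isString "{escaped}" .'
    )


root = ET.fromstring(GATE_XML)
text, offsets = text_and_offsets(root)
print(len(text))  # prints 147
print(context_turtle("file:/G:/tmp/example", "file:/G:/tmp/example.txt", text))
```

In this example the recomputed offsets coincide with the node ids (0, 11, 20, 146, 147), so the ids can be used directly in the `#char=` fragments.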
<!-- The default annotation set -->
<AnnotationSet>
<Annotation Type="Date" StartNode="11" EndNode="20">
<Feature>
<Name className="java.lang.String">rule2</Name>
<Value className="java.lang.String">DateOnlyFinal</Value>
</Feature>
<Feature>
<Name className="java.lang.String">rule1</Name>
<Value className="java.lang.String">GazDateWords</Value>
</Feature>
<Feature>
<Name className="java.lang.String">kind</Name>
<Value className="java.lang.String">date</Value>
</Feature>
</Annotation>
<Annotation Type="Sentence" StartNode="0" EndNode="147">
</Annotation>
<Annotation Type="Split" StartNode="146" EndNode="147">
<Feature>
<Name className="java.lang.String">kind</Name>
<Value className="java.lang.String">internal</Value>
</Feature>
</Annotation>
<Annotation Type="Lookup" StartNode="11" EndNode="20">
<Feature>
<Name className="java.lang.String">majorType</Name>
<Value className="java.lang.String">date_key</Value>
</Feature>
</Annotation>
</AnnotationSet>
The default annotation set above can be mapped to:
<file:/G:/tmp/example#char=11,20> rdf:type nif:RFC5147String ;
nif:beginIndex "11" ;
nif:endIndex "20" ;
gate:rule2 "DateOnlyFinal" ;
gate:rule1 "GazDateWords" ;
gate:kind "date" .
<file:/G:/tmp/example#char=0,147> rdf:type nif:Sentence .
<file:/G:/tmp/example#char=146,147> rdf:type nif:RFC5147String ;
nif:beginIndex "146" ;
nif:endIndex "147" ;
gate:kind "internal" .
<file:/G:/tmp/example#char=11,20>
gate:majorType "date_key" .
<!-- Named annotation set -->
<AnnotationSet Name="Original markups" >
<Annotation Type="paragraph" StartNode="0" EndNode="147">
</Annotation>
</AnnotationSet>
</GateDocument>
Note that this example only covers the NIF Simple profile, which only allows the default annotation set:
<file:/G:/tmp/example#char=0,147> a nif:Paragraph .
You can also use the NIF Stanbol profile, which allows alternatives but is more complex:
<file:/G:/tmp/example#char=0,147> nif:annotation <urn:aaaa-bbbb-3893494> .
<urn:aaaa-bbbb-3893494> a nif:Paragraph ;
gate:annotationSetName "Original markups" .
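The pattern for a named annotation set can be sketched as a small helper that creates a separate annotation resource and attaches the set name to it. This is only an illustration: the function name `named_set_annotation` is hypothetical, and generating a `urn:uuid:` identifier when none is supplied is an assumption, not something the profile prescribes.

```python
import uuid


def named_set_annotation(string_uri, nif_class, set_name, annotation_uri=None):
    """Render an annotation from a named set as a separate resource."""
    # Assumption: any unique URI will do; a random urn:uuid is used
    # when the caller does not supply one.
    annotation_uri = annotation_uri or uuid.uuid4().urn
    return (
        f"<{string_uri}> nif:annotation <{annotation_uri}> .\n"
        f"<{annotation_uri}> a {nif_class} ;\n"
        f'    gate:annotationSetName "{set_name}" .'
    )


print(
    named_set_annotation(
        "file:/G:/tmp/example#char=0,147",
        "nif:Paragraph",
        "Original markups",
        annotation_uri="urn:aaaa-bbbb-3893494",
    )
)
```

Keeping the set name on the annotation resource rather than on the string lets several annotation sets describe the same span without their statements being merged.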