How to publish a txt corpus with NIF as Linked Data
Sebastian Hellmann edited this page Aug 26, 2013 · 3 revisions
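We assume that you have a large collection of txt files that you want to annotate and publish as Linked Data, e.g. http://www.grammararchive.org/txt/
Each text has its own URI, e.g. http://www.grammararchive.org/resource/abbadie_kam1872, and the URIs of strings inside the text start with that URI. Depending on the Accept header, the server redirects to the plain text or to the RDF: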
curl -I -H "Accept: text/plain" http://www.grammararchive.org/resource/abbadie_kam1872
HTTP/1.1 303 See Other
Location: http://www.grammararchive.org/txt/abbadie_kam1872.txt
curl -I -H "Accept: text/turtle" http://www.grammararchive.org/resource/abbadie_kam1872
HTTP/1.1 303 See Other
Location: http://www.grammararchive.org/rdf/abbadie_kam1872.ttl
curl http://www.grammararchive.org/rdf/abbadie_kam1872.ttl
should return something like:
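@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# placeholder namespace, replace it with the namespace of your own vocabulary
@prefix myvocab: <http://example.org/myvocab#> .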
<http://www.grammararchive.org/resource/abbadie_kam1872#char=0,9115>
rdf:type nif:RFC5147String , nif:Context ;
nif:beginIndex "0" ;
nif:endIndex "9115" ;
# add all chars as object
nif:isString """- 66
pommelés de la Celtibérie changeaient de robe quand ils étaient transportés .................<- 9115 chars total""" ;
# optionally link to the sourcefile:
nif:sourceUrl "http://www.grammararchive.org/txt/abbadie_kam1872.txt" .
# just the page number
<http://www.grammararchive.org/resource/abbadie_kam1872#char=2,4>
a nif:RFC5147String ;
nif:beginIndex "2" ;
nif:endIndex "4" ;
nif:referenceContext <http://www.grammararchive.org/resource/abbadie_kam1872#char=0,9115> ;
# add your own annotations here, feel free to use whatever vocabulary you like, e.g.
a myvocab:PageNumber ;
myvocab:pn "true" ;
myvocab:number "66"^^xsd:integer .
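The offsets in the `#char=begin,end` fragment can be computed directly from the text. The following is a minimal sketch, not part of the NLP2RDF software: the function names `nif_fragment_uri` and `annotate_substring` are made up for illustration, offsets are assumed to be zero-based with an exclusive end (so that `char=2,4` selects "66" in the example above), and the sample text is a shortened stand-in for the real file.

```python
def nif_fragment_uri(document_uri: str, begin: int, end: int) -> str:
    """Build a string URI of the form <document#char=begin,end>."""
    return f"{document_uri}#char={begin},{end}"


def annotate_substring(document_uri: str, text: str, begin: int, end: int) -> str:
    """Return Turtle for one substring, linked to the context of the whole text.

    Hypothetical helper: it only emits the offset triples and the
    nif:referenceContext link, your own annotations still need to be added.
    """
    if not 0 <= begin <= end <= len(text):
        raise ValueError("offsets must satisfy 0 <= begin <= end <= len(text)")
    context_uri = nif_fragment_uri(document_uri, 0, len(text))
    string_uri = nif_fragment_uri(document_uri, begin, end)
    return (
        f"<{string_uri}>\n"
        f"    a nif:RFC5147String ;\n"
        f'    nif:beginIndex "{begin}" ;\n'
        f'    nif:endIndex "{end}" ;\n'
        f"    nif:referenceContext <{context_uri}> .\n"
    )


# Shortened stand-in for the 9115-character source file.
text = "- 66\npommelés de la Celtibérie"
doc = "http://www.grammararchive.org/resource/abbadie_kam1872"

print(text[2:4])  # the substring that the annotation refers to
print(annotate_substring(doc, text, 2, 4))
```

Python string indices count Unicode code points, so accented characters such as "é" count as one position each. If your texts are processed as bytes elsewhere, the offsets will differ.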
To generate the NIF for a text file, you can use the NIF web service:
Code: https://github.com/NLP2RDF/software/blob/master/php/nif-ws.php
Parameter documentation: http://persistence.uni-leipzig.org/nlp2rdf/specification/api.html
Deployment (off the shelf): http://nlp2rdf.lod2.eu/nif-ws.php
Please consider deploying the code locally to save traffic.
curl -H "Accept: text/turtle" --data-urlencode input@abbadie_kam1872.txt \
  "http://nlp2rdf.lod2.eu/nif-ws.php?informat=text" > abbadie_kam1872.ttl
Validate the resulting file with the NIF validator: https://github.com/NLP2RDF/software#nif-validator
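To convert many files, the curl call to the NIF web service can be scripted. The sketch below uses only the Python standard library and mirrors the curl invocation: a POST with a URL-encoded `input` field and an `Accept: text/turtle` header. The function names `build_nif_request` and `convert_file` are made up for illustration, and the endpoint is the public deployment named above; point it at your local deployment if you have one.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public deployment from this page; replace with a local one to save traffic.
ENDPOINT = "http://nlp2rdf.lod2.eu/nif-ws.php?informat=text"


def build_nif_request(text: str, endpoint: str = ENDPOINT) -> Request:
    """Build the POST request without sending it (hypothetical helper)."""
    body = urlencode({"input": text}).encode("ascii")
    # Passing data makes urllib issue a POST with a form-encoded body.
    return Request(endpoint, data=body, headers={"Accept": "text/turtle"})


def convert_file(txt_path: str, ttl_path: str, endpoint: str = ENDPOINT) -> None:
    """Send one txt file to the web service and store the returned Turtle."""
    with open(txt_path, encoding="utf-8") as source:
        text = source.read()
    with urlopen(build_nif_request(text, endpoint)) as response:
        turtle = response.read()
    with open(ttl_path, "wb") as target:
        target.write(turtle)
```

Calling `convert_file("abbadie_kam1872.txt", "abbadie_kam1872.ttl")` would then correspond to the curl command, assuming the file is UTF-8 encoded and the service is reachable.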
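To set up the redirects, enable mod_rewrite and restart Apache:
sudo a2enmod rewrite
sudo service apache2 restart
Then add the following rules, e.g. to an .htaccess file in the document root: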
Options -MultiViews
AddType application/rdf+xml .rdf .owl
AddType text/turtle .ttl
RewriteEngine On
AddCharset utf-8 .txt .log .ttl
##################
# Rewrite rule to serve text/plain content if requested
##################
RewriteCond %{HTTP_ACCEPT} text/plain
RewriteRule ^resource/(.*)$ /txt/$1.txt [R=303,L]
RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^resource/(.*)$ /rdf/$1.rdf [R=303,L]
#################
# Default
#################
RewriteRule ^resource/(.*)$ /rdf/$1.ttl [R=303,L]
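Before deploying, the redirect logic can be checked offline. The following sketch re-implements the three rewrite rules in Python so that the expected 303 targets can be tested without a running server. It is an approximation under stated assumptions: `redirect_target` is a made-up name, the path is given without the leading slash (as mod_rewrite sees it in a per-directory context), and the plus sign in `application/rdf+xml` is escaped because RewriteCond patterns are regular expressions in which an unescaped `+` acts as a quantifier.

```python
import re

# (Accept pattern, target template), checked in the same order as the rules.
RULES = [
    (r"text/plain", "/txt/{}.txt"),
    (r"application/rdf\+xml", "/rdf/{}.rdf"),
]
# Default rule: serve Turtle when no condition matches.
DEFAULT_TARGET = "/rdf/{}.ttl"


def redirect_target(path, accept=""):
    """Return the redirect location for a request path, or None if no rule applies."""
    match = re.match(r"^resource/(.*)$", path)
    if match is None:
        return None
    name = match.group(1)
    for accept_pattern, template in RULES:
        # RewriteCond matches the pattern anywhere in the Accept header.
        if re.search(accept_pattern, accept):
            return template.format(name)
    return DEFAULT_TARGET.format(name)


for accept in ("text/plain", "application/rdf+xml", "text/turtle"):
    print(accept, "->", redirect_target("resource/abbadie_kam1872", accept))
```

Like the rules themselves, the sketch ignores q-values in the Accept header: the first rule whose media type appears anywhere in the header wins.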