Skip to content

CENMetaLex/metalex-annotator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MetaLex Annotator
Copyright (c) 2011 by Rinke Hoekstra, Universiteit van Amsterdam
====

@author: Rinke Hoekstra
@contact: [email protected]
@organization: Universiteit van Amsterdam
@version: 0.1
@status: beta
@website: http://doc.metalex.eu
@copyright: 2011, Rinke Hoekstra, Universiteit van Amsterdam

@license: MetaLex Annotator is free software, you can redistribute it and/or modify
it under the terms of GNU Affero General Public License
as published by the Free Software Foundation, either version 3
of the License, or (at your option) any later version.

You should have received a copy of the the GNU Affero
General Public License, along with MetaLex Converter. If not, see


Additional permission under the GNU Affero GPL version 3 section 7:

If you modify this Program, or any covered work, by linking or
combining it with other code, such other code is not for that reason
alone subject to any of the requirements of the GNU Affero GPL
version 3.

====
This is a set of Python scripts for:
1. Extracting concepts from Dutch legislative texts in CEN MetaLex
2. Representing these concepts as RDF SKOS concept instances.
3. Annotating these texts with these concepts
4. Linking these concepts to Cornetto Wordnet
5. Identifying definitions in Dutch legislative texts in CEN MetaLex (alpha version)
6. Representing these definitions as RDF (not yet implemented)
7. Uploading the RDF representations to a triple store


====
# How to get things going (Windows users, look also the notes below):

* Install Python 2.7 for your OS 

* Install Python setuptools (for easy_install)

* Install rdflib from http://www.rdflib.net
  - (run 'easy_install -U "rdflib>=3.0.0"')

* Install nltk
  - (run 'easy_install nltk')

* Install regex
  - (run 'easy_install regex')

* Install SPARQLWrapper
  - (run 'easy_install sparqlwrapper')
  
* Intialize a triple store, and set the proper URL for its SPARQL endpoint in 'cornetto.linker.py' (currently works with ClioPatria)

* Upload Cornetto Wordnet RDF triples to the triple store (see http://semanticweb.cs.vu.nl/europeana/home)
  
* Initialize a full set and a training and test set
  - Run 'python generate_evaluation_sets.py [path-to-metalex-files]'
  
* Windows users should also modify the lines configuring the locale in annotator.py. Refer to the comments inside the file.

* Run 'python parse.py --full'

====
# Further steps, could be necessary for Windows+MinGW

* Install the Python 32bit version for Windows, because of better compatibility with MinGW compiling regex

* Install numby 
  - (run 'easy_install numby')

* You could have an error installing regex about something like "-mno-cygwin not valid argument"
  - modify $PYTHON_PATH$/Lib/disutils/cygwincompiler.py and remove -mno-cygwin from the commands placed in the class Mingw32CCompiler

* Set the locale settings to Windows standards
  - modify parse.py following the comments placed in def write_concept_scores

About

MetaLex concept extraction and definition recognition toolkit

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published