-
Notifications
You must be signed in to change notification settings - Fork 0
MetaLex concept extraction and definition recognition toolkit
License
CENMetaLex/metalex-annotator
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
MetaLex Annotator Copyright (c) 2011 by Rinke Hoekstra, Universiteit van Amsterdam ==== @author: Rinke Hoekstra @contact: [email protected] @organization: Universiteit van Amsterdam @version: 0.1 @status: beta @website: http://doc.metalex.eu @copyright: 2011, Rinke Hoekstra, Universiteit van Amsterdam @license: MetaLex Annotator is free software, you can redistribute it and/or modify it under the terms of GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. You should have received a copy of the the GNU Affero General Public License, along with MetaLex Converter. If not, see Additional permission under the GNU Affero GPL version 3 section 7: If you modify this Program, or any covered work, by linking or combining it with other code, such other code is not for that reason alone subject to any of the requirements of the GNU Affero GPL version 3. ==== This is a set of Python scripts for: 1. Extracting concepts from Dutch legislative texts in CEN MetaLex 2. Representing these concepts as RDF SKOS concept instances. 3. Annotating these texts with these concepts 4. Linking these concepts to Cornetto Wordnet 5. Identifying definitions in Dutch legislative texts in CEN MetaLex (alpha version) 6. Representing these definitions as RDF (not yet implemented) 7. Uploading the RDF representations to a triple store ==== # How to get things going (Windows users, look also the notes below): * Install Python 2.7 for your OS * Install Python setuptools (for easy_install) * Install rdflib from http://www.rdflib.net - (run 'easy_install -U "rdflib>=3.0.0"') * Install nltk - (run 'easy_install nltk') * Install regex - (run 'easy_install regex') * Install SPARQLWrapper - (run 'easy_install sparqlwrapper') * Intialize a triple store, and set the proper URL for its SPARQL endpoint in 'cornetto.linker.py' (currently works with ClioPatria) * Upload Cornetto Wordnet RDF triples to the triple store (see http://semanticweb.cs.vu.nl/europeana/home) * Initialize a full set and a training and test set - Run 'python generate_evaluation_sets.py [path-to-metalex-files]' * Windows users should also modify the lines configuring the locale in annotator.py. Refer to the comments inside the file. * Run 'python parse.py --full' ==== # Further steps, could be necessary for Windows+MinGW * Install the Python 32bit version for Windows, because of better compatibility with MinGW compiling regex * Install numby - (run 'easy_install numby') * You could have an error installing regex about something like "-mno-cygwin not valid argument" - modify $PYTHON_PATH$/Lib/disutils/cygwincompiler.py and remove -mno-cygwin from the commands placed in the class Mingw32CCompiler * Set the locale settings to Windows standards - modify parse.py following the comments placed in def write_concept_scores
About
MetaLex concept extraction and definition recognition toolkit
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published