The Fintan platform is an effort to combine existing converter frameworks with stream-based graph transformation and a workflow management engine in order to create integrated transformation pipelines for various input and output formats. It has been developed to address the challenge of transforming language resources and language data within Task 3.3 of the Prêt-à-LLOD project, a Research and Innovation Action of the H2020 programme (ERC, grant agreement 825182).
This is the repository used for active Fintan development. The Prêt-à-LLOD deliverable code, with all submodules included, is available from the Prêt-à-LLOD repository. Note that the large number of commits this repository appears to be behind is mostly due to the inclusion of submodules there.
Clone this repository, including submodules:
$> git clone https://github.com/acoli-repo/fintan --recurse-submodules
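If the repository was already cloned without `--recurse-submodules`, the submodules can usually be fetched afterwards with standard Git commands. The following is a minimal sketch; `fetch_submodules` is a hypothetical helper name used here for illustration and is not part of Fintan.

```shell
# fetch_submodules is a hypothetical helper, not part of Fintan.
# Usage: fetch_submodules path/to/fintan   (defaults to the current directory)
fetch_submodules() {
  repo=${1:-.}
  # --init registers the submodules listed in .gitmodules;
  # --recursive also descends into nested submodules.
  git -C "$repo" submodule update --init --recursive
}
```

Calling `fetch_submodules fintan` from the directory that contains the clone should leave the working tree in the same state as a clone made with `--recurse-submodules`.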
Build the Fintan backend:
$> cd fintan/backend/
$> (. build.sh)
$> cd ../..
Test the Fintan backend:
$> cd fintan/backend/samples/xslt/apertium/
$> . _apertium_demo.sh
Build the Fintan frontend:
$> cd fintan/ui/
$> npm install
$> cd ../..
Run the Fintan frontend:
$> (cd fintan/ui/; npm start &)
Once the frontend is running, open http://localhost:3009 in your browser.
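Because `npm start` is launched in the background, the page may not respond immediately. A minimal sketch for waiting until the address answers, assuming `curl` is installed; `wait_for_url` is a hypothetical helper and not part of Fintan:

```shell
# wait_for_url is a hypothetical helper, not part of Fintan.
# Usage: wait_for_url URL [ATTEMPTS]
# Returns 0 as soon as the URL answers successfully,
# or 1 after ATTEMPTS failed tries (default 30, one second apart).
wait_for_url() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # --fail makes curl return non-zero on HTTP errors as well.
    if curl --silent --fail --output /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_url http://localhost:3009 && echo "frontend is up"` prints the message only once the frontend responds.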
The frontend allows you to configure and export Fintan workflows. These can then be executed by the backend.
For more information please refer to the full Software Documentation.
- Frontend for designing Fintan pipelines
- Service for running Fintan pipelines inside integrated Docker containers
- Backend for executing Fintan pipelines on the command line
  - includes Core API for stream-based graph processing
  - wraps fully integrated converter components
- Documentation
- External Loader components for various formats, partly compatible but not yet fully integrated:
  - 9 LLODifier converters for syntax (TIGER/XML, Penn TreeBank format), morphology/glossing (UniMorph, FLEx, Toolbox, Xigt), philological editions (TEI/XML) and transcription formats (ELAN, Exmaralda)
  - 11 CoNLL-Merge converters for standard NLP tools (Stanford Core), syntax (Penn TreeBank format, PROIEL format), semantics (PropBank/NomBank, Semafor SRL), coreference (OntoNotes named entity annotations, OntoNotes coreference annotations), transcriptions (Exmaralda) and discourse semantics (RST Discourse Treebank, Penn Discourse Treebank, Penn Discourse Graphbank)
  - a generic converter for XML-based corpus formats (XML2CoNLL)
  - TBX2RDF, forked to work within Fintan but not yet stable