This repository has been archived by the owner on Sep 4, 2019. It is now read-only.

Lithuanian language processing tools for use in NLP, search, and other applications.

Sentence detection

Folder: sentence-detect

OpenNLP model for Lithuanian sentence detection.

Scripts to help with building the model:

  • add - append new text to the model (see the comment inside the script)
  • train - build the model from example corpora
  • evaluate - evaluate sentence detection quality
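
As a rough illustration of what "detection quality" can mean, sentence detectors are commonly scored by comparing predicted sentence-end positions with a hand-annotated reference and reporting precision, recall and F1. The sketch below is not the evaluate script from this repository; the function name `boundary_scores` and the offsets are hypothetical and chosen only for illustration.

```python
def boundary_scores(predicted, reference):
    """Return (precision, recall, F1) for sentence-end character offsets."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical offsets: the detector misplaced one boundary (33 instead of 40).
reference_ends = [10, 25, 40, 58]
predicted_ends = [10, 25, 33, 58]
precision, recall, f1 = boundary_scores(predicted_ends, reference_ends)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Scoring boundary offsets rather than whole sentences keeps a single misplaced boundary from being counted as two wrong sentences.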

Snowball

The Snowball version of the Porter stemmer for Lithuanian has moved to this page.

Language detection

Folder: language-detect

N-grams for Lithuanian language detection. Used in Apache Tika: https://issues.apache.org/jira/browse/TIKA-582
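
To show the general idea behind n-gram language detection, here is a minimal sketch that builds character trigram profiles and picks the language whose profile is most similar to the input. It is not Tika's implementation and does not read the files in language-detect; the helper names, the cosine similarity measure and the tiny sample texts are all assumptions made for illustration. Real profiles are built from much larger corpora.

```python
import math
from collections import Counter


def ngram_profile(text, n=3):
    """Relative frequencies of character n-grams, with "_" marking word boundaries."""
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()} if total else {}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram frequency profiles."""
    dot = sum(weight * b.get(gram, 0.0) for gram, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect(text, profiles):
    """Return the language code whose profile is closest to the text."""
    query = ngram_profile(text)
    return max(profiles, key=lambda lang: cosine_similarity(query, profiles[lang]))


# Toy training samples (hypothetical, far too small for real use).
samples = {
    "lt": "Lietuvių kalba yra viena iš baltų kalbų. Vilnius yra Lietuvos sostinė.",
    "en": "English is a West Germanic language. London is the capital of England.",
}
profiles = {lang: ngram_profile(text) for lang, text in samples.items()}

print(detect("Kaunas yra antras pagal dydį Lietuvos miestas", profiles))
```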

License

Copyright (C) 2011 UAB TokenMill

Distributed under the Eclipse Public License.