Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm-agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language parsing, part-of-speech tagging, keyword extraction and named entity recognition. Learn more by taking a quick tour or by reading the manual.
Features
- Text extractors for PDF, HTML, XML, Word, AbiWord, OpenOffice and image formats (Ocropus).
- Text chunkers, sentence segmenters, tokenizers, and parsers (Stanford & Enju).
- Lexical resources (WordNet interface, several POS taggers for English).
- Language, date/time, topic words (LDA) and keyword (TF*IDF) extraction.
- Word inflectors, including stemmers, conjugators, declensors, and number inflection.
- Serialization of annotated entities to YAML, XML or to MongoDB.
- Visualization in ASCII tree, directed graph (DOT) and tag-bracketed (standoff) formats.
- Linguistic resources, including language detection and tag alignments for several treebanks.
- Machine learning (decision tree, multilayer perceptron, LIBLINEAR, LIBSVM).
- Text retrieval with indexation and full-text search (Ferret).
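To make the keyword (TF*IDF) extraction feature more concrete, here is a minimal sketch of TF*IDF scoring in plain Ruby, using only the standard library. It does not use Treat's API and is not Treat's implementation; the method names `tokenize`, `tf_idf` and `keywords` are hypothetical, and the tokenizer is a deliberately naive regular expression that assumes English text.

```ruby
# Illustrative TF*IDF sketch (not Treat's implementation).
# All method names below are hypothetical.

# Split text into lowercase word tokens (naive, English-only assumption).
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# For each document, return a hash of { term => tf * idf }.
def tf_idf(documents)
  tokenized = documents.map { |doc| tokenize(doc) }

  # Document frequency: how many documents contain each term.
  doc_freq = Hash.new(0)
  tokenized.each { |tokens| tokens.uniq.each { |term| doc_freq[term] += 1 } }

  total_docs = documents.size.to_f
  tokenized.map do |tokens|
    # Raw term counts within this document.
    counts = Hash.new(0)
    tokens.each { |term| counts[term] += 1 }

    counts.map do |term, count|
      tf = count / tokens.size.to_f
      idf = Math.log(total_docs / doc_freq[term])
      [term, tf * idf]
    end.to_h
  end
end

# Return the k highest-scoring terms; ties are broken alphabetically.
def keywords(scores, k = 3)
  scores.sort_by { |term, score| [-score, term] }.first(k).map(&:first)
end

docs = [
  "the cat sat on the mat",
  "the dog sat on the log",
  "the bird flew over the hill"
]
scores = tf_idf(docs)
puts keywords(scores[0], 2).inspect
```

Terms that occur in every document (such as "the" in this toy corpus) get an inverse document frequency of zero, so they never rank as keywords; terms unique to one document score highest.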
Contributing
I am actively seeking developers who can help maintain and expand this project. You can find a list of ideas for contributing to the project here.
Authors
Lead developer: @louismullie [Twitter]
Contributors:
- @bdigital
- @automatedtendencies
- @LeFnord
- @darkphantum
- @whistlerbrk
License
This software is released under the GPL and includes software released under the GPL, Ruby, Apache 2.0 and MIT licenses.