Oleander Stemming Library

About

C++ library for stemming words down to their roots.

Stemming is useful for Natural Language Processing and Information Retrieval systems. The first step in an NLP system is to strip words down to their roots. Afterwards, these roots can be combined, tabulated, categorized, etc.

For example, a stemmer can trim words such as connection, connections, connective, connected, and connecting down to the word connect. From there, the frequency counts of these words can be tabulated to determine how many times words related to connect exist in the corpus.

Features

Based on the Porter/Snowball stemming family of algorithms
Case insensitive
Header-only library
Includes Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, and Swedish

Example

#include "danish_stem.h"
#include "dutch_stem.h"
#include "english_stem.h"
#include "finnish_stem.h"
#include "french_stem.h"
#include "german_stem.h"
#include "italian_stem.h"
#include "norwegian_stem.h"
#include "portuguese_stem.h"
#include "russian_stem.h"
#include "spanish_stem.h"
#include "swedish_stem.h"

#include <iostream>
#include <string>

int main()
    {
    // the word to be stemmed
    std::wstring word(L"documentation");

    /* Create an instance of a "english_stem" class. The template argument for the
       stemmers are the type of std::basic_string that you are trying to stem,
       by default std::wstring (Unicode strings).
       As long as the char type of your basic_string is wchar_t, then you can use
       any type of basic_string.
       This is to say, if your basic_string has a custom char_traits or allocator,
       then just specify it in your template argument to the stemmer. For example:

       using myString = std::basic_string<wchar_t, myTraits, myAllocator>;
       myString word(L"documentation");

       stemming::english_stem<myString> StemEnglish;
       StemEnglish(word);
    */

    stemming::english_stem<> StemEnglish;
    std::wcout << L"(English) Original text:\t" << word.c_str() << std::endl;

    // The "english_stem" has its operator() overloaded, so you can
    // treat your class instance like it's a function.  In this case,
    // pass in the std::wstring to be stemmed. Note that this alters
    // the original std::wstring, so when the call is done the string will
    // be stemmed.
    StemEnglish(word);
    // now the variable "word" should equal "document"
    std::wcout << L"(English) Stemmed text:\t" << word.c_str() << std::endl;

    // try a similar word that should have the same stem
    word = L"documenting";
    std::wcout << L"(English) Original text:\t" << word.c_str() << std::endl;
    StemEnglish(word);
    // now the variable "word" should equal "document"
    std::wcout << L"(English) Stemmed text:\t" << word.c_str() << std::endl;

    // now try a French word
    stemming::french_stem<> StemFrench;
    word = L"continuellement";
    std::wcout << L"\n(French) Original text:\t" << word.c_str() << std::endl;
    StemFrench(word);
    // now the variable "word" should equal "continuel"
    std::wcout << L"(French) Stemmed text:\t" << word.c_str() << std::endl;

    // many other stemmers are also available
    stemming::danish_stem<> StemDanish;
    stemming::dutch_stem<> StemDutch;
    stemming::finnish_stem<> StemFinnish;
    stemming::italian_stem<> StemItalian;
    stemming::german_stem<> StemGerman;
    stemming::norwegian_stem<> StemNorwegian;
    stemming::portuguese_stem<> StemPortuguese;
    stemming::russian_stem<> StemRussian;
    stemming::spanish_stem<> StemSpanish;
    stemming::swedish_stem<> StemSwedish;

    return 0;
    }

Name		Name	Last commit message	Last commit date
Latest commit History 124 Commits
.github/workflows		.github/workflows
docs		docs
src		src
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitmodules		.gitmodules
Changelog.md		Changelog.md
readme.md		readme.md
stemming.png		stemming.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Oleander Stemming Library

About

Features

Example

About

Contributors 2

Languages

Blake-Madden/OleanderStemmingLibrary

Folders and files

Latest commit

History

Repository files navigation

Oleander Stemming Library

About

Features

Example

About

Topics

Resources

Stars

Watchers

Forks

Contributors 2

Languages