Skip to content

Latest commit

 

History

History
56 lines (44 loc) · 1.56 KB

stop-words.md

File metadata and controls

56 lines (44 loc) · 1.56 KB

Stop-words

The words which are generally filtered out before processing a natural language are called stop words. These are actually the most common words in any language (like articles, prepositions, pronouns, conjunctions, etc) and does not add much information to the text. Examples of a few stop words in English are “the”, “a”, “an”, “so”, “what”. (source)

Lyra automatically removes common stop-words for you, depending on the defaultLanguage parameter used during new instance creation.

As for now, Lyra supports 12 languages when it comes to stop-words removal:

  • English (default)
  • Italian
  • French
  • Spanish
  • Portugaise
  • Dutch
  • Swedish
  • Russian
  • Norwegian
  • German
  • Danish
  • Finnish

Disabling stop-words removal

You can disable stop-words removal by setting enableStopWords: false when creating a new Lyra instance:

import { create } from "@lyrasearch/lyra";

const db = create({
  schema: {
    author: "string",
    quote: "string",
  },
  tokenizer: {
    enableStopWords: false,
  },
});

Customizing stop-words

You can interact with the default Lyra stop-words by using the built-in customStopWords property when creating a new Lyra instance:

import { create } from "@lyrasearch/lyra";

const db = create({
  schema: {
    author: "string",
    quote: "string",
  },
  tokenizer: {
    customStopWords: (defaultStopWords) =>[...defaultStopWords, "foo", "bar"],
  },
});