Commit
Doc review completed
Signed-off-by: Melissa Vagi <[email protected]>
vagimeli committed Oct 8, 2024
1 parent 55bcf61 commit 176996e
Showing 1 changed file with 5 additions and 9 deletions.
14 changes: 5 additions & 9 deletions _analyzers/character-filters/index.md
@@ -8,16 +8,12 @@ has_toc: false

# Character filters

-Character filters process the text before tokenization, modifying or cleaning the input to prepare it for further analysis.
+Character filters process the text before tokenization, modifying, or cleaning the input to prepare it for further analysis.

-Unlike token filters, which operate on tokens (words or terms), character filters work on the raw input text before tokenization. They are especially useful for cleaning or transforming structured text with unwanted characters, like HTML tags or special symbols. Character filters help strip or replace these elements, ensuring the text is properly formatted for analysis.
+Unlike token filters, which operate on tokens (words or terms), character filters work on the raw input text before tokenization. They are especially useful for cleaning or transforming structured text with unwanted characters, such as HTML tags or special symbols. Character filters help strip or replace these elements so that text is properly formatted for analysis.

Use cases for character filters include:
-## HTML stripping
-Removing HTML tags from content, ensuring only the visible text is indexed. See [HTML stripping]({{site.url}}{{site.baseurl}}/analyzers/html-character-filter) for more information.
-
-## Pattern replacement
-Replacing or removing unwanted characters or patterns in text (e.g., converting hyphens to spaces
-## Custom mappings
-Substituting specific characters or sequences with other values, such as converting currency symbols into their textual equivalents.

+- **HTML stripping:** Removes HTML tags from content so that only the plain text is indexed. See [HTML stripping]({{site.url}}{{site.baseurl}}/analyzers/html-character-filter) for more information.

Check failure on line 17 in _analyzers/character-filters/index.md
GitHub Actions / style-job
[vale] reported by reviewdog 🐶 [OpenSearch.LinksEndSlash] Add a trailing slash to the link '({{site.url}}{{site.baseurl}}/analyzers/html-character-filter)'. (line 17, column 114; severity: ERROR)

+- **Pattern replacement:** Replaces or removes unwanted characters or patterns in text, for example, converting hyphens to spaces.
+- **Custom mappings:** Substitutes specific characters or sequences with other values, for example, converting currency symbols into their textual equivalents.
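
As a rough illustration of the three use cases listed in the changed text, the following Python sketch applies each kind of character filter to raw input before a simple whitespace tokenizer runs. It is a conceptual stand-in built on the standard library only, not OpenSearch's implementation: the function names are hypothetical, the tag-stripping regex is deliberately naive, and the currency mapping (`$` to `USD`) is an assumed example.

```python
import re
from html import unescape


def strip_html(text):
    """Naively drop anything that looks like a tag, then decode entities such as &amp;."""
    return unescape(re.sub(r"<[^>]+>", "", text))


def replace_pattern(text, pattern, replacement):
    """Replace every match of a regular expression, e.g. hyphens with spaces."""
    return re.sub(pattern, replacement, text)


def apply_mappings(text, mappings):
    """Substitute each literal key with its mapped value, e.g. currency symbols."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def char_filter(text):
    """Run the three character filters in order on the raw text."""
    text = strip_html(text)
    text = replace_pattern(text, r"-", " ")
    text = apply_mappings(text, {"$": "USD "})  # assumed mapping, for illustration only
    return text


raw = "<p>A state-of-the-art deal for <b>$</b>20 &amp; up</p>"
filtered = char_filter(raw)

# Tokenization happens only after the character filters have run.
tokens = filtered.split()
print(filtered)
print(tokens)
```

The point of the ordering is that the tokenizer never sees the markup, hyphens, or currency symbol; it only receives the cleaned string.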
