[Feature] Replace text normalization with local model #163

AmitMY · 2024-06-30T09:15:50Z

Problem

The spoken-to-signed translation pipeline begins with several preprocessing tasks, an important one being text normalization.

Unlike the other tasks, which run completely offline, text normalization relies on an online-only endpoint. This degrades performance when the user is offline and adds a small delay when online.
For privacy reasons, we would also ideally move this step to a local model.
Furthermore, the endpoint costs us money to run, since it calls GPT-3 to normalize the text automatically.

Description

It seems that every large company is pushing small local LLMs, which have limited world knowledge but superb text-processing abilities.
For example, Google is pushing Gemini Nano in Chrome (experimental API): https://x.com/rauchg/status/1806385778064564622
https://developer.chrome.com/docs/ai/built-in

If this API ever reaches production, we should prompt it instead of prompting ChatGPT.
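
One way to make that switch gradual is to hide the model behind a single function that tries a local model first and falls back to the online endpoint. The sketch below is a minimal illustration in TypeScript, not the project's implementation: `Normalizer`, `NormalizerOptions`, and `normalizeText` are hypothetical names, and the actual calls to a local model or to the existing endpoint are assumed to be wrapped by the caller.

```typescript
// Hypothetical sketch: none of these names come from the project or from
// Chrome's built-in AI API. A Normalizer is any async function that takes
// raw text and returns its normalized form.
type Normalizer = (text: string) => Promise<string>;

interface NormalizerOptions {
  local?: Normalizer; // assumed wrapper around an on-device model, if available
  remote?: Normalizer; // assumed wrapper around the existing online endpoint
}

// Try the local normalizer first, then the remote one.
// If neither is available or both fail, return the input unchanged so the
// rest of the pipeline can still run.
async function normalizeText(
  text: string,
  options: NormalizerOptions
): Promise<string> {
  const trimmed = text.trim();
  if (trimmed === "") {
    return text; // nothing to normalize
  }

  for (const normalizer of [options.local, options.remote]) {
    if (!normalizer) {
      continue; // this backend is not configured
    }
    try {
      const result = (await normalizer(trimmed)).trim();
      if (result !== "") {
        return result;
      }
      // An empty answer is treated as a failure; try the next backend.
    } catch {
      // Backend unavailable (e.g. offline); try the next backend.
    }
  }

  return text;
}
```

With this shape, the online endpoint can remain as the `remote` fallback while a local model is experimental, and can be dropped later by simply not passing it.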

Alternatives

Train our own normalization model, either on existing text normalization data or on data collected using ChatGPT.
However, training our own model would take resources away from our main objective and would require users to host another model on their device, which is undesirable.

AmitMY added the enhancement (New feature or request) and spoken-to-signed labels on Jun 30, 2024
