[Feature] Replace text normalization with local model #163

Open
AmitMY opened this issue Jun 30, 2024 · 0 comments
Labels
enhancement (New feature or request), spoken-to-signed

AmitMY commented Jun 30, 2024

Problem

At the beginning of the spoken-to-signed translation pipeline, we perform multiple tasks, an important one of which is text normalization.

image

Unlike the other tasks, which run completely offline, text normalization relies on an online-only solution. This degrades results when the user is offline and adds a small delay when online.
For privacy reasons, we would ideally also move this endpoint to a local model.
Furthermore, running this API endpoint costs us money, since it calls GPT-3 to normalize the text automatically.

Description

It seems like every large company is pushing small local LLMs, which have limited world knowledge but superb text processing abilities.
For example, Google is bringing Gemini Nano to Chrome through an experimental API:
https://x.com/rauchg/status/1806385778064564622
https://developer.chrome.com/docs/ai/built-in

If this API ever reaches production, we should prompt it instead of ChatGPT.
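
A minimal sketch of what that switch could look like, assuming both backends can be wrapped behind one small interface. Everything here is hypothetical: `TextModel`, `buildNormalizationPrompt`, and `normalizeText` are illustrative names, the prompt wording is a placeholder, and no real browser or OpenAI API is called. The fallback order (local model, then remote endpoint, then the unmodified input) is also an assumption, not a decided design.

```typescript
// Hypothetical interface: any backend that maps a prompt string to a completion.
// A wrapper around a browser-provided model or the existing remote endpoint
// would implement it; neither wrapper is shown here.
interface TextModel {
  prompt(input: string): Promise<string>;
}

// Build the instruction sent to whichever model is used (wording is illustrative).
function buildNormalizationPrompt(text: string): string {
  return (
    "Normalize the following text. Fix casing, punctuation, and spelling. " +
    "Return only the normalized text.\n\nText: " + text
  );
}

// Try the local model first, then the remote one; if neither works,
// pass the input through unchanged so the pipeline can continue.
async function normalizeText(
  text: string,
  local: TextModel | null,
  remote: TextModel | null,
): Promise<string> {
  const prompt = buildNormalizationPrompt(text);
  for (const model of [local, remote]) {
    if (!model) continue;
    try {
      const result = (await model.prompt(prompt)).trim();
      if (result.length > 0) return result;
    } catch {
      // Backend unavailable (for example, offline); try the next one.
    }
  }
  return text;
}
```

With this shape, the rest of the pipeline only calls `normalizeText` and does not need to know which backend answered, so the remote endpoint could be dropped later without touching the callers.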

Alternatives

Train our own normalization model on existing text normalization data, or on data collected using ChatGPT.
However, training our own model would take resources away from our main objective, and would require users to host another model on their device, which is undesirable.

AmitMY added the enhancement and spoken-to-signed labels on Jun 30, 2024