Conversation

@jerinphilip (Owner) commented on Jan 22, 2024

Work in progress

The eventual goal is to implement beam search for transliteration, so that the decoder generates multiple candidates (unlike the greedy decoding used after forced overfitting in the translation case).

seq2seq is overkill for transliteration; the overkill is mostly for an expert user of a deterministic (rather than statistical) IME. The hope behind this effort is that neural networks, being powerful enough function approximators, can make life easier for a non-expert user. The following benefits come to mind:

  1. No need to switch between case alterations and symbols (~); simply type in all lowercase letters and get the most likely associated outputs.
  2. Use beam search to generate the most likely target candidates for a given source variation (see the sketch after this list).
  3. Robustness to typing errors and noise. Some masked-character training should allow the network to guess the most suitable character/subword from context.
  4. Long-context selection. GBoard (WFSTs?) fails on really long agglutinated sequences, while seq2seq with transformers appears to do better on a cursory try-out (this claim will have to be validated).
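
For concreteness, here is a minimal beam-search sketch in Python. It is only a toy stand-in for the real seq2seq decoder: `fake_step`, its toy distribution, and all names here are illustrative and not part of this PR.

```python
# Minimal beam-search sketch (hypothetical). In practice, fake_step would be
# replaced by the seq2seq model's next-character log-probabilities.
import heapq
import math

EOS = "</s>"

def fake_step(source, prefix):
    """Stand-in for the model: map (source, decoded prefix) to {next_char: log_prob}."""
    if len(prefix) >= len(source):
        return {EOS: math.log(0.9), source[-1]: math.log(0.1)}
    ch = source[len(prefix)]
    return {ch: math.log(0.7), ch.upper(): math.log(0.2), EOS: math.log(0.1)}

def beam_search(source, beam_size=5, max_len=32):
    """Return up to beam_size (candidate, cumulative log-prob) pairs, best first."""
    beams = [(0.0, "")]          # (cumulative log-prob, decoded prefix)
    finished = []
    for _ in range(max_len):
        expansions = []
        for score, prefix in beams:
            for ch, logp in fake_step(source, prefix).items():
                if ch == EOS:
                    finished.append((score + logp, prefix))
                else:
                    expansions.append((score + logp, prefix + ch))
        if not expansions:
            break
        # Keep only the beam_size best partial hypotheses.
        beams = heapq.nlargest(beam_size, expansions)
    # Include unfinished hypotheses as a fallback.
    finished.extend(beams)
    return [(prefix, score) for score, prefix in heapq.nlargest(beam_size, finished)]

if __name__ == "__main__":
    for candidate, score in beam_search("naal"):
        print(f"{candidate}\t{score:.4f}")
```

The point of the sketch is the n-best behaviour: instead of committing to the single argmax continuation at each step, the decoder keeps several hypotheses alive and returns all of them, which is what the transliteration UI needs.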

The model trained for a first exploration already provides reasonable variation among the candidates:

naal
0 ||| നാൽ ||| F0= -0.247635 ||| -0.247635
0 ||| നാൾ ||| F0= -2.30577 ||| -2.30577
0 ||| നാല് ||| F0= -2.57854 ||| -2.57854
0 ||| നാള് ||| F0= -4.42439 ||| -4.42439
0 ||| നാല ||| F0= -4.5098 ||| -4.5098
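
The output above appears to be a standard n-best list in the `id ||| hypothesis ||| feature scores ||| total score` layout. Assuming that layout, a small parsing sketch follows; the `Candidate` type and field names are illustrative, not part of this PR.

```python
# Sketch for consuming an n-best list like the one above (layout assumed).
from typing import List, NamedTuple

class Candidate(NamedTuple):
    sentence_id: int
    text: str
    features: str
    total: float  # total log-probability; higher is better

def parse_nbest(lines: List[str]) -> List[Candidate]:
    candidates = []
    for line in lines:
        sid, text, features, total = [f.strip() for f in line.split("|||")]
        candidates.append(Candidate(int(sid), text, features, float(total)))
    # Sort best (highest log-probability) first.
    return sorted(candidates, key=lambda c: c.total, reverse=True)

if __name__ == "__main__":
    sample = [
        "0 ||| നാൽ ||| F0= -0.247635 ||| -0.247635",
        "0 ||| നാൾ ||| F0= -2.30577 ||| -2.30577",
    ]
    for c in parse_nbest(sample):
        print(c.text, c.total)
```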

@jerinphilip changed the title from "Add transliteration with beam-search (multiple candidates)" to "Add transliteration with beam-search" on Jan 22, 2024