How would one use recently developed transformer language models (their token probabilities), small enough to run on mobile devices, to obtain intelligent, context-informed typo correction? The ML-assisted typing interface on my phone (I suspect it uses some kind of n-gram model) displays suggestions for words it thinks you are going to type, and the suggestions improve as you type the word, but it doesn't autocorrect unless the word is misspelled. Even when the user inputs a word that is clearly out of context, the "autocorrector" doesn't intervene.
My proposal is to use foundational transformer language models (GPT-2 for last-word predictions, BERT for masked, bidirectional follow-up predictions) to identify when the probability of a word is low and there is a similar word that maximizes a combined metric of likelihood and similarity, defined by multiplying the normalized token prediction probabilities by the word-similarity score.
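A minimal sketch of that combined metric, using only the BERT masked-word path; the model choice (bert-base-uncased), the `candidate_scores` helper, and the top-k cutoff are illustrative assumptions rather than the project's actual implementation, and GPT-2's left-to-right probabilities would be combined analogously.

```python
# Sketch: score replacement candidates by context probability * string similarity.
from difflib import SequenceMatcher

import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def candidate_scores(text: str, typo: str, top_k: int = 50):
    """Score in-vocabulary replacements for `typo` by probability * similarity."""
    masked = text.replace(typo, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]

    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_index]
    probs = torch.softmax(logits, dim=-1).squeeze(0)

    top = torch.topk(probs, top_k)
    scored = []
    for prob, token_id in zip(top.values, top.indices):
        word = tokenizer.decode([int(token_id)]).strip()
        similarity = SequenceMatcher(None, typo, word).ratio()
        scored.append((word, float(prob) * similarity))  # combined metric
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(candidate_scores("I drank a cup of coffe this morning.", "coffe")[:5])
```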
Using a small, custom-written dataset (currently n=356), and with slightly modified metrics that account for "false true positives" (cases where the model made a change where one was needed, but produced the wrong output), current testing shows results that vary noticeably between batches, since metrics cannot be estimated precisely at this scale.
text[1-5].ipynb : development/tuning
text6.ipynb : model case visualization utility
text7.ipynb : correction engine, test metrics, automated parameter learning
strings.txt : test dataset, n=356
text8.py : example real-time correction utility
text9.py : correction utility with feedback popup on correction, please send me your feedback.json!
The stock SequenceMatcher similarity is augmented because its sensitivity to letter changes increases drastically when words are short, since there are fewer subsequences to match. To remedy this, the SequenceMatcher similarity is adjusted by an augmentation table (see Augmentation table refinement) that maps out the approximate bias across the affected word-length groups.
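An illustrative sketch of this adjustment, assuming the table is keyed by the shorter word's length and its value is added to the raw SequenceMatcher ratio; the exact mapping used in the project (linear vs. logarithmic, and the sim_bound handling) may differ.

```python
from difflib import SequenceMatcher

# Length-keyed bias correction (values from the optimized table below).
AUGMENTATION_TABLE = {1: 0.5, 2: 0.29, 3: 0.14, 4: 0.05}

def augmented_similarity(a: str, b: str) -> float:
    """SequenceMatcher ratio with a length-dependent boost for short words."""
    ratio = SequenceMatcher(None, a, b).ratio()
    boost = AUGMENTATION_TABLE.get(min(len(a), len(b)), 0.0)
    return min(1.0, ratio + boost)

print(augmented_similarity("cat", "cap"))            # short words: boosted
print(augmented_similarity("keyboard", "keybaord"))  # long words: unchanged
```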
Optimization history
(figure 1)
Hyperparameter importances
(figure 2)
Hyperparameter value search (Parallel Coordinate Plot)
(figure 3)
Augmentation table search (Parallel Coordinate Plot)
(figure 4)
Automated fine-tuning produced an optimal augmentation table (as seen in figure 5):
- Optimized: {1: 0.5, 2: 0.29, 3: 0.14, 4: 0.05}
- Originally proposed: {1: 0.5, 2: 0.3, 3: 0.2, 4: 0.1}
Linear and logarithmic mappings with sim_bound=0.5
(figure 5)
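A hedged sketch of how the automated augmentation-table search could look; the plot titles above match Optuna's visualizations, but Optuna itself and the toy objective below (a few hard-coded candidate sets standing in for strings.txt) are assumptions, not the project's actual setup.

```python
from difflib import SequenceMatcher

import optuna

# Toy examples: (typo, {candidate: context probability}, expected correction)
EXAMPLES = [
    ("teh", {"the": 0.40, "a": 0.35, "this": 0.10}, "the"),
    ("sh", {"she": 0.20, "so": 0.30, "shy": 0.05}, "she"),
    ("wether", {"weather": 0.30, "whether": 0.25, "water": 0.20}, "weather"),
]

def objective(trial: optuna.Trial) -> float:
    # Search one augmentation value per short-word-length group (1-4).
    table = {n: trial.suggest_float(f"len_{n}", 0.0, 0.6) for n in range(1, 5)}

    def similarity(a: str, b: str) -> float:
        ratio = SequenceMatcher(None, a, b).ratio()
        return min(1.0, ratio + table.get(min(len(a), len(b)), 0.0))

    # Fraction of examples where probability * similarity picks the right word.
    hits = sum(
        max(cands, key=lambda w: cands[w] * similarity(typo, w)) == truth
        for typo, cands, truth in EXAMPLES
    )
    return hits / len(EXAMPLES)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```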
Examples from text6.ipynb; word token probabilities shaded by similarity to target
(figure 6)
- Allow recalculation and correction of other words based on the corrected context
- Fine-tune GPT-2 and BERT once the task dataset is large enough
- Use lemmas in reverse (consider all expanded forms of a prediction)
- A custom similarity metric or further SequenceMatcher optimization
- Many more mechanisms and optimizations to make this correction system ready for real-world use