A simple spelling corrector in Python.
Based on the work of Peter Norvig.
Steps:
- Run generate_word_count.py to create the WORDS pickle file.
- Run unit_tests.py to make sure all the tests pass.
- Run spell_test.py on the given test files to measure the accuracy of the corrector.
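The first step can be sketched as follows. This is a minimal sketch that assumes the WORDS pickle holds a collections.Counter mapping each lowercase word in datasets/big.txt to its frequency. It uses the tokenization from Norvig's essay (runs of word characters after lowercasing). The function names and the output filename WORDS.pickle are hypothetical, and the real generate_word_count.py may differ.

```python
import pickle
import re
from collections import Counter
from pathlib import Path


def words(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_word_counts(corpus_text):
    """Count how often each word occurs in the corpus."""
    return Counter(words(corpus_text))


def save_word_counts(counts, path):
    """Serialize the word counts to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(counts, f)


def load_word_counts(path):
    """Read the word counts back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical defaults: the corpus path follows the datasets directory
# described below; the pickle filename is an assumption.
def generate(corpus_path="datasets/big.txt", pickle_path="WORDS.pickle"):
    corpus = Path(corpus_path).read_text(encoding="utf-8")
    counts = build_word_counts(corpus)
    save_word_counts(counts, pickle_path)
    return counts
```

Calling generate() from the repository root would write the pickle once. The corrector can then load the counts quickly instead of re-reading the whole corpus on every run.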
Note that the datasets directory contains the required data in .txt format:
- big.txt is a concatenation of public domain book excerpts from Project Gutenberg and lists of the most frequent words from Wiktionary and the British National Corpus.
- spell-testset1.txt and spell-testset2.txt are test files containing both correctly spelled words and their incorrect variations, extracted from the Birkbeck spelling error corpus.
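The accuracy reported by spell_test.py can be sketched as follows. This assumes the test files use the layout from Norvig's essay, where each line reads "right: wrong1 wrong2 ...". Each misspelling counts as one trial, and a trial succeeds when the corrector returns the right word. The names parse_testset and accuracy are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def parse_testset(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn assumed 'right: wrong1 wrong2' lines into (right, wrong) pairs."""
    pairs = []
    for line in lines:
        if ":" not in line:
            continue  # skip blank or malformed lines
        right, wrongs = line.split(":", 1)
        for wrong in wrongs.split():
            pairs.append((right.strip(), wrong))
    return pairs


def accuracy(pairs: List[Tuple[str, str]], correct: Callable[[str], str]) -> float:
    """Fraction of misspellings that `correct` maps back to the right word."""
    if not pairs:
        return 0.0
    good = sum(1 for right, wrong in pairs if correct(wrong) == right)
    return good / len(pairs)
```

With a real corrector, the call would look like accuracy(parse_testset(open("datasets/spell-testset1.txt")), correction).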
More info on the algorithm.
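For reference, here is a compact sketch of the approach from Peter Norvig's essay "How to Write a Spelling Corrector". Candidates are the word itself if it is known, otherwise known words one edit away, otherwise known words two edits away. An edit is a deletion, transposition, replacement, or insertion of a lowercase letter. The most frequent candidate wins. The SpellCorrector class is a hypothetical wrapper and not necessarily how this repository structures its code.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


class SpellCorrector:
    """Hypothetical wrapper around Norvig-style correction over word counts."""

    def __init__(self, counts):
        self.words = Counter(counts)
        self.total = sum(self.words.values())

    def probability(self, word):
        """Relative frequency of `word` in the corpus (0 if unseen)."""
        return self.words[word] / self.total if self.total else 0.0

    def known(self, words):
        """Subset of `words` that appear in the corpus."""
        return {w for w in words if w in self.words}

    @staticmethod
    def edits1(word):
        """All strings one delete, transpose, replace, or insert away."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [L + R[1:] for L, R in splits if R]
        transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
        replaces = [L + c + R[1:] for L, R in splits if R for c in LETTERS]
        inserts = [L + c + R for L, R in splits for c in LETTERS]
        return set(deletes + transposes + replaces + inserts)

    def edits2(self, word):
        """All strings two edits away."""
        return {e2 for e1 in self.edits1(word) for e2 in self.edits1(e1)}

    def candidates(self, word):
        """Prefer the word itself, then distance-1, then distance-2 matches."""
        return (self.known([word])
                or self.known(self.edits1(word))
                or self.known(self.edits2(word))
                or {word})

    def correction(self, word):
        """Most probable spelling correction for `word`."""
        return max(self.candidates(word), key=self.probability)
```

Because the candidate sets are tried in order, any known word at edit distance 1 beats every word at distance 2, however frequent. The word counts only break ties within the same distance.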