This repo provides code to apply the Needleman-Wunsch algorithm to match ordered sequences of strings (Japanese company names were the original use case, hence the repo's name) using Levenshtein distance as a scoring metric.
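The repo's own implementation is not reproduced here. The following is a minimal, self-contained sketch of the idea in plain Python, under some assumptions. Levenshtein distance is normalized into a similarity in [0, 1]. Leaving a name unmatched costs a fixed gap penalty of -0.5, which is a hypothetical value. The names `levenshtein`, `similarity`, and `align` are illustrative and are not the repo's API.

```python
from typing import List, Optional, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, lower as distance grows."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def align(xs: Sequence[str], ys: Sequence[str],
          gap: float = -0.5) -> List[Tuple[Optional[str], Optional[str]]]:
    """Needleman-Wunsch global alignment of two ordered name lists.
    Unmatched names are paired with None. `gap` is a hypothetical penalty."""
    n, m = len(xs), len(ys)
    # score[i][j]: best score aligning xs[:i] with ys[:j]; move[i][j]: how we got there
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + similarity(xs[i - 1], ys[j - 1]), "diag"),
                (score[i - 1][j] + gap, "up"),     # xs[i-1] left unmatched
                (score[i][j - 1] + gap, "left"),   # ys[j-1] left unmatched
            ]
            # max() returns the first maximum, so ties prefer a match
            score[i][j], move[i][j] = max(options, key=lambda t: t[0])
    # Trace back from the bottom-right corner to recover the aligned pairs
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((xs[i - 1], ys[j - 1]))
            i, j = i - 1, j - 1
        elif step == "up":
            pairs.append((xs[i - 1], None))
            i -= 1
        else:
            pairs.append((None, ys[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    for pair in align(["Mitsubishi Corp", "Sony", "Toyota Motor"],
                      ["Mitsubishi Corp.", "Toyota Motors"]):
        print(pair)
```

In this example, "Sony" has no close counterpart in the second list. Pairing it with a dissimilar name scores worse than paying the gap penalty, so it is aligned with `None`. The two near-identical names are paired in order. A larger gap penalty (closer to zero) makes the sketch more willing to leave names unmatched.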
To run the algorithm from bash, use:

```bash
python3 align-company-names.py data/index-example.csv \
    data/book-example.csv \
    data/output-example.csv \
    --index_col text --book_col "company_name" --nrows 10
```
A static example notebook is available here. Alternatively, use the Binder link above.
Much of the code found here was generously made available by The Wilke Lab and modified during the author's time working for Melissa Dell.