
Applies the Needleman-Wunsch algorithm to sequences of strings, using Levenshtein distance as the scoring metric.


dell-research-harvard/NeedlemanWunschNames


Needleman Wunsch Names

Example Jupyter Notebook

This repo provides code that applies the Needleman-Wunsch algorithm to match ordered sequences of strings, using Levenshtein distance as the scoring metric. Japanese company names were the original use case, hence the repo's name.
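The sketch below illustrates the general technique: a Levenshtein edit distance is turned into a similarity score, and that score is fed into a standard Needleman-Wunsch dynamic program over two lists of strings. It is an illustration under stated assumptions, not the code in `align-company-names.py`. The score mapping `1 - 2 * distance / max_len` (identical strings score +1, strings with no characters in common score -1), the linear gap penalty of `-0.5`, and the function names `levenshtein`, `similarity` and `needleman_wunsch` are all hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed scoring: +1 for identical strings, -1 when no character can be kept."""
    longest = max(len(a), len(b)) or 1  # avoid dividing by zero for two empty strings
    return 1.0 - 2.0 * levenshtein(a, b) / longest


def needleman_wunsch(
    xs: List[str], ys: List[str], gap: float = -0.5
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Global alignment of two string sequences; None marks a gap."""
    n, m = len(xs), len(ys)
    # score[i][j] = best total score for aligning xs[:i] with ys[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]  # traceback moves
    for i in range(1, n + 1):
        score[i][0], back[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + similarity(xs[i - 1], ys[j - 1]), "diag"),
                (score[i - 1][j] + gap, "up"),    # xs[i-1] left unmatched
                (score[i][j - 1] + gap, "left"),  # ys[j-1] left unmatched
            ]
            # max() keeps the first of tied options, so ties favour a match
            score[i][j], back[i][j] = max(options, key=lambda t: t[0])
    # Walk back from the bottom-right corner to recover the alignment.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((xs[i - 1], ys[j - 1]))
            i, j = i - 1, j - 1
        elif move == "up":
            pairs.append((xs[i - 1], None))
            i -= 1
        else:
            pairs.append((None, ys[j - 1]))
            j -= 1
    return pairs[::-1]


if __name__ == "__main__":
    print(needleman_wunsch(["Sony", "Honda", "Mazda"], ["Sony", "Mazda"]))
    # [('Sony', 'Sony'), ('Honda', None), ('Mazda', 'Mazda')]
```

Because the alignment is global and order-preserving, an entry with no counterpart (here `"Honda"`) is left unmatched when pairing it would push its neighbours out of their better matches. How often gaps are chosen depends on the assumed gap penalty relative to the similarity scale.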

Example

To run the algorithm from bash, use:

python3 align-company-names.py data/index-example.csv \
	data/book-example.csv \
	data/output-example.csv \
	--index_col text --book_col "company_name" --nrows 10
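Judging by the file names, the command above takes an index CSV, a book CSV and an output path. One plausible reading of the flags, assumed here rather than confirmed by the script, is that `--index_col` and `--book_col` name the column holding the strings in each input file and `--nrows` caps how many rows are read. A minimal sketch of that input step, using only the standard `csv` module and a hypothetical `read_column` helper:

```python
import csv
import io
from typing import List, Optional, TextIO


def read_column(f: TextIO, column: str, nrows: Optional[int] = None) -> List[str]:
    """Hypothetical helper: return the values in `column`, reading at most `nrows` data rows."""
    values: List[str] = []
    for k, row in enumerate(csv.DictReader(f)):  # the header row supplies the keys
        if nrows is not None and k >= nrows:
            break  # mimics an assumed --nrows row cap
        values.append(row[column])
    return values


if __name__ == "__main__":
    # Made-up CSV content standing in for a file such as data/book-example.csv.
    sample = io.StringIO("company_name,page\nSony,1\nHonda,2\nMazda,3\n")
    print(read_column(sample, "company_name", nrows=2))  # ['Sony', 'Honda']
```

The two lists obtained this way would be the ordered sequences handed to the alignment step.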

A static example notebook is available here. Alternatively, use the binder link above.

Much of the code found here was generously made available by The Wilke Lab and modified during the author's time working for Melissa Dell.
