Needleman Wunsch Names

This repo provides code that applies the Needleman-Wunsch algorithm to align two ordered sequences of strings, using Levenshtein distance as the scoring metric. The original use case was matching Japanese company names, hence the repo's name.
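
For readers unfamiliar with the approach, here is a minimal sketch of Needleman-Wunsch alignment over two lists of strings, scored with Levenshtein distance. It is not the code in this repo: the function names (`levenshtein`, `similarity`, `align`), the rescaling of distance to a score between -1 and 1, and the gap penalty of -0.5 are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical score: 1.0 for identical strings, down to -1.0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - 2.0 * levenshtein(a, b) / longest


def align(
    xs: List[str], ys: List[str], gap: float = -0.5
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Globally align xs and ys, preserving order; None marks a gap."""
    n, m = len(xs), len(ys)
    # score[i][j] is the best score for aligning xs[:i] with ys[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(xs[i - 1], ys[j - 1]),
                score[i - 1][j] + gap,  # xs[i-1] has no partner
                score[i][j - 1] + gap,  # ys[j-1] has no partner
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
            score[i - 1][j - 1] + similarity(xs[i - 1], ys[j - 1])
        ):
            pairs.append((xs[i - 1], ys[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            pairs.append((xs[i - 1], None))
            i -= 1
        else:
            pairs.append((None, ys[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# Made-up example data: the second list drops one name and misspells another.
index_names = ["Toyota Motor", "Honda", "Nissan Motor"]
book_names = ["Toyota Motor", "Nisan Motor"]
pairs = align(index_names, book_names)
for left, right in pairs:
    print(left, "<->", right)
```

In this example "Honda" is left unmatched and "Nissan Motor" is paired with the misspelled "Nisan Motor". Because the alignment preserves order, a name can only be matched to a candidate that appears in a compatible position in the other list.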

Example

To run the algorithm from bash, use:

python3 align-company-names.py data/index-example.csv \
	data/book-example.csv \
	data/output-example.csv \
	--index_col text --book_col "company_name" --nrows 10

A static example notebook is available in example.ipynb. Alternatively, use the binder link above.

Much of the code found here was generously made available by The Wilke Lab and modified during the author's time working for Melissa Dell.

dell-research-harvard/NeedlemanWunschNames
