So far I am running a dict over the dumped data to correct common misspellings, but this is far from ideal.
A bigger, possibly online dataset combined with a string-similarity metric (Levenshtein distance, perhaps?) to determine the correct name could work, but I suspect someone has already solved this problem. We should research it further.
The HOF would also benefit from this.
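A minimal sketch of the Levenshtein idea, replacing the hard-coded dict: compute the edit distance from a dumped name to each canonical name and pick the closest one within a tolerance. The names and the `max_dist` threshold here are hypothetical placeholders, not from the actual data.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (two-row variant).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def correct_name(name: str, canonical: list[str], max_dist: int = 2) -> str:
    # Return the closest canonical name, or the input unchanged
    # if nothing is within max_dist edits.
    best = min(canonical, key=lambda c: levenshtein(name, c))
    return best if levenshtein(name, best) <= max_dist else name
```

Note that Python's stdlib `difflib.get_close_matches` offers similar fuzzy matching out of the box (using `SequenceMatcher` ratios rather than strict Levenshtein distance), which may cover this without a dependency.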