Pairwise string distance comparison #2517
… distance feature
…yShiUW/splink into pairwise_string_distance_comparison
Thanks - I think this looks good. Minor comments below about the default argument and a suggested refactor of the test to align with the newer format, but other than that I think this is good to merge.
@RobinL I believe I've addressed your comments. I don't understand why a test is failing -- it does not seem related in any way to these changes.
@zmbc you are correct - apologies, this is an unrelated issue, #2515 (which will be fixed shortly, so should not be an issue going forward). Have re-run it to get it to pass, for clarity.
Brilliant, thanks @zmbc and @JonnyShiUW, this is great.
Type of PR
Is your Pull Request linked to an existing Issue or Pull Request?
This is a follow up to #2195, addressing the PR comments there. Closes #1994.
Give a brief description for the solution you have provided
As discussed in the prior PR, this mostly models PairwiseStringDistanceFunctionAtThresholds and PairwiseStringDistanceFunctionLevel off of DistanceFunctionAtThresholds and DistanceFunctionLevel, respectively.
The main difference is that it is pairwise on an array column (duh) and that it only accepts a small list
of string distance functions and transpiles them, instead of the user passing an arbitrary SQL function.
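To make the idea concrete, here is a rough pure-Python sketch of a pairwise comparison over two array values. It is not the code in this PR and does not use Splink's API: the names `ALLOWED_DISTANCE_FUNCTIONS`, `min_pairwise_distance` and `comparison_level` are hypothetical, and it assumes the pairwise result is the minimum distance over all cross pairs, with only a fixed allowlist of function names accepted.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical allowlist: only named functions are accepted,
# rather than an arbitrary user-supplied function.
ALLOWED_DISTANCE_FUNCTIONS = {"levenshtein": levenshtein}


def min_pairwise_distance(left, right, function_name="levenshtein"):
    """Smallest distance over every (left item, right item) pair."""
    if function_name not in ALLOWED_DISTANCE_FUNCTIONS:
        raise ValueError(f"Unsupported distance function: {function_name}")
    if not left or not right:
        return None  # treat a missing or empty array as null
    fn = ALLOWED_DISTANCE_FUNCTIONS[function_name]
    return min(fn(l, r) for l, r in product(left, right))


def comparison_level(left, right, thresholds=(1, 2)):
    """Map the pairwise distance onto the first threshold it satisfies."""
    distance = min_pairwise_distance(left, right)
    if distance is None:
        return "null"
    for threshold in thresholds:
        if distance <= threshold:
            return f"distance <= {threshold}"
    return "else"


print(comparison_level(["smith", "jones"], ["smyth", "brown"]))
```

Here "smith" and "smyth" are one edit apart, so the pair of arrays falls into the first threshold level even though the other pairs are far apart.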
PR Checklist