
Pairwise string distance comparison #2517

Merged
RobinL merged 10 commits into moj-analytical-services:master from zmbc:pairwise_string_distance_comparison
Dec 3, 2024

@zmbc
Contributor

@zmbc zmbc commented Nov 20, 2024

Type of PR

  • BUG
  • FEAT
  • MAINT
  • DOC

Is your Pull Request linked to an existing Issue or Pull Request?

This is a follow up to #2195, addressing the PR comments there. Closes #1994.

Give a brief description for the solution you have provided

As discussed in the prior PR, this mostly models PairwiseStringDistanceFunctionAtThresholds and PairwiseStringDistanceFunctionLevel on DistanceFunctionAtThresholds and DistanceFunctionLevel, respectively.
The main differences are that the comparison is pairwise on an array column (duh) and that it accepts only a small list of string distance functions, which it transpiles, instead of letting the user pass an arbitrary SQL function.
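
To illustrate the idea of a pairwise comparison on array columns, here is a minimal pure-Python sketch. It is not Splink's implementation (which generates SQL); the helper names `levenshtein`, `min_pairwise_distance`, and `comparison_level` are hypothetical, and the choice of taking the minimum distance over all pairs is an assumption that fits a lower-is-closer measure such as Levenshtein. A similarity measure such as Jaro-Winkler would presumably take the maximum instead.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def min_pairwise_distance(left, right):
    """Smallest distance over every (left element, right element) pair.

    Returns None when either array is empty or missing (assumed to be
    treated as a null comparison).
    """
    if not left or not right:
        return None
    return min(levenshtein(l, r) for l, r in product(left, right))


def comparison_level(left, right, thresholds):
    """Map the best pairwise distance onto the first threshold it meets."""
    best = min_pairwise_distance(left, right)
    if best is None:
        return "null"
    for t in sorted(thresholds):
        if best <= t:
            return f"distance <= {t}"
    return "else"


print(comparison_level(["jon", "smith"], ["john", "smyth"], [1, 2]))
```

Restricting the function to a fixed set of named distance functions, as the PR does, means each name can be mapped to the right SQL for every backend, which an arbitrary user-supplied SQL function would not allow.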

PR Checklist

  • Added documentation for changes
  • Added feature to example notebooks or tutorial (if appropriate)
  • Added tests (if appropriate)
  • Updated CHANGELOG.md (if appropriate)
  • Made changes based off the latest version of Splink
  • Run the linter
  • Run the spellchecker (if appropriate)

Member

@RobinL RobinL left a comment

Thanks - I think this looks good. Minor comments below about the default argument and a suggested refactor of the test to align with the newer format, but other than that I think this is good to merge.

Comment thread splink/internals/comparison_library.py Outdated
Comment thread tests/test_comparison_lib.py
Comment thread splink/internals/comparison_library.py Outdated
Comment thread splink/comparison_library.py
@zmbc
Contributor Author

zmbc commented Dec 2, 2024

@RobinL I believe I've addressed your comments. I don't understand why a test is failing -- it does not seem related in any way to these changes.

@ADBond
Contributor

ADBond commented Dec 3, 2024

> @RobinL I believe I've addressed your comments. I don't understand why a test is failing -- it does not seem related in any way to these changes.

@zmbc you are correct - apologies, this is an unrelated issue, #2515 (which will be fixed shortly, so it should not be a problem going forward). I have re-run the test so that it passes, for clarity.

@RobinL
Member

RobinL commented Dec 3, 2024

Brilliant, thanks @zmbc and @JonnyShiUW, this is great.

@RobinL RobinL merged commit 02b702b into moj-analytical-services:master Dec 3, 2024
@RobinL RobinL mentioned this pull request Dec 5, 2024
11 tasks
@zmbc zmbc deleted the pairwise_string_distance_comparison branch December 7, 2024 07:16

Development

Successfully merging this pull request may close these issues.

[FEAT] Allow fuzzy matches on array-valued columns

3 participants