Quick Question #389

kd10041 · 2024-06-26T08:06:10Z

Where can I see the implementation of .partial_ratio()? Could you explain the logic this method uses?
Thanks in advance!

kd10041 · 2024-06-26T09:35:31Z

I am new to Python!
I went through the RapidFuzz codebase. As far as I can tell, the implementation of partial_ratio is divided into two parts:

1. Short needle implementation (length <= 64)

Here a sliding window of length min(len(s1), len(s2)) is used, and fuzz.ratio() is calculated for every possible alignment.
For example (I took this example from this issue):

from rapidfuzz import fuzz

s1 = 'real'
s2 = 'barcelona'
fuzz.partial_ratio(s1, s2)

The window size here is 4, and the best alignment is:

r_eal
rce_l

This requires two edit operations, so the score is 100 * (1 - 2/(4+4)) = 75, which is exactly the output given by rapidfuzz.
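The arithmetic above can be checked with a small pure-Python sketch. It assumes that fuzz.ratio() is the normalized Indel similarity (insertions and deletions only), which is consistent with the 1 - 2/(4+4) calculation. The helper names lcs_length and indel_ratio are made up for this illustration and are not part of rapidfuzz.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0..100 scale (hypothetical helper)."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # convention chosen for this sketch
    distance = total - 2 * lcs_length(a, b)  # insertions + deletions
    return 100.0 * (1 - distance / total)


# "rcel" is the window taken from "barcelona"
print(indel_ratio("real", "rcel"))
```

Comparing "real" against the whole of "barcelona" with the same helper gives a much lower score, which is why the window matters.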

2. Long needle implementation (length > 64)

This is similar to the fuzzywuzzy implementation. The logic is to find the best alignment of the shorter string against the longest common substring in the longer string, and then compute the similarity score with fuzz.ratio() between the needle and that substring.

Can anyone give an example to clear this part up?
Any help would be appreciated.

maxbachmann · 2024-06-26T11:36:32Z

Oh, the documentation is simply outdated. In the past I used two implementations, since I didn't have a way to make the implementation for long needles reasonably fast. However, this meant that the implementation for longer needles was similar to what's done in fuzzywuzzy, which doesn't always provide the correct results.
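For readers who want to see what a fuzzywuzzy-style heuristic looks like, here is a rough sketch built on the standard library's difflib. It is an illustration of the general idea only, not the old RapidFuzz code, and the function name is hypothetical. Only windows that line up with a matching block are scored, so the best window can be missed.

```python
from difflib import SequenceMatcher


def partial_ratio_matching_blocks(s1: str, s2: str) -> float:
    """Score only the windows suggested by difflib's matching blocks (sketch)."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    matcher = SequenceMatcher(None, shorter, longer, autojunk=False)

    best = 0.0
    for a, b, _size in matcher.get_matching_blocks():
        # Shift the window so the matching block lines up in both strings.
        start = max(b - a, 0)
        window = longer[start:start + len(shorter)]
        score = SequenceMatcher(None, shorter, window, autojunk=False).ratio()
        best = max(best, score)
    return 100.0 * best


print(partial_ratio_matching_blocks("real", "unrealistic"))
```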

I have since found a better way to filter out impossible results and so I use the "correct" implementation both for short and long needles. You will still notice a drop in performance once the needle has more than 64 characters though.

From a user perspective it's simply a sliding window, where the substring taken from the longer string has a length of:

- <= the length of the needle if it starts/ends at the start/end of the longer string
- the length of the needle if it's somewhere in the middle of the longer string
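The window rule above can be sketched as a brute-force loop in pure Python. This is a minimal illustration that assumes the score of each window is the normalized Indel similarity. The function names and the empty-string handling are choices made for this sketch and are not taken from rapidfuzz, which also avoids scoring every window.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0..100 scale."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    return 100.0 * 2 * lcs_length(a, b) / total


def partial_ratio_sketch(s1: str, s2: str) -> float:
    """Best indel_ratio between the needle and any sliding window (brute force)."""
    needle, haystack = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    n, m = len(needle), len(haystack)
    if n == 0:
        # Edge-case convention chosen for this sketch only.
        return 100.0 if m == 0 else 0.0

    best = 0.0
    # Negative starts give the shorter windows at the beginning of the
    # haystack; starts past m - n give the shorter windows at the end.
    for start in range(-(n - 1), m):
        window = haystack[max(start, 0):min(start + n, m)]
        best = max(best, indel_ratio(needle, window))
    return best


print(partial_ratio_sketch("real", "barcelona"))
```

For "real" and "barcelona" the best window is "rcel", which matches the 75 discussed earlier in the thread.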

The pure Python fallback implementation is in RapidFuzz/src/rapidfuzz/fuzz_py.py (line 118 in 9359be2):

def _partial_ratio_short_needle(s1, s2, score_cutoff):

The C++ implementation is https://github.com/rapidfuzz/rapidfuzz-cpp/blob/10426d24cd7479df0fe8c78b17877e756e1c3cd5/rapidfuzz/fuzz_impl.hpp#L68

The actual implementation doesn't check all alignments, since it can use knowledge about the maximum distance change per shift of the sliding window to filter out some comparisons.
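One way to picture such a filter, under simplifying assumptions: with fixed-length windows, moving one position drops one character and adds one, so the Indel distance to the needle can change by at most 2 per shift. If the current window is far worse than the best one seen so far, the next few windows cannot win and may be skipped. The sketch below applies only this bound to full-length windows and assumes len(needle) <= len(haystack). It is not the algorithm in fuzz_impl.hpp, and all names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Number of insertions and deletions needed to turn a into b."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def best_window_bruteforce(needle: str, haystack: str) -> int:
    """Smallest distance over every full-length window."""
    n = len(needle)
    return min(indel_distance(needle, haystack[i:i + n])
               for i in range(len(haystack) - n + 1))


def best_window_pruned(needle: str, haystack: str) -> tuple[int, int]:
    """Same minimum, skipping windows that cannot beat the best so far.

    Returns (best_distance, number_of_windows_actually_compared).
    """
    n = len(needle)
    last_start = len(haystack) - n
    best = None
    comparisons = 0
    i = 0
    while i <= last_start:
        d = indel_distance(needle, haystack[i:i + n])
        comparisons += 1
        if best is None or d < best:
            best = d
        # A window k positions further has distance >= d - 2 * k, so it can
        # only beat `best` when k > (d - best) / 2.
        i += (d - best) // 2 + 1
    return best, comparisons


print(best_window_bruteforce("real", "barcelona"))
print(best_window_pruned("real", "barcelona"))
```

Both functions return the same minimum distance; the pruned version just reaches it with fewer comparisons when some windows are clearly hopeless.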

kd10041 · 2024-06-27T10:16:22Z

Thank you @maxbachmann for the clear explanation.
Do you plan on updating the documentation anytime soon?

maxbachmann · 2024-06-27T17:57:16Z

Yes, I will probably fix the docs at some point this week.

maxbachmann added the documentation and question labels Jun 27, 2024

kd10041 closed this as completed Jul 1, 2024
