Quick Question #389
I am new to Python!

1. Short-needle implementation (length <= 64): here a sliding window of

So the best alignment here uses a window size of 4. This requires two operations, so the output is 100 * (1 - 2/(4+4)) = 75, which is exactly the output given by rapidfuzz.

2. Long-needle implementation (length > 64): this is similar to what fuzzywuzzy implements. The logic here is to find the best alignment of the shorter string against the longest common substring of the longer string, and then to find the similarity score using

Can anyone give an example to clear this part up?
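The arithmetic above is a normalized Indel similarity: the Indel distance (insertions plus deletions) equals len(a) + len(b) - 2 * LCS, where LCS is the length of the longest common subsequence, and the score is 100 * (1 - distance / (len(a) + len(b))). The sketch below computes that score and, for comparison, a heuristic in the spirit of the fuzzywuzzy approach described in the question: it only scores windows anchored on `difflib` matching blocks instead of trying every alignment. The strings "abcd"/"abed" are hypothetical (the original example did not survive), and `indel_ratio` and `block_heuristic_partial_ratio` are illustrative names, not rapidfuzz or fuzzywuzzy APIs.

```python
from difflib import SequenceMatcher


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale (hypothetical helper)."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    # Indel distance = insertions + deletions = total - 2 * LCS
    distance = total - 2 * lcs_length(a, b)
    return 100.0 * (1 - distance / total)


def block_heuristic_partial_ratio(s1: str, s2: str) -> float:
    """Heuristic: only score windows that line up a matching block in both strings."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    matcher = SequenceMatcher(None, shorter, longer, autojunk=False)
    best = 0.0
    for a_start, b_start, _size in matcher.get_matching_blocks():
        # Shift the window so the block starts at the same offset in both strings.
        start = max(0, b_start - a_start)
        window = longer[start:start + len(shorter)]
        best = max(best, indel_ratio(shorter, window))
    return best


print(indel_ratio("abcd", "abed"))                       # 75.0
print(block_heuristic_partial_ratio("abcd", "xxabedxx"))  # 75.0
```

Here "abcd" and "abed" share the subsequence "abd", so two operations (delete "c", insert "e") are needed, giving 100 * (1 - 2/8) = 75. Because the heuristic skips every window that is not anchored on a matching block, it can miss the best alignment in some inputs.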
Oh, the documentation is simply outdated. In the past I did use two implementations, since I didn't have a way to make the implementation for long needles reasonably fast. However, this meant that the implementation for longer needles was similar to what's done in fuzzywuzzy, which doesn't always provide the correct results. I have since found a better way to filter out impossible results, so I now use the "correct" implementation for both short and long needles. You will still notice a drop in performance once the needle has more than 64 characters, though. From a user perspective it's simply a sliding window where the substring taken from the longer string has a length of.
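A brute-force version of that sliding window can be sketched as follows. It assumes the window length equals the length of the shorter string and scores each window with normalized Indel similarity. It only tries full-length windows, so treat it as a simplification rather than a reproduction of rapidfuzz's `partial_ratio` in every case; `sliding_window_partial_ratio` is a hypothetical name.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale (hypothetical helper)."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    return 100.0 * (1 - (total - 2 * lcs_length(a, b)) / total)


def sliding_window_partial_ratio(s1: str, s2: str) -> float:
    """Best score of the shorter string against every full-length window of the
    longer string (assumption: window length == len(shorter))."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    n = len(shorter)
    best = 0.0
    for start in range(len(longer) - n + 1):
        best = max(best, indel_ratio(shorter, longer[start:start + n]))
        if best == 100.0:
            break  # an exact match cannot be beaten
    return best


print(sliding_window_partial_ratio("abcd", "xxabedxx"))  # 75.0
print(sliding_window_partial_ratio("abc", "xxabcxx"))    # 100.0
```

This checks every alignment, so its cost grows with the number of windows times the cost of one comparison; that is the work the filtering described in the next comment tries to avoid.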
The pure Python fallback implementation is in RapidFuzz/src/rapidfuzz/fuzz_py.py, line 118 at commit 9359be2.
The C++ implementation is at https://github.com/rapidfuzz/rapidfuzz-cpp/blob/10426d24cd7479df0fe8c78b17877e756e1c3cd5/rapidfuzz/fuzz_impl.hpp#L68. The actual implementation doesn't check every alignment: it uses knowledge about the maximum change in distance per shift of the sliding window to filter out some comparisons.
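One way to realize that filtering idea is sketched below; it is a simplified illustration, not a port of the C++ code. Because Indel distance is a metric, and shifting the window by one position removes one character and adds one, the distance to the needle changes by at most 2 per shift. For windows lo < k < hi this gives d[k] >= (d[lo] + d[hi]) / 2 - (hi - lo), so a whole interval can be skipped once that bound cannot beat the best distance found so far. `pruned_best_window` and its helpers are hypothetical names.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Number of insertions plus deletions needed to turn a into b."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def pruned_best_window(shorter: str, longer: str) -> tuple[int, int]:
    """Return (start, distance) of a len(shorter) window with minimal Indel distance,
    skipping intervals whose lower bound cannot improve on the best so far."""
    n = len(shorter)
    last = len(longer) - n
    if last < 0:
        raise ValueError("the needle must not be longer than the haystack")

    def dist_at(start: int) -> int:
        return indel_distance(shorter, longer[start:start + n])

    d_first, d_last = dist_at(0), dist_at(last)
    best_start, best_dist = (0, d_first) if d_first <= d_last else (last, d_last)
    # Intervals (lo, hi, d_lo, d_hi) whose interior positions are still unexplored.
    stack = [(0, last, d_first, d_last)]
    while stack:
        lo, hi, d_lo, d_hi = stack.pop()
        if hi - lo < 2:
            continue  # no interior positions left
        if (d_lo + d_hi) / 2 - (hi - lo) >= best_dist:
            continue  # no window strictly between lo and hi can beat best_dist
        mid = (lo + hi) // 2
        d_mid = dist_at(mid)
        if d_mid < best_dist:
            best_start, best_dist = mid, d_mid
        stack.append((lo, mid, d_lo, d_mid))
        stack.append((mid, hi, d_mid, d_hi))
    return best_start, best_dist


start, dist = pruned_best_window("abcd", "xxabedxx")
print(start, dist, 100 * (1 - dist / (2 * 4)))  # 2 2 75.0
```

In this example the interval between positions 2 and 4 is skipped without computing position 3, because its bound (2 + 6) / 2 - 2 = 2 cannot beat the distance 2 already found. The bound only prunes windows that cannot strictly improve the result, so the minimal distance matches a brute-force scan.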
Thank you @maxbachmann for the clear explanation.
Yes, I will probably fix the docs at some point this week.
Where can I see the implementation of .partial_ratio()? Could you explain the logic this method uses?
Thanks in advance!