This repository has been archived by the owner on Aug 26, 2024. It is now read-only.

how ratio in fuzzy-wuzzy calculated? #289

Open
fatimamb opened this issue Nov 9, 2020 · 1 comment

Comments

fatimamb commented Nov 9, 2020

I am trying to understand how the score in fuzzy-wuzzy is calculated.
So far I know that it depends on SequenceMatcher from the difflib package,
and the difflib documentation describes the score as follows:

Return a measure of the sequences’ similarity as a float in the range [0, 1].

Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common.

My first question: what does the 2.0 refer to?
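
One way to read the 2.0, offered as a sketch rather than an authoritative answer: every matched element appears once in each of the two sequences, so M matches account for 2*M of the T elements in total. The snippet below recomputes the ratio by hand from get_matching_blocks() and prints it next to ratio(). It uses only the standard-library difflib; the helper name manual_ratio is hypothetical.

```python
from difflib import SequenceMatcher


def manual_ratio(a, b):
    """Recompute 2.0*M / T from the matching blocks (illustrative helper)."""
    s = SequenceMatcher(None, a, b)
    # M: total size of all matching blocks (the final dummy block has size 0).
    m = sum(block.size for block in s.get_matching_blocks())
    # T: total number of elements in both sequences.
    t = len(a) + len(b)
    return 2.0 * m / t if t else 1.0


a, b = "private", "privateT"
print(manual_ratio(a, b))                   # 2*7 / 15
print(SequenceMatcher(None, a, b).ratio())  # value reported by difflib
```

Here M is 7 (the shared "private") and T is 7 + 8 = 15, so both lines print 14/15.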

Also, get_opcodes reports operations such as equal, replace, and delete.

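from difflib import SequenceMatcher

# The first positional argument is isjunk, so pass None before the two strings.
s = SequenceMatcher(None, "private", "privateT")
for opcode in s.get_opcodes():
    print("%6s a[%d:%d] b[%d:%d]" % opcode)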

My second question: do any of these operations affect the ratio score?
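
A rough way to see how the opcodes relate to the score, assuming only the standard-library difflib: the spans tagged equal are the ones that add up to M, while replace, delete, and insert spans contribute no matches and influence the score only through the lengths counted in T. The helper name equal_count is hypothetical.

```python
from difflib import SequenceMatcher


def equal_count(a, b):
    """Sum the lengths of the spans tagged 'equal' (illustrative helper)."""
    s = SequenceMatcher(None, a, b)
    return sum(i2 - i1 for tag, i1, i2, j1, j2 in s.get_opcodes() if tag == "equal")


a, b = "private", "privateT"
m = equal_count(a, b)
# Only the 'equal' spans feed M; the other tags just add length to T.
print(m, 2.0 * m / (len(a) + len(b)))
```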

I have read some posts, such as the one linked here,
that talk about the cost in edit distance.
Is that cost considered in the fuzzy-wuzzy or difflib score?

Thank you.

fatimamb changed the title from "how ratio in uzzy-wuzzy calculated?" to "how ratio in fuzzy-wuzzy calculated?" on Nov 9, 2020
@MahmoudAliEng

As far as I know, FW uses the Levenshtein similarity ratio. You can find a fuller explanation of its logic in this amazing article.
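
For illustration, here is a minimal pure-Python sketch of one common definition of a Levenshtein similarity ratio. It assumes insertions and deletions cost 1 and substitutions cost 2, and it divides by the combined length of both strings. That weighting is an assumption made for this example and is not taken from the FuzzyWuzzy source; the function names weighted_edit_distance and levenshtein_ratio are hypothetical.

```python
def weighted_edit_distance(a, b, sub_cost=2):
    """Edit distance with insert/delete cost 1 and substitution cost sub_cost."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                                  # delete ca
                cur[j - 1] + 1,                               # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def levenshtein_ratio(a, b):
    """(T - distance) / T, where T = len(a) + len(b); 1.0 for two empty strings."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - weighted_edit_distance(a, b)) / total


print(levenshtein_ratio("private", "privateT"))  # distance 1, so 14 / 15
```

With a substitution cost of 2, the distance equals T minus twice the length of the longest common subsequence, so this ratio has the same 2*M / T shape as the difflib formula, with M taken as the longest common subsequence length.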
