Processing Mateo Díaz,Johns Hopkins University,https://mateodd25.github.io/,F3cPGhsAAAAJ
Checking https://dblp.org/search/author/api?q=author%3AMateo%20Diaz:$%3A&format=json&c=10
WARNING: Possibly invalid name (Mateo Diaz). This may be a disambiguation entry.
Checking homepage URL (https://mateodd25.github.io/
Here is what I found:
Calling matching_name_with_dblp("Mateo Díaz") works just fine: it returns "1".
However, I realized that the code actually processes "Mateo Diaz" (note that í has been changed to i). When I pass that to matching_name_with_dblp("Mateo Diaz"), it returns "2", which causes the error due to name ambiguity. The conversion from í to i is done by unidecode.unidecode(.):
import unidecode

name = unidecode.unidecode("Mateo Díaz")
print(f"Name after unicode normalization: {name}")
# prints: Name after unicode normalization: Mateo Diaz
I looked into whether í can be encoded in a way that unidecode.unidecode(.) does not strip, but nothing worked.
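To illustrate why re-encoding í does not help, here is a minimal stdlib-only sketch. It stands in for unidecode (a third-party package) with the NFD-plus-drop-combining-marks technique that custom_unidecode also uses; strip_accents is a hypothetical helper name. The two ways to encode í, precomposed U+00ED and decomposed i + U+0301, are canonically equivalent, and both fold to a plain i:

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Hypothetical helper: decompose to NFD and drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

# Two encodings of the same visible name
precomposed = "Mateo D\u00edaz"   # í as a single code point, U+00ED
decomposed = "Mateo Di\u0301az"   # i followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                 # different code point sequences
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # canonically equivalent
print(strip_accents(precomposed), strip_accents(decomposed))    # both lose the accent
```

Because the decomposed form still contains the base letter i, a step that removes diacritics sees the same letter either way.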
In conclusion, my suggestion is to revise the unidecode.unidecode(.) step so that it does not strip í. For example:
import unicodedata

def custom_unidecode(text, keep_characters="í"):
    result = []
    for char in text:
        # If the character is in the keep list, add it directly
        if char in keep_characters:
            result.append(char)
        else:
            # Normalize and strip accents
            normalized_char = unicodedata.normalize('NFD', char)
            stripped_char = ''.join(c for c in normalized_char if unicodedata.category(c) != 'Mn')
            result.append(stripped_char)
    return ''.join(result)

# Example usage
text = "Café con piñata and ítem"
print(custom_unidecode(text))

text = "Mateo Díaz"
print(custom_unidecode(text))

# Would print:
# Cafe con pinata and ítem
# Mateo Díaz
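One caveat, sketched under the assumption that names may arrive in either Unicode normalization form: custom_unidecode compares each code point against keep_characters, so a decomposed í (i followed by U+0301) slips past the check and is stripped to i. Normalizing the input to NFC first avoids that. custom_unidecode_nfc is a hypothetical name for this variant:

```python
import unicodedata

def custom_unidecode_nfc(text: str, keep_characters: str = "\u00ed") -> str:
    """Hypothetical variant of custom_unidecode that NFC-normalizes its input first,
    so precomposed and decomposed spellings of a kept character behave the same."""
    keep = set(unicodedata.normalize("NFC", keep_characters))
    result = []
    for char in unicodedata.normalize("NFC", text):
        if char in keep:
            # Keep listed characters untouched
            result.append(char)
        else:
            # Decompose and drop combining marks (category Mn) to strip accents
            decomposed = unicodedata.normalize("NFD", char)
            result.append("".join(c for c in decomposed if unicodedata.category(c) != "Mn"))
    return "".join(result)

print(custom_unidecode_nfc("Mateo D\u00edaz"))   # input with precomposed í
print(custom_unidecode_nfc("Mateo Di\u0301az"))  # input with decomposed í
```

Both calls print Mateo Díaz, whereas custom_unidecode as written would return Mateo Diaz for the decomposed input.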
Happy to send a PR if you like the suggested change.
For context: I ran into this error while adding an entry for Mateo Díaz (https://dblp.org/pid/200/7297.html).