Numerous False Negatives #12
In order for it to work, the input text must use capitalization. The underlying regular expression, and the idea behind this library, is to catch city names as capitalized named entities; otherwise it would only be a lookup.
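Here is a minimal sketch of that two-step idea. It assumes a simple pattern of capitalized words and a tiny hypothetical city table. `CITY_NAMES`, `find_cities` and the regex are illustrative and are not geotext's actual code or data. The sketch shows why lowercase input finds nothing: no candidates are extracted, so nothing ever reaches the lookup.

```python
import re

# Hypothetical stand-in for the library's city table; illustrative names only.
CITY_NAMES = {"London", "New York", "Chicago"}

# Assumed approximation of the entity regex: one or more capitalized words
# (a capital followed by lowercase letters) separated by single spaces.
CAPITALIZED_ENTITY = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def find_cities(text: str) -> list[str]:
    # Step 1: collect capitalized candidates from the text.
    candidates = CAPITALIZED_ENTITY.findall(text)
    # Step 2: keep only candidates that appear in the lookup table.
    return [c for c in candidates if c in CITY_NAMES]


print(find_cities("I flew from London to New York"))  # ['London', 'New York']
print(find_cities("i flew from london to new york"))  # []
```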
OK, that makes sense. I can try title-casing the data before I process it. However, I also have to point out that no matter what I do, certain cities like St. Louis are never recognized, even when the input is just "St. Louis" or "Saint Louis".
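All-caps input can be title-cased with two standard-library options: the built-in `str.title()` and `string.capwords()`. The sketch below compares them on a few of the address strings reported later in this thread. The helper name `case_variants` is hypothetical. Neither option is perfect. `str.title()` treats digits and periods as word boundaries. `capwords()` splits on whitespace only.

```python
import string

# A few all-caps strings taken from the USPTO examples in this thread.
SAMPLES = [
    "INDIANAPOLIS INDIANA.",
    "LEROY, N.Y.",
    "3501 W. 48TH PLACE CHICAGO 32, ILL.",
]


def case_variants(text: str) -> tuple[str, str]:
    """Return (str.title() result, string.capwords() result) for comparison."""
    # str.title() starts a new word after any non-letter, so "48TH" -> "48Th".
    # string.capwords() capitalizes whitespace-separated words, so "N.Y." -> "N.y.".
    return text.title(), string.capwords(text)


for s in SAMPLES:
    titled, capped = case_variants(s)
    print(f"{s!r}\n  title():    {titled!r}\n  capwords(): {capped!r}")
```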
That's right. You have to understand that there are two things at work here: a regular expression that tries to catch all named entities in a text and store them in a list, and a lookup of those named entities in a table of city names. In cases like St. Louis, I would guess that the regular expression does not catch the "St.", which is why it is not recognized.
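The sketch below illustrates the guess about "St.". It uses an assumed regex for capitalized word sequences (`BASIC`) and a hypothetical variant (`WITH_ABBREV`) that also accepts a leading "St.", "Ft." or "Mt.". Neither is geotext's actual pattern, and the lookup table is made up. With the basic pattern, the period splits "St. Louis" into "St" and "Louis", so neither fragment matches the table.

```python
import re

# Hypothetical lookup table; the real library loads its names from GeoNames.
CITY_NAMES = {"St. Louis", "Chicago"}

# Assumed approximation of the entity regex: capitalized words joined by spaces.
BASIC = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")

# Hypothetical variant that also accepts a leading abbreviation such as "St.".
# Restricting it to a few known abbreviations avoids gluing together two
# capitalized words that merely straddle a sentence-ending period.
WITH_ABBREV = re.compile(r"(?:(?:St|Ft|Mt)\. )?[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def cities(pattern: re.Pattern, text: str) -> list[str]:
    # Extract candidates with the given pattern, then look each one up.
    return [c for c in pattern.findall(text) if c in CITY_NAMES]


text = "Monsanto Company, St. Louis, Mo."
print(BASIC.findall(text))        # ['Monsanto Company', 'St', 'Louis', 'Mo']
print(cities(BASIC, text))        # []
print(WITH_ABBREV.findall(text))  # ['Monsanto Company', 'St. Louis', 'Mo']
print(cities(WITH_ABBREV, text))  # ['St. Louis']
```

This only helps if the table actually contains the abbreviated spelling. A "Saint Louis" input would still depend on that spelling being present in the table.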
Hello, I've been using str.title() to capitalise strings. However, 'Malasya' is not identified as a country, even though it comes up in the GeoNames source data: http://www.geonames.org/search.html?q=malasya
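Title-casing cannot fix a misspelling. Suppose the country table holds only canonical spellings (this is an assumption about the data). Then "Malasya" misses an exact lookup. One possible workaround, which is not part of the library, is a fuzzy fallback using the standard library's `difflib.get_close_matches`. The country list, the `match_country` helper and the 0.75 cutoff are all illustrative.

```python
import difflib
from typing import Optional

# Hypothetical country list; the real table is built from GeoNames data.
COUNTRY_NAMES = ["Malaysia", "Mali", "Malta", "Maldives"]


def match_country(token: str, cutoff: float = 0.75) -> Optional[str]:
    # Exact lookup first, as a plain table lookup would do.
    if token in COUNTRY_NAMES:
        return token
    # Fallback (not part of the library): the closest name by difflib's
    # similarity ratio, returned only if it reaches the cutoff.
    close = difflib.get_close_matches(token, COUNTRY_NAMES, n=1, cutoff=cutoff)
    return close[0] if close else None


print(match_country("Malta"))    # Malta
print(match_country("Malasya"))  # Malaysia
print(match_country("Narnia"))   # None
```

A lower cutoff recovers more misspellings but also produces more false positives, so the threshold would need tuning on real data.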
Hello Elyase, I'm very glad you created and maintain this very useful Python library. I'm currently using it to parse quite a lot of info from the USPTO. I noticed quite a few cases where the library didn't capture the city and/or country from the string. Below are some strings from the source data where the city and/or country was not picked out. Hopefully these cases can help you improve the library.
INDIANAPOLIS INDIANA.
BARDSLEY, ENGLAND
ST. LOUIS, MO.
WHITING, INDIANA, AND CHICAGO, ILLINOIS.
PHILADELPHIA PA.
LEROY, N.Y.
LYNDONVILLE, VT.
AMENIA, N. Y.
COPPERHILL, TENN.
DETROIT AND JOSEPH CAMPAU AT THE RIVER,MICH.
IVORYTON, CONN.
ST. LOUIS, MO. CORPORATION OF MISSOURI.
OGDENSBURG, N.Y.
NEAR SHEFFIELD, ENGLAND
INDIANAPOLIS IND.
BASLE,
ST. LOUIS, MO. REPUBLISHED BY MONSANTO COMPANY,/ST. LOUIS, MO.
LABORATORY PARK DECATUR, ILL.
1006 OAZA KADOMA, KADOMA-CHO KITAKAWACHI-GUN, OSAKA,
3501 W. 48TH PLACE CHICAGO 32, ILL.
700 BROADWAY NEW YORK, N.Y.
811 WYANDOTTE KANSAS CITY, MO.
835 S. 8TH ST. ST. LOUIS 2, MO.
47/51 EXMOUTH MARKET, ROSEBERRY AVE. LONDON E.C.1, ENGLAND
1407 CUMMINGS DRIVE RICHMOND 20, VA.
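The list above can be turned into a repeatable regression check. The check title-cases each string, extracts candidates, and reports which expected city was not found. This sketch reuses an assumed capitalized-words regex and a hypothetical city table. The expected city for each string is also an assumption. The sketch compares two lookups. One looks up whole candidates. The other also looks up every contiguous run of words inside a candidate. The second lookup matters because this pattern captures adjacent capitalized words such as "Indianapolis Indiana" as a single candidate.

```python
import re

# Hypothetical city table covering the cities expected in the samples.
CITY_NAMES = {"Indianapolis", "Bardsley", "Philadelphia", "New York", "St. Louis"}

# Assumed approximation of the capitalized-entity regex (not geotext's own).
ENTITY = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")

# (raw string from the list above, city we assume should be found)
SAMPLES = [
    ("INDIANAPOLIS INDIANA.", "Indianapolis"),
    ("BARDSLEY, ENGLAND", "Bardsley"),
    ("ST. LOUIS, MO.", "St. Louis"),
    ("PHILADELPHIA PA.", "Philadelphia"),
    ("700 BROADWAY NEW YORK, N.Y.", "New York"),
]


def whole_candidates(text: str) -> list[str]:
    # Look up each extracted candidate as-is.
    return [c for c in ENTITY.findall(text) if c in CITY_NAMES]


def ngram_candidates(text: str) -> list[str]:
    # Also look up every contiguous run of words inside each candidate.
    found: list[str] = []
    for cand in ENTITY.findall(text):
        words = cand.split(" ")
        for i in range(len(words)):
            for j in range(i + 1, len(words) + 1):
                name = " ".join(words[i:j])
                if name in CITY_NAMES and name not in found:
                    found.append(name)
    return found


def misses(extract) -> list[str]:
    # Title-case the raw string first, then report samples whose city is missing.
    return [raw for raw, city in SAMPLES if city not in extract(raw.title())]


print(misses(whole_candidates))
print(misses(ngram_candidates))
```

Under these assumptions, the n-gram lookup recovers the merged candidates. "ST. LOUIS, MO." still fails because the period splits the name before lookup. N-gram lookup can also add false positives when a shorter city name appears inside a longer phrase.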