Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more careful norm_text check #133

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Add more careful norm_text check #133

wants to merge 1 commit into from

Conversation

cthoyt
Copy link
Member

@cthoyt cthoyt commented Feb 21, 2024

This makes sure it's not none before checking if it can be stripped. Response to this error:

...
  File "/Users/cthoyt/dev/gilda/gilda/grounder.py", line 685, in load_entries_from_terms_file
    yield Term(*row_nones)
          ^^^^^^^^^^^^^^^^
  File "/Users/cthoyt/dev/gilda/gilda/term.py", line 55, in __init__
    if not norm_text.strip():
           ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'strip'

@cthoyt cthoyt requested a review from bgyori February 21, 2024 14:43
@bgyori
Copy link
Member

bgyori commented Feb 21, 2024

This probably doesn't hurt, but assuming that norm_text is produced by the normalize function, under what conditions could it be None?

@cthoyt
Copy link
Member Author

cthoyt commented Feb 21, 2024

It's happening in a place where I'm using load_entries_from_terms_file() - something in that process is leaving a None in the norm_text column. I'm trying to track that down now too

@cthoyt
Copy link
Member Author

cthoyt commented Feb 22, 2024

Tracked down the culprit by hacking some logging in to the load_entries_from_terms_file() function:

WARNING: [2024-02-22 01:40:56] gilda.grounder - [None, '-', 'umls', 'C5187432', '-', 'synonym', 'umls', None, None, None] Normalized text for Term cannot be None nor empty

https://uts.nlm.nih.gov/uts/umls/concept/C5187432

nice name -

@bgyori
Copy link
Member

bgyori commented Feb 22, 2024

The odd thing is that normalize('-') returns '', as expected, so the None is likely coming from some other code. Is it clear where that happens?

@cthoyt
Copy link
Member Author

cthoyt commented Feb 22, 2024

I am thinking this might be a weird syncing issue - I don't have UMLS update on the normal cycle since it's so big and hard to work with. I think what happened is that the UMLS index was dumped before we put the check inside the Term object, and so it's indexed with an empty string at the moment, which the load function turns into none since it does a truthy check. I'll try rebuilding UMLS and see what happens

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants