-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
After refactoring the subclass of sync, a lot of changes #8432
base: master
Are you sure you want to change the base?
Conversation
I eyeballed some of the, and they seem correct to me.. To review this: 1. Check out this file: https://raw.githubusercontent.com/monarch-initiative/mondo-ingest/main/src/ontology/reports/sync-subClassOf.confirmed.tsv (this is what is merged into mondo-edit) 2. Confirm that the changes reflect the changes you see to the edit file 3. Confirm that the changes to the sync-subClassOf.confirmed.tsv are intended The most important is (3), as I already did 1 and a few spotchecks re 2.
I'm not sure I am looking at this in the "right" direction. For example, for MONDO:0000022 it has these changes:
So in Mondo MONDO:0000022 'nocturnal enuresis' is a subclassof MONDO:0024290 'enuresis'. The Mondo term has Also, many of the subclassof sources that were removed are from DOID and NCIT. When I looked in the mondo-ingest repo at tmp/component-download-ncit.owl.owl I could find 'nocturnal enuresis' and 'enuresis', but when I looked at components/ncit.owl I could not find either class 'nocturnal enuresis' and 'enuresis'. |
At first I found this concerning, but then I checked our specs and they clearly state that we only ever sync the neoplasm branch with NCIT. enuresis is in the psychiatric disorder branch.. The DO is more interesting/concerning:
And in DO:
The confirmed subclass table is clearly missing this, only containing these two entries related to dengue:
In my opinion its a bug in the subclass sync, one that we didn't notice because my previous subclass pipeline didn't delete all the evidence.. So, I think we need to:
let me know what you think |
I added the ticket monarch-initiative/mondo-ingest#708, unclear what availability @joeflack4 has to fix this as an urgent priority. No, we're generally not ditching the |
Converting to Draft until this is fixed monarch-initiative/mondo-ingest#708 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General review
I spot checked some things. Looks like I'd expect; the deletions and lack thereof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double checking the is_a
source
Just using the example case from monarch-initiative/mondo-ingest#708, even though that is an direct-source-indirect-mondo and not part of this PR.
subject_mondo_id | subject_mondo_label | object_mondo_id | subject_source_id | object_source_id | object_mondo_label |
---|---|---|---|---|---|
ID | SC % | >A oboInOwl:source | |||
MONDO:0000248 | dengue shock syndrome | MONDO:0005502 | DOID:0050125 | DOID:12205 | dengue disease |
id: MONDO:0000248
name: dengue shock syndrome
xref: DOID:0050125 {source="MONDO:equivalentTo"}
is_a: MONDO:0005502 {source="DOID:0050125", source="MONDO:Redundant"} ! dengue disease
I'm pretty sure source="DOID:0050125"
is correct but I just wanted to double check that this shouldn't be `source="DOID:12205", because in a way I feel like it makes sense to express the evidence that way as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-added
?
The PR title just says "subclass sync", but your comments are only about -confirmed
, so that's what I checked. Have we not implemented -added
for the subclass sync, or has it just not been run for this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirm that the changes to the
sync-subClassOf.confirmed.tsv
are intended
@matentzn What are you asking precisely? I don't think you mean to the TSV; Like I don't think you're asking me to go to mondo-ingest
and look at the diff of that file over time.
It sounds like what you're asking is that the changes by the TSV are intended.
By intended, it sounds like you want to take this time to double check the mondo-ingest
subclass sync pipeline itself; to spot check a few cases where we can match up what we'd expect to see appear in the -confirmed
TSV, and then check to see if those entries do indeed appear in the TSV. Is that what you are asking?
I don't really have any doubts about this, but I did do a couple checks:
- Checked a random SCR that wasn't removed by this PR, verified the matches in
mondo.sssom.tsv
exist, and checked that the component.owl
does corroborate the SCR. - Checked a couple cases like this: A deletion, (
MONDO:0017626
is_a: MONDO:0018100 {source="Orphanet:306522"} ! familial primary hypomagnesemia
. Checked and saw that thatMONDO:0018100
has no Orphanet mapping inmondo.sssom.tsv
, so makes sense that this one was removed. - I wanted to check a case like: Mondo
is_a
evidence for a source was removed, and exist in themondo.sssom.tsv
, but it was only removed because the SCR didn't exist in the source, but it's best if I don't spend time digging around for such a case unless requested.
Following up from the Tech call and analysis of the missing subclassOf source annotations from NCIT (https://github.com/monarch-initiative/mondo-curation-analysis/tree/main/mondo-subclass-sync-analysis), the decision was to add the source provenance of I ran the subclass sync locally as Here are some data check results: Regex pattern to find subclassOf relationship with provenance: These number changes match the content of @matentzn can you let me know if you prefer any changes to this? |
Build PR for #8431
I eyeballed some of the, and they seem correct to me.. To review this:
The most important is (3), as I already did 1 and a few spotchecks re 2.
I suggest @joeflack4 you take a look here is well.