Pipeline currently mis-maps IEAs around GOlr, likely related to owltools or its environment #251

Closed
kltm opened this issue Oct 12, 2021 · 13 comments

Comments

@kltm
Member

kltm commented Oct 12, 2021

There are 128276 more annotations on release than were on the candidate. Looking at AmiGO, it seems like "evidence used in automatic assertion" (ECO:0000501) has entirely disappeared. Strangely, a very small number of IEA annotations did manage to find their way in:

https://amigo-staging.geneontology.io/amigo/search/annotation?q=IEA*&fq=-evidence_subset_closure_label:%22genetic%20interaction%20evidence%20used%20in%20manual%20assertion%22&fq=-evidence_subset_closure_label:%22biological%20aspect%20of%20ancestor%20evidence%20used%20in%20manual%20assertion%22&sfq=document_category:%22annotation%22

That's weird.
I'm putting a pin in that for now.

That leaves us with two paths for data to be dropped, assuming that the problem is on our side and not upstream: ontobio and owltools.

Checking around for a data set, let's arbitrarily select the MGI GAF as an example, specifically the IEAs:

      | snapshot | release
src   | 73326    | 73829
valid | 73318    | 73821

So, that is a small change in the src/valid numbers. In AmiGO, filtering for just species "mouse" annotations, there are actually /more/ in the snapshot...so hm.
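
For anyone wanting to reproduce counts like these locally, here is a minimal sketch that counts annotation lines with a given evidence code in a GAF. It assumes the standard GAF layout (tab-separated, `!` comment lines, evidence code in column 7); the function names are hypothetical, and this is not how the pipeline itself produces the src/valid numbers.

```python
import gzip


def count_evidence(lines, code="IEA"):
    """Count GAF annotation lines whose evidence code (column 7) equals `code`."""
    count = 0
    for line in lines:
        # Skip blank lines and GAF header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 7 (index 6) holds the evidence code in GAF 2.x.
        if len(cols) > 6 and cols[6] == code:
            count += 1
    return count


def count_in_file(path, code="IEA"):
    """Open a plain or gzipped GAF (hypothetical local path) and count matches."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return count_evidence(handle, code)
```

Running `count_in_file` on the snapshot and release copies of the same GAF gives a quick side-by-side check that is independent of AmiGO.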

My guess for the moment is that there is something going on in either owltools or the docker mechanism around it (possibly cached old files) that is causing a problem here. I'll do a bit more digging to try and get at what's going on.

kltm commented Oct 12, 2021

GOlr build step runs with:

java \
    -Xms$LOADER_MEM \
    -Xmx$LOADER_MEM \
    -DentityExpansionLimit=8172000 \
    -Djava.awt.headless=true \
    -jar /srv/amigo/java/lib/owltools-runner-all.jar  \
    $ONTOLOGIES \
    --log-info \
    --solr-config /srv/amigo/metadata/ont-config.yaml \
    --merge-support-ontologies \
    --merge-imports-closure \
    --remove-subset-entities upperlevel \
    --remove-disjoints \
    --silence-elk \
    --reasoner elk \
    --solr-taxon-subset-name amigo_grouping_subset \
    --solr-eco-subset-name go_groupings \
    --solr-url http://localhost:8080/solr/ \
    --solr-log /tmp/golr_timestamp.log \
    --solr-load-ontology \
    --solr-load-ontology-general  \
    --read-panther /tmp/panther_data/ \
    --solr-load-gafs $GAFS \
    --solr-load-panther \
    --solr-load-panther-general \
    --solr-optimize

The only explicit ECO reference is --solr-eco-subset-name go_groupings.

kltm commented Oct 12, 2021

kltm commented Oct 12, 2021

The go_groupings subset seems to have not changed with the above diff.

That said, looking at the older eco-basic.obo
https://github.com/evidenceontology/evidenceontology/blob/b6fa5152b9063bcf1ae01c573e35302578156f9f/eco-basic.obo
and newer
https://github.com/evidenceontology/evidenceontology/blob/master/eco-basic.obo
versions of ECO, it seems that what changed here is that "IEA" as a synonym has been changed. Oddly (to me at least), other structures are untouched.

Starting to dive into owltools a little...
https://github.com/owlcollab/owltools/blob/9faa4f42b761839a26e8c8096cd24044e2bdcfc7/OWLTools-Annotation/src/main/java/owltools/gaf/eco/EcoMapper.java#L24
makes use of the (I assume proper) http://purl.obolibrary.org/obo/eco/gaf-eco-mapping.txt
The code doing the mapping work appears to be:
https://github.com/owlcollab/owltools/blob/master/OWLTools-Annotation/src/main/java/owltools/gaf/EcoTools.java
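
To make the mapping step concrete, here is a rough sketch of reading a mapping file shaped like gaf-eco-mapping.txt. It assumes a three-column, tab-separated layout (GAF code, reference or `Default`, ECO ID) with `#` comment lines; that layout and the sample rows below are assumptions for illustration, not copied from the real file, and this is not the owltools EcoMapper implementation.

```python
def parse_gaf_eco_mapping(lines):
    """Build {gaf_code: {reference: eco_id}} from mapping-file lines.

    Assumed layout (not verified against the live file):
    CODE <tab> REFERENCE-or-"Default" <tab> ECO_ID
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        # Skip blanks and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # Ignore malformed rows rather than guessing.
        code, reference, eco_id = parts[0], parts[1], parts[2]
        mapping.setdefault(code, {})[reference] = eco_id
    return mapping


def default_eco(mapping, code):
    """Return the ECO ID mapped to `code` with no specific reference, or None."""
    return mapping.get(code, {}).get("Default")


# Illustrative rows only; the real file may differ.
SAMPLE = [
    "# code\treference\teco_id\n",
    "IEA\tDefault\tECO:0000501\n",
    "IEA\tGO_REF:0000002\tECO:0000256\n",
]

if __name__ == "__main__":
    print(default_eco(parse_gaf_eco_mapping(SAMPLE), "IEA"))
```

A lookup like this depends only on the mapping file, so it would not be affected by a synonym change in the ontology itself; that may be a useful contrast when tracing which code path breaks.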

TODO: May need to update http://wiki.geneontology.org/index.php/Inferred_from_Electronic_Annotation_(IEA)

kltm commented Oct 12, 2021

Getting a little more cute with AmiGO, we can see that the annotations are more or less there:
http://amigo.geneontology.org/amigo/search/annotation?q=*:*&fq=evidence_type:%22IEA%22&sfq=document_category:%22annotation%22
https://amigo-staging.geneontology.io/amigo/search/annotation?q=*:*&fq=evidence_type:%22IEA%22&sfq=document_category:%22annotation%22
But that the closures, at least, are not getting created properly for them. Looking directly at GOlr:

http://golr.geneontology.org/solr/select?defType=edismax&qt=standard&indent=on&wt=json&rows=10&start=0&fl=bioentity,bioentity_name,qualifier,annotation_class,annotation_extension_json,assigned_by,taxon,evidence_closure,evidenve_closure_label,evidence_subset_closure,evidence_subset_closure_label,evidence_type,evidence_with,panther_family,type,bioentity_isoform,reference,date,bioentity_label,annotation_class_label,taxon_label,panther_family_label,score&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&hl=true&hl.simple.pre=%3Cem%20class=%22hilite%22%3E&hl.snippets=1000&fq=evidence_type:%22IEA%22&q=*:*
http://golr-staging.geneontology.io/solr/select?defType=edismax&qt=standard&indent=on&wt=json&rows=10&start=0&fl=bioentity,bioentity_name,qualifier,annotation_class,annotation_extension_json,assigned_by,taxon,evidence_closure,evidenve_closure_label,evidence_subset_closure,evidence_subset_closure_label,evidence_type,evidence_with,panther_family,type,bioentity_isoform,reference,date,bioentity_label,annotation_class_label,taxon_label,panther_family_label,score&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&hl=true&hl.simple.pre=%3Cem%20class=%22hilite%22%3E&hl.snippets=1000&fq=evidence_type:%22IEA%22&q=*:*
they seem to have been created empty in the loader and are not appearing in the document.
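
The same comparison can be scripted. Below is a minimal sketch, assuming the GOlr endpoints behave as ordinary Solr `select` handlers returning JSON; the field names come from the queries above, while the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Closure fields expected on every annotation document.
CLOSURE_FIELDS = ("evidence_closure", "evidence_subset_closure")


def build_query(base, evidence_type="IEA", rows=10):
    """Build a Solr select URL for annotations with the given evidence type."""
    params = {
        "q": "*:*",
        "fq": f'evidence_type:"{evidence_type}"',
        "fl": ",".join(("bioentity", "evidence_type") + CLOSURE_FIELDS),
        "wt": "json",
        "rows": rows,
    }
    return base.rstrip("/") + "/select?" + urlencode(params)


def docs_missing_closures(response):
    """Return the documents that lack (or have empty) closure fields."""
    docs = response["response"]["docs"]
    return [d for d in docs if not all(d.get(f) for f in CLOSURE_FIELDS)]


def fetch(url):
    """Fetch and decode a Solr JSON response (requires network access)."""
    with urlopen(url) as resp:
        return json.load(resp)
```

For example, `docs_missing_closures(fetch(build_query("http://golr-staging.geneontology.io/solr/")))` should return an empty list once the closures are populated again.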

kltm added a commit to owlcollab/owltools that referenced this issue Oct 12, 2021
…co tests; add some testing for IEAs; attempting clarifying work on work on geneontology/pipeline#251

kltm commented Oct 12, 2021

Attempted a quick recreation of the possible issue after making some small updates to the owltools ECO tests (see owlcollab/owltools#321); unfortunately, the tests run cleanly with the fixes:
mvn clean -Dtest="EcoMapperTest" -DfailIfNoTests=false test
mvn clean -Dtest="TraversingEcoMapperTest" -DfailIfNoTests=false test

I'm having a little trouble figuring out where to go next. Maybe need to dig in more around the GAF document loader?

kltm commented Oct 13, 2021

Noting that the far upstream origin of this issue is likely evidenceontology/evidenceontology#251

kltm added a commit to owlcollab/owltools that referenced this issue Oct 13, 2021

kltm commented Oct 13, 2021

kltm commented Oct 13, 2021

@cmungall @balhoff, with the context right above here (#251 (comment)), I was wondering if you could easily spot what might be going wrong? I can start working through getClassesForGoCode, or I could switch to using getClassesForCode, which seems to be giving results that are more in line with what is expected (unsure why these choices were made). Any thoughts here?

kltm commented Oct 15, 2021

@balhoff @cmungall To resummarize where I'm at:

With that, my questions are, in order of utility towards fixing this:

  • Why is getClassesForGoCode apparently broken with this ontology change?
  • What is the difference between getClassesForGoCode and getClassesForCode?
  • Is getClassesForCode a safe replacement for getClassesForGoCode for our ECO use case?
    • If so, why does getClassesForGoCode exist at all?

kltm commented Oct 20, 2021

Talking with @pgaudet @mgiglio99 et al. this morning, the current plan is to have the ECO mapping for IEA revert to its previous state for the time being, get a release out, and keep working toward a fix with an eye to the next release.

kltm commented Oct 25, 2021

I believe we've cleared other issues blocking the pipeline and am restarting the release pipeline.

kltm commented Dec 10, 2021

I'm assuming this is closed for now.
