Question: can't locate complexes specified in our GPI #910
Noting that this should be … I am able to find this term in at least some locations in the Noctua interface.
So I should always put ComplexPortal:CPX-566? cc @PCarme
@ValWood We can dig into this a little when @vanaukenk is back, but the difference may lie in what is supplied in the synonyms, etc.
Are there any docs on how to specify complexes in the GPI? @kimrutherford can check that we are doing it correctly. No hurry until @vanaukenk is back.
Hi @vanaukenk, can you let us know how complexes should be specified in the GPAD so we can check that we are doing it correctly?
@vanaukenk I have just got our devs to add complexes to our GPI (not in production yet) based on SGD's GPI, and would like to check that the file is spec'd correctly as well.
There are some issues surrounding the use of ComplexPortal IDs in GO-CAMs that need to be definitively resolved. Are you both available next Thursday?
@vanaukenk That is good for me. Thanks.
Is this helpful? https://geneontology.org/docs/gene-product-information-gpi-format/ We can add a protein complex example.
Thanks Pascale. We've been using the GPI 2.0 spec: Perhaps that's a problem?
We're putting the complex members in column 9 ("Protein_Containing_Complex_Members"), following the spec. The spec says … This is an example of column 9 from our GPI file:
Removing excess IDs. Note also that this "example" doesn't match what SGD will provide after the next ingestion of the SGD GPI; see geneontology/noctua#910
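For anyone checking their own file, here is a minimal sketch that pulls the member IDs out of column 9 of a GPI 2.0 data line. It assumes tab-separated columns, `!`-prefixed header lines, and pipe-separated multiple values, which is our reading of the GPI conventions; check it against the spec before relying on it. `complex_members` is a hypothetical helper, not part of any GO tooling, and the member IDs below are placeholders.

```python
from typing import List

# Column 9 (1-based) is "Protein_Containing_Complex_Members" in GPI 2.0.
COMPLEX_MEMBERS_COL = 8

def complex_members(gpi_line: str) -> List[str]:
    """Return the member IDs listed in column 9 of one GPI 2.0 data line."""
    if gpi_line.startswith("!"):  # header and comment lines carry no data
        return []
    fields = gpi_line.rstrip("\n").split("\t")
    if len(fields) <= COMPLEX_MEMBERS_COL:
        return []  # too few columns to have a column 9
    cell = fields[COMPLEX_MEMBERS_COL].strip()
    # Assumed: multiple members are pipe-separated; an empty cell means none.
    return [member.strip() for member in cell.split("|") if member.strip()]

# Usage with a hypothetical line: only columns 1 and 9 are filled in.
fields = [""] * 11
fields[0] = "ComplexPortal:CPX-566"
fields[8] = "SGD:EXAMPLE1|SGD:EXAMPLE2"  # placeholder member IDs
print(complex_members("\t".join(fields)))  # ['SGD:EXAMPLE1', 'SGD:EXAMPLE2']
```

A line with fewer than nine columns, or a header line, simply yields an empty list, which makes missing members easy to spot when scanning a whole file.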
Noting here that there are still some issues with SGD's complexes: https://release.geneontology.org/2024-09-08/annotations/sgd.gpi.gz has
https://release.geneontology.org/2024-09-08/products/upstream_and_raw_data/sgd-src.gpi.gz has:
To find this complex in Noctua, the only current way is to enter S000218003 or SGD:S000218003 in the Term box, where the entity pops up with … As SGD is modifying the supplied GPI, the next available GPI from SGD will look more like the /annotations/sgd.gpi:
Strongly related ticket: #914
Update: I have been able to locate complexes only if I omit the hyphen from the identifier.
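Entering the bare or prefixed SGD ID works, and the ComplexPortal ID works only without its hyphen, which suggests the dashed ID is not being matched as a whole. For comparison, here is a minimal sketch of an exact identifier lookup that accepts either the full CURIE or its bare local ID and never splits on the dash. It assumes a simple in-memory index; `build_id_index` and `lookup_exact` are hypothetical names, not Noctua or AmiGO functions.

```python
from typing import Dict, List, Optional

def build_id_index(curies: List[str]) -> Dict[str, str]:
    """Map each full CURIE, and its bare local ID, to the canonical CURIE."""
    index: Dict[str, str] = {}
    for curie in curies:
        index[curie.lower()] = curie
        _prefix, sep, local = curie.partition(":")
        if sep and local:
            # If two prefixes share a local ID, the first one registered wins.
            index.setdefault(local.lower(), curie)
    return index

def lookup_exact(index: Dict[str, str], query: str) -> Optional[str]:
    """Whole-string, case-insensitive lookup: the query is never tokenized."""
    return index.get(query.strip().lower())

index = build_id_index(["SGD:S000218003", "ComplexPortal:CPX-566"])
print(lookup_exact(index, "S000218003"))      # SGD:S000218003
print(lookup_exact(index, "SGD:S000218003"))  # SGD:S000218003
print(lookup_exact(index, "CPX-566"))         # ComplexPortal:CPX-566
print(lookup_exact(index, "CPX 566"))         # None
```

One option would be to show an exact hit like this above the fuzzy autocomplete results rather than replacing them.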
...but the has_parts are not automatically imported.
I'm looking into this some more today. So far, this is what I find when searching in the gene product field: CPX-566 does return the right complex, but it is very far down the autocomplete selection list (the 40th entity listed). The search behavior is the same in the VPE and in the standard annotation editor. I'll ask @tmushayahama about the search criteria to see if there's anything we can do to bump the right entity to the top of the list when searching for CPX-566, as I'm assuming that's the entry you'd most likely make? @ValWood
@suzialeksander I'm still looking into the SGD issues, as I can't find the SGD complexes in noctua-amigo, suggesting that this is a different problem.
I've been looking into the SGD GPI and protein complexes and honestly don't understand what's happening here. I see exactly the same behavior you see. I'll need some help troubleshooting from @kltm and @tmushayahama.
@ValWood We haven't done any work yet to implement this functionality, but we are aware it would be very helpful.
Yes. It's strange that IDs with spaces take priority over the correct identifier. As far as I'm aware, identifiers never have spaces?
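Here is a toy model of the behavior described here, assuming (as explained later in this thread) that the autocomplete treats the dash like whitespace. With that analyzer, `CPX-566` and `CPX 566` produce identical tokens, so a record that mentions `CPX 566` several times can outscore the real complex under simple term-count scoring. The scoring function and the records are hypothetical; this is not the actual AmiGO ranking.

```python
import re
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    # Toy analyzer: lowercase, then split on anything that is not a letter or
    # digit, so "-" (and ":") behave exactly like whitespace.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, int]]:
    """Score each record by counting its tokens that appear in the query."""
    query_tokens = set(tokenize(query))
    scored = []
    for doc_id, text in docs.items():
        tokens = tokenize(f"{doc_id} {text}")  # index the ID alongside the label
        scored.append((doc_id, sum(1 for t in tokens if t in query_tokens)))
    # Highest score first; ties broken by ID so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored

# Hypothetical records: the real complex, plus one whose synonyms repeat "CPX 566".
docs = {
    "ComplexPortal:CPX-566": "example complex",
    "HYPOTHETICAL:0001": "CPX 566 CPX 566 synonym list",
}
print(tokenize("CPX-566") == tokenize("CPX 566"))  # True: the dash is lost
print(rank("CPX-566", docs))  # [('HYPOTHETICAL:0001', 4), ('ComplexPortal:CPX-566', 2)]
```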
I wanted to clarify a little what is going on here with respect to … I'd have to look into the exact math to be sure, but essentially, when looking for http://noctua-amigo.berkeleybop.org/amigo/term/ComplexPortal:CPX-566, there are a few ways to get at it. If we look at the general index search on the Noctua autocomplete AmiGO instance (http://noctua-amigo.berkeleybop.org, upper right):
If we look at the "Filter by Term" "ontology" search on the Noctua landing page:
First, to reiterate: this should not be an issue, and we would like to prioritize fully fixing our search at … so that we don't need to have these conversations.
That aside, for context on what we're seeing here today: the two indexes treat a couple of things a little differently, which is why we get different results. What is likely happening in the second case (the one used by the Noctua interface) is that when the … Technically speaking, there are things one can do in a case like this to ensure better results (e.g. when there is a dash, also search for the quoted string, or something similar), but we will need to weigh the effort needed to make and tune that against the effort to just "start over" on the autocomplete with a newer and more robust system.
EDIT: Noting that we have a redo NEO pipeline (geneontology/project-management#52) and some notes on redoing AmiGO, it might be worth speccing out redoing NEO and the Noctua autocomplete as a separate standalone project that could be an almost drop-in replacement, then using that to inform future AmiGO and GO API work (or feeding it into the GO API first).
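One way to picture the "when there is a dash, also search for the quoted string" idea: keep the token scoring, but when the raw query contains a dash, also look for the literal string as a whole identifier and give any record containing it a large boost. This is a minimal sketch under toy assumptions (an analyzer that splits on punctuation, term-count scoring, and an arbitrary boost of 100); `rank_with_phrase_boost` and the records are hypothetical, not the Noctua or AmiGO implementation.

```python
import re
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    # Toy analyzer: "-" and other punctuation act as token separators.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def rank_with_phrase_boost(
    query: str, docs: Dict[str, str], boost: int = 100
) -> List[Tuple[str, int]]:
    query_tokens = set(tokenize(query))
    needle = query.strip().lower()
    # Match the literal query only when it is not embedded in a longer ID,
    # so "cpx-566" does not also hit "cpx-5660".
    phrase = re.compile(r"(?<![0-9a-z-])" + re.escape(needle) + r"(?![0-9a-z-])")
    scored = []
    for doc_id, text in docs.items():
        haystack = f"{doc_id} {text}".lower()
        score = sum(1 for t in tokenize(haystack) if t in query_tokens)
        if "-" in needle and phrase.search(haystack):
            score += boost  # special case for dashed identifiers
        scored.append((doc_id, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored

docs = {  # hypothetical records
    "ComplexPortal:CPX-566": "example complex",
    "HYPOTHETICAL:0001": "CPX 566 CPX 566 synonym list",
    "ComplexPortal:CPX-5660": "another complex",
}
print(rank_with_phrase_boost("CPX-566", docs)[0])  # ('ComplexPortal:CPX-566', 102)
```

The lookarounds stop `CPX-566` from also boosting `CPX-5660`, and queries without a dash fall back to plain token scoring, so `CPX 566` still behaves as before.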
Just to say, shouldn't gene product searches always be exact matches, exactly as the user typed them (i.e. no 'fuzziness')?
@ValWood Again, I'm not talking about what should be (I think we all agree on that), just clarifying the mechanics of what is now for anybody diving into this. A fix can be applied either in the backend or the frontend, with the immediate issue being the mishandling of the dash in the identifier (which is essentially being treated like whitespace in this case). Special-case code could likely be added to fix this edge case, but it might be worth weighing that against longer-term fixes and other fixes that are being queued up for Noctua.
I can't find this complex in Noctua:
ComplexAc: CPX-566
Even though it has been in our GPI since 2024-05-04.
Can you let us know if we are doing anything wrong in the file, or if it is a Noctua loading issue?
I tried both the "activity unit" and the "protein complex" entity annotons.