Processing annotation contributors for multiple GPAD lines with single annotation id #94
Comments
@vanaukenk I might phrase it as: For the purpose of comparison, dates without further time information are assumed to be midnight (should probably be in spec). A "set" here is defined as a group of GPAD annotation lines that all share the same col 12 Rules:
Given these rules, the output model for a set should:
Thanks, @kltm I think I understand everything except this one: 'It is an error for a modification-date of one line in a set to be between the creation-date and modification-date of another line. (Strict ordering.)' Also, if we get a set of annotations with a given id that all have the same YYYY-MM-DD without additional time information, i.e. YYYY-MM-DDTHH:MM, we can't do anything with that as we'd have no way of knowing what the last modification was, right? We might still be able to concatenate contributors, but we couldn't know what the 'final' annotation was for that set.
@vanaukenk Okay, yeah, let me explain that a little and make a clarification (with edits inline): 'It is an error for While a hard case to hit, it is possible in the defined format. The purpose here is to require "sortability"--every line in a set occurred before or after every other line in the set. We want this property because we want to be able to recreate a history of operations. If an action occurs at the same time as another action in a set, or is otherwise inconsistent, we have an ambiguity and cannot consistently figure out what happened. So, if we have lines in a set that are: Does this make sense? I definitely could have written the formulation more clearly. For your second question (dates are not granular enough to distinguish between them), I think it's a pretty big problem that should be regarded as an error, at least initially.
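To make the "sortability" requirement concrete, here is a minimal sketch, not the actual import code. It assumes timestamps arrive as ISO-style strings (`YYYY-MM-DD` or `YYYY-MM-DDTHH:MM`) without timezone offsets, and the function name `order_set` is hypothetical. Date-only values are read as midnight, and two lines in a set with the same timestamp are treated as an error. It does not attempt the creation-date/modification-date interval rule.

```python
from datetime import datetime


def order_set(timestamps):
    """Return the timestamps of one set sorted oldest first.

    Raises ValueError if two lines share a timestamp, because the set
    then has no unambiguous order. `order_set` is a hypothetical helper.
    """
    # datetime.fromisoformat reads a bare date as midnight of that day,
    # which matches the "assumed to be midnight" convention above.
    parsed = sorted(datetime.fromisoformat(t) for t in timestamps)
    for earlier, later in zip(parsed, parsed[1:]):
        if earlier == later:
            raise ValueError(f"ambiguous ordering: duplicate timestamp {earlier}")
    return parsed


if __name__ == "__main__":
    print(order_set(["2021-06-08T10:30", "2021-06-07"]))
```

With this check, a set whose lines all carry the same date-only value fails immediately, which corresponds to treating insufficiently granular dates as an error.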
@kltm Thanks, that makes sense now. Let's just review on tomorrow's MOD imports call, as I think we're clear now on what to do wrt both annotation sources and GO.
For the SGD annotations, we decided to drop nearly all of the additional information and are just adding any valid contributor ID to the comments.
From 2021-06-08 MOD imports meeting:
We discussed how we want to handle processing of multiple GPAD lines for a single annotation id. This will be the situation with GPAD files coming from Protein2GO (and possibly other sources) where tools tracked and displayed annotation history.
The goals here are to ingest annotations in as uniform a way as possible so that the near-term display of 'annotations' in Noctua will be similar for each onboarding group and the long-term prospect of dealing with 'annotation' history is at least starting from standardized data with clear, documented semantics.
Here's the proposed workflow:
For each line in an incoming GPAD file, the import code will check the annotation id in the Annotation Properties field.
For annotation ids that appear only once, no additional processing will be needed.
For annotation ids that appear more than once, the import code will look at the contributor-id field of each annotation with the same id. If there is more than one contributor id, it will add all contributors to the Annotation Properties field of the last-edited annotation (the one with the latest timestamp in the Date field). The last-edited annotation is then imported with one or more contributor-ids.
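The workflow above could be sketched roughly as follows. This is an illustration under stated assumptions, not the import code itself: it works on already-parsed lines, and the `GpadLine` class, its field names, and `collapse_by_annotation_id` are all hypothetical. It also assumes dates are ISO-style strings without timezone offsets.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class GpadLine:
    # Hypothetical, already-parsed view of one GPAD line.
    annotation_id: str   # id from the Annotation Properties field
    date: str            # Date field, e.g. "2021-06-08" or "2021-06-08T10:30"
    contributor_ids: list = field(default_factory=list)


def collapse_by_annotation_id(lines):
    """Keep one line per annotation id: the last-edited one, carrying
    every contributor id seen for that annotation id."""
    groups = defaultdict(list)
    for line in lines:
        groups[line.annotation_id].append(line)

    result = []
    for annotation_id, group in groups.items():
        if len(group) == 1:
            # Ids that appear only once need no additional processing.
            result.append(group[0])
            continue
        # Oldest first; a bare date parses as midnight of that day.
        ordered = sorted(group, key=lambda l: datetime.fromisoformat(l.date))
        # Ties on the timestamp are not detected here; the thread
        # discusses treating them as an error.
        latest = ordered[-1]
        contributors = []
        for line in ordered:
            for contributor in line.contributor_ids:
                if contributor not in contributors:
                    contributors.append(contributor)
        result.append(GpadLine(annotation_id, latest.date, contributors))
    return result
```

Contributors are collected in chronological order and de-duplicated, so the surviving line lists each contributor once.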
@dustine32 @kltm @sierra-moxon - please feel free to add more or edit for clarity
I will add information about the semantics of the Date, creation-date, and modification-date to the imports SOP doc as well as to the GPAD/GPI 2.0 specs.