Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string comparison in build_sequence_object #21

Open
levinas opened this issue Nov 21, 2014 · 2 comments
Open

string comparison in build_sequence_object #21

levinas opened this issue Nov 21, 2014 · 2 comments

Comments

@levinas
Copy link

levinas commented Nov 21, 2014

Is this line trying to sort sequences before computing an MD5? Should the operator be 'cmp' ?

https://github.com/kbase/KBaseFBAModeling/blob/master/lib/Bio/KBase/fbaModelServices/Impl.pm#L2457

@mmundy42
Copy link
Contributor

To me, it looks like the operator should be 'cmp' instead of ‘<=>’ since the sequence field of the contigobject is a string and not a number.

I’m not sure the impact of changing the operator. The _build_sequence_object() method is building either a ContigSet or ProteinSet object from a user provided fasta file of DNA sequences or protein sequences. The MD5 of all of the sequences is used to generate an ID for the new object using the register_ids() method from the ID server. It does not seem like that should be an issue since the object IDs are just a number.

The MD5 is also saved in the object — maybe that is used by the workspace service’s versioning support to understand when an object has changed and a new version needs to be created? I suppose there is a chance that a MD5 generated with the updated sort order could match a MD5 generated with the current sort order and cause the same ID to be returned and the workspace service to create a new version of an existing object instead of a new object.

I’m not seeing any impacts from the array of sequences being in a different order but maybe I’m missing something subtle.

Also, the cmp operator is used when doing the same processing on line 816 for building a ContigSet for a SEED genome and line 1000 for RAST genome.

Mike Mundy

From: Fangfang Xia <[email protected]mailto:[email protected]>
Reply-To: kbase/KBaseFBAModeling <[email protected]mailto:[email protected]>
Date: Friday, November 21, 2014 at 5:10 PM
To: kbase/KBaseFBAModeling <[email protected]mailto:[email protected]>
Subject: [KBaseFBAModeling] string comparison in build_sequence_object (#21)

Is this line trying to sort sequences before computing an MD5? Should the operator be 'cmp' ?

https://github.com/kbase/KBaseFBAModeling/blob/master/lib/Bio/KBase/fbaModelServices/Impl.pm#L2457


Reply to this email directly or view it on GitHubhttps://github.com//issues/21.

@levinas
Copy link
Author

levinas commented Nov 24, 2014

I agree. I think this should be "cmp" and changing this is unlikely to break anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants