Parsing Alignment Object #71

Open
jpearl01 opened this issue Oct 5, 2018 · 3 comments
@jpearl01
jpearl01 commented Oct 5, 2018

First, thanks for implementing this; it has been very handy for me. I was wondering whether there are methods for iterating through an alignment object position by position, specifically to look for differences between the query and target sequences. From the way the alignment object looks to be structured, I can get at the individual query and target sequences, but it seems the only way to get the actual alignment is to parse the CIGAR string and recreate the alignment from that. Is there an easy way to do that? My Google-fu is failing me here, but maybe you can point me in the right direction?

Thanks in advance!

@danmaclean
Collaborator

Hi @jpearl01

Looks like we never implemented this. It is kind of complicated, but I can see why you'd want to do it.

I found this discussion on how it might be done: https://www.biostars.org/p/112382/

this reference to a tool that does it: https://www.biostars.org/p/110498/

and this repo for the tool: https://github.com/mlafave/sam2pairwise

Hope this is helpful. I don't think any of us has the time to implement this quickly (not even in the next couple of months), but it seems like something we should think about.

Thoughts, @homonecloco?

@homonecloco
Collaborator

Hi @jpearl01,
As @danmaclean said, we haven't implemented functionality like this. I've been working a bit with CIGAR lines in other projects, so I may be able to get something into the library, but I can't promise a timeline. What do you think would be more useful? The easiest would be to return an array with two strings. A SequenceHash from bioruby is another option, but that would incur some overhead.
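
For illustration only, the "array with two strings" option could look roughly like the sketch below. It is not part of bio-samtools: `cigar_to_pairwise` is a hypothetical helper, and it assumes the caller already has the read sequence (soft-clipped bases included) and the reference sequence starting at the first aligned reference base. The CIGAR operators follow the SAM specification.

```ruby
# Hypothetical helper, NOT part of bio-samtools.
# Rebuilds the two gapped rows of a pairwise alignment from a CIGAR string.
#   cigar  - CIGAR string, e.g. "2S3M1I2M2D2M"
#   query  - read sequence as stored in the record (soft clips included)
#   target - reference sequence starting at the first aligned base (assumption)
def cigar_to_pairwise(cigar, query, target)
  q_row = String.new
  t_row = String.new
  qi = 0 # next unread base in the query
  ti = 0 # next unread base in the target

  cigar.scan(/(\d+)([MIDNSHP=X])/) do |len, op|
    n = len.to_i
    case op
    when "M", "=", "X" # aligned column: consumes query and target
      q_row << query[qi, n]
      t_row << target[ti, n]
      qi += n
      ti += n
    when "I"           # insertion in the read: gap in the target row
      q_row << query[qi, n]
      t_row << "-" * n
      qi += n
    when "D", "N"      # deletion or skipped region: gap in the query row
      q_row << "-" * n
      t_row << target[ti, n]
      ti += n
    when "S"           # soft clip: bases are in the read but not aligned
      qi += n
    end                # "H" and "P" consume neither sequence
  end

  [q_row, t_row]
end

q_row, t_row = cigar_to_pairwise("2S3M1I2M2D2M", "NNACGTACGT", "ACGACTTGT")
puts q_row
puts t_row
```

Returning plain strings keeps the helper free of any bioruby dependency; wrapping the rows in a bioruby object afterwards would be a separate step.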

@jpearl01
Author

Whoops, sorry for the delay. For our particular project, multiple sequence alignments ended up working fine, so we did not pull the alignments out of the BAM file, but I'm still very interested in having that kind of functionality. Personally, I'd be fine with a function that returns plain array(s); from there, pulling the result into a bioruby sequence object would be relatively trivial. I'm not sure whether that fits the philosophy of a bioruby-related package (i.e. would people want to stay within the ecosystem and expect a bioruby object?), but plain arrays would suit me, and we wouldn't need any further processing for our specific analysis.

sam2pairwise is actually very close to what I was thinking about... Thanks for the links and comments! Will keep an eye on this.
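
As a rough sketch of the per-position comparison asked about at the top of the thread: given two gapped rows of equal length (however they were produced), the differing columns can be collected with plain Ruby. `alignment_differences` is a hypothetical name, not an existing bio-samtools or bioruby method, and the column index is 0-based within the alignment rather than a reference coordinate.

```ruby
# Hypothetical helper, NOT part of bio-samtools or bioruby.
# Lists the alignment columns where the query and target rows differ.
# Both rows must be gapped strings of equal length, with "-" for gaps.
def alignment_differences(q_row, t_row)
  unless q_row.length == t_row.length
    raise ArgumentError, "rows must have equal length"
  end

  diffs = []
  q_row.chars.zip(t_row.chars).each_with_index do |(q, t), col|
    next if q.upcase == t.upcase # identical column, ignoring case

    type = if q == "-"
             :deletion  # base present in target only
           elsif t == "-"
             :insertion # base present in query only
           else
             :mismatch
           end
    diffs << { column: col, query: q, target: t, type: type }
  end
  diffs
end

alignment_differences("ACGTAC--GA", "ACG-ACTTGT").each do |d|
  puts "#{d[:column]}\t#{d[:type]}\t#{d[:query]}\t#{d[:target]}"
end
```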

@homonecloco homonecloco self-assigned this Nov 15, 2018