Question about GFA format #10
Comments
Hi @rob-p, I understand your confusion. The issue is that initially we adopted the edge-centric definition of the graph, i.e. sequences are spelled by edges, with nodes of size Again, sorry for the confusion; I am aware that it pops up all the time (https://www.biostars.org/p/175058/). I have plans to improve the documentation to clear things up (I even put it in for 0.9.3: https://github.com/medvedevgroup/TwoPaCo/blob/master/NEWS.md). I just didn't expect people to start using TwoPaCo right away :)
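For anyone following along, here is a small Python sketch of why the two conventions give different overlaps. It assumes the usual textbook setup, where nodes are k-mers and, in the edge-centric view, each edge spells a (k+1)-mer; the helper names and the toy sequence are made up for illustration and are not TwoPaCo code.

```python
def windows(seq, size):
    """All substrings of `seq` of the given size, left to right."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def consecutive_overlaps(units):
    """Set of overlap lengths between each unit and the next one.

    Consecutive windows are shifted by one base, so a unit's suffix of
    length len(unit) - 1 equals the next unit's prefix.
    """
    return {len(a) - 1 for a, b in zip(units, units[1:]) if a[1:] == b[:-1]}


k = 3                 # node size (toy value, assumption for the example)
seq = "ACGTTGCA"      # toy sequence

# Node-centric: sequence is spelled by k-mer nodes, neighbours share k-1 bases.
node_overlap = consecutive_overlaps(windows(seq, k))

# Edge-centric: sequence is spelled by (k+1)-mer edges, neighbours share a
# whole k-mer node, i.e. k bases.
edge_overlap = consecutive_overlaps(windows(seq, k + 1))

print(node_overlap, edge_overlap)
```

Under these assumptions the same node size k gives k-1 overlaps when sequences live on nodes and k overlaps when they live on edges, which matches the discrepancy discussed in this thread.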
Hi @IlyaMinkin, Yup, I understand the confusion here as well. We have often gone back and forth between preferring the node-centric and edge-centric views of the dBG. I guess my concern with the proposed temporary solution (running with Thanks for the quick responses!
@rob-p I was afraid the odd/even issue was going to pop up. I will think about it and try to make a fix soon.
Hi @IlyaMinkin,
It's me again :). TwoPaCo has been working great, but I've run into a small issue regarding the GFA file, and I was wondering if you could clear up my confusion. I built a cdBG using TwoPaCo with k=31. As the documentation states that k is the node size, I expected the cdBG to contain a list of segments (i.e., contigs) that overlap by k-1. However, in the resulting GFA file, all of the contigs instead seem to overlap by k (i.e., they show a 31M overlap). This causes some issues downstream, as we expect the invariant that a k-mer (or its reverse complement) appears at most once in the cdBG. When the overlap is of size k, however, a given k-mer may appear as many times as it participates in an overlap.
Have I misunderstood something about the expected format of this graph? Is there an easy way to obtain the cdBG GFA file such that the overlaps are k-1 bases instead of k?
Thanks!
Rob
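To make the invariant concrete, here is a minimal Python sketch of the check described above. It reads GFA1 text (`S` lines with name and sequence, `L` lines with the overlap CIGAR in the sixth column), counts canonical k-mers across all segments, and reports any that occur more than once. The two tiny graphs, the value k=3, and all function names are assumptions made up for illustration; this is not TwoPaCo output or part of its tooling.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def parse_gfa(text):
    """Return ({segment name: sequence}, [overlap CIGAR strings]) from GFA1 text."""
    segments, overlaps = {}, []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            overlaps.append(fields[5])
    return segments, overlaps


def repeated_kmers(segments, k):
    """Canonical k-mers that occur more than once across all segments."""
    counts = Counter(
        canonical(seq[i:i + k])
        for seq in segments.values()
        for i in range(len(seq) - k + 1)
    )
    return {kmer: n for kmer, n in counts.items() if n > 1}


K = 3  # toy k for the example

# Hypothetical graph whose two segments overlap by k bases ("CCG").
GFA_K_OVERLAP = "S\t1\tAACCG\nS\t2\tCCGAT\nL\t1\t+\t2\t+\t3M\n"
# The same graph with the overlap reduced to k-1 bases ("CG").
GFA_K_MINUS_1 = "S\t1\tAACCG\nS\t2\tCGAT\nL\t1\t+\t2\t+\t2M\n"

segs_k, links_k = parse_gfa(GFA_K_OVERLAP)
segs_k1, links_k1 = parse_gfa(GFA_K_MINUS_1)

dups_k = repeated_kmers(segs_k, K)     # the overlapping k-mer is counted twice
dups_k1 = repeated_kmers(segs_k1, K)   # no k-mer is shared between segments
print(links_k, dups_k)
print(links_k1, dups_k1)
```

In the toy graph with a k-base overlap, the shared k-mer shows up in both segments, which is the duplication described above; with a (k-1)-base overlap no k-mer is repeated.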