-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of soft-clipping and CIGAR computation #25
base: develop
Are you sure you want to change the base?
Conversation
I looked at the four commits and all the changes look great! I tested the soft-clipping with a couple of small examples and I found a case with the
For these targets, when running the command Also in the case of running without the I have some fixes which I can commit, but we can also have a discussion before going forward. |
1868d67
to
d7f38a9
Compare
src/PufferfishAligner.cpp
Outdated
@@ -153,6 +154,8 @@ void processReadsPair(paired_parser *parser, | |||
// aconf.allowOverhangSoftclip = mopts->allowOverhangSoftclip; | |||
// aconf.allowSoftclip = mopts->allowSoftclip; | |||
aconf.computeCIGAR = (mopts->computeCIGAR and !mopts->noOutput); | |||
aconf.endBonus = mopts->endBonus; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to have endBonus
in the aconf struct anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed it.
src/PufferfishAligner.cpp
Outdated
@@ -634,6 +638,8 @@ void processReadsSingle(single_parser *parser, | |||
// aconf.allowOverhangSoftclip = mopts->allowOverhangSoftclip; | |||
// aconf.allowSoftclip = mopts->allowSoftclip; | |||
aconf.computeCIGAR = (mopts->computeCIGAR and !mopts->noOutput); | |||
aconf.endBonus = mopts->endBonus; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, we might need to have endBonus
in the aconf struct anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed it from here as well.
Based on my review and the conversations we had about the changes before, everything looks good to me. |
…d overhang cases happening together
85c702c
to
cf54093
Compare
Here is a short summary of the changes:
end_bonus
value from 10 to 5KSW_EZ_EXTZ_ONLY
flag for KSW2 extension alignments (otherwise it always backtrack from the end of both sequences)CIGARGenerator
's variable type to enable computation of CIGAR for cases with overlapping MEMs (uint32_t
toint32_t
)--allowSoftclip
and--allowOverhangSoftclip
with--computeCIGAR
--debug
; if not used, no debug information will be printedWith these changes I compared the output of PuffAligner with Minimap2 for single-end reads. This comparison showed that generally, the alignment is computed correctly in most of the cases.
The next essential step: In some cases the alignment is wrong. The reason is that the extension is stopped (ez->stopped==1) and therefore, the score for reaching the end of query is not calculated properly. As a result, it might be the case that score for reaching end of query is incorrectly higher than the maximum scoring prefix alignment. Here is an example:
It seems that extension stopping was not written with soft-clipping and computation of CIGAR strings in mind, so this requires a fix.
I will check if the original KSW2 extension fixes this issue and if it does, we can discuss about whether it's worth to have the stopping feature or not.
So next items:
end_bonus
value