KeyError: ('C', 1, 'm') #22
Could you please attach one read from your BAM file? I suspect there is a bug in how the methylation signal is read. Thanks!
From: Wshengquan
Sent: Monday, July 15, 2024 8:44:03 AM
Subject: [treangenlab/methphaser] KeyError: ('C', 1, 'm') (Issue #22)
Hello,
I got an error when I ran MethPhaser:
/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/meth_phaser_parallel:235: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass (name,) instead of name to silence this warning.
  phased_df_chr.get_group(chromosome).iterrows()
/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import require
Traceback (most recent call last):
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 1471, in <module>
    main(sys.argv[1:])
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 1437, in main
    ) = get_assignment_max(
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 898, in get_assignment_max
    base_modification_list = get_base_modification_dictionary( # build the dictionary with snp phased reads
  File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 243, in get_base_modification_dictionary
    for i in mm[methylation_identifier]: # Remora only output one type of score: c 1 m/c 0 m, but this part can be improved for other methlyation callers
KeyError: ('C', 1, 'm')
My run command is:
meth_phaser_parallel -b sample.whatshap.haplotagged.bam -r ref.fa -g sample.phased.gtf -vc sample.haplotype.phased.VCF -o path/to/output
Why did this happen?
Any help with this would be appreciated!
Best
Thanks for your reply. Here is one of my reads:
I see. The read you are showing here only has 6mA calls, but MethPhaser can only phase with 5mC calls.
MM:Z:A+a? is for 6mA calls; C+m? is for 5mC.
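To check which modification types a read actually carries, it is enough to look at the entry headers of its MM tag (base, strand, modification code). A minimal pure-Python sketch, assuming the MM value is available as a string (for example via pysam's AlignedSegment.get_tag("MM")); the function name mm_modification_types is hypothetical and not part of MethPhaser:

```python
import re
from typing import Set, Tuple

# Header of one MM entry (SAMtags spec): canonical base, strand, one or more
# modification codes (letters, or a single ChEBI number), optional '.'/'?' flag.
_MM_HEADER = re.compile(r"^([ACGTUN])([+-])([A-Za-z]+|\d+)([.?]?)$")


def mm_modification_types(mm_value: str) -> Set[Tuple[str, str, str]]:
    """Return the (base, strand, code) modification types declared in an MM tag value."""
    if mm_value.startswith("MM:Z:"):
        mm_value = mm_value[len("MM:Z:"):]
    mod_types: Set[Tuple[str, str, str]] = set()
    for entry in mm_value.split(";"):
        if not entry:
            continue  # the value ends with ';', leaving a trailing empty entry
        header = entry.split(",", 1)[0]
        match = _MM_HEADER.match(header)
        if match is None:
            raise ValueError(f"unrecognised MM entry header: {header!r}")
        base, strand, codes, _skip_flag = match.groups()
        # 'C+mh' declares both 5mC and 5hmC; a ChEBI number is a single code.
        for code in ([codes] if codes.isdigit() else list(codes)):
            mod_types.add((base, strand, code))
    return mod_types


if __name__ == "__main__":
    six_ma_only = "A+a?,0,17,2,6;"
    mixed = "A+a.,0,17,2;C+h.,9,0,2;C+m.,9,0,2;"
    print(("C", "+", "m") in mm_modification_types(six_ma_only))  # False
    print(("C", "+", "m") in mm_modification_types(mixed))        # True
```

A read for which ("C", "+", "m") is absent has no 5mC calls, which is the situation described above.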
MM:Z:A+a.,0,17,2,6,3,0,0,3,12,0,12,2,13,1,9,5,0,3,12,8,0,0,0,0,0,0,0,0,2,7,0,0,0,2,0,0,3,0,0,0,2,9,63,0,0,0,3,0,0,47,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,1,0,0,6,2,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,4,0,3,1,1,0,17,0,0,0,50,0,1,23,39,0,0,43,1,3,0,0,0,0,4,0,0,0,0,0,0,0,0,4,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,1,12,1,14,0,0,0,3,4,0,0,0,0,0,0,0,3,2,0,3,6,0,39,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,2,1,0,0,1,0,0,2,2,8,0,17,4,0,0,0,12,14,0,1,0,0,0,4,0,0,0,1,0,0,0,1,2,5,0,0,1,33,12,0,0,12,0,1,41,1,0,6,1,1,4,8,0,0,0,0,0,0,0,0,0,0,0;C+h.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1,1,4,9,2,0,0,0,8,1,0,0,0,0,0;C+m.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1
,1,4,9,2,0,0,0,8,1,0,0,0,0,0;
I see; yes, this is indeed a bug in MethPhaser. I suspect that some reads have only 6mA calls but no 5mC calls. I will try to add a filter to skip those reads.
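The traceback fails at mm[methylation_identifier], which raises KeyError when a read has no entry for that key. The kind of filter described here could look like the sketch below. It is not MethPhaser's actual patch: the function name is hypothetical, and the per-read dictionary is assumed to mirror pysam's AlignedSegment.modified_bases layout, {(canonical_base, strand, code): [(read_position, ml_score), ...]}, possibly empty or None.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed layout (mirrors pysam's AlignedSegment.modified_bases):
# {(canonical_base, strand, modification_code): [(read_position, ml_score), ...]}
ModKey = Tuple[str, int, str]
ModCalls = List[Tuple[int, int]]
ModDict = Dict[ModKey, ModCalls]


def split_reads_by_modification(
    reads: Iterable[Tuple[str, Optional[ModDict]]],
    methylation_identifier: ModKey,
) -> Tuple[List[Tuple[str, ModCalls]], List[str]]:
    """Separate reads that carry calls for methylation_identifier from those that do not.

    Indexing mods[methylation_identifier] directly raises KeyError for reads that only
    have 6mA (A+a) or 5hmC (C+h) calls; .get() lets us skip them instead.
    """
    kept: List[Tuple[str, ModCalls]] = []
    skipped: List[str] = []
    for read_name, mods in reads:
        # mods may be None or empty when a read has no MM/ML tags at all.
        calls = (mods or {}).get(methylation_identifier)
        if calls is None:
            skipped.append(read_name)
        else:
            kept.append((read_name, calls))
    return kept, skipped


if __name__ == "__main__":
    reads = [
        ("read_6mA_only", {("A", 0, "a"): [(5, 200)]}),
        ("read_with_5mC", {("C", 1, "m"): [(10, 250)], ("C", 1, "h"): [(10, 3)]}),
        ("read_no_tags", None),
    ]
    kept, skipped = split_reads_by_modification(reads, ("C", 1, "m"))
    print(kept)     # [('read_with_5mC', [(10, 250)])]
    print(skipped)  # ['read_6mA_only', 'read_no_tags']
```

Returning the skipped read names, rather than silently dropping them, makes it easy to report how many reads lacked 5mC calls.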
I've made the changes. Could you please check whether the MethPhaser version on this branch works for you? https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1 Alternatively, if you can provide an example BAM file, I can run the test myself.
subset.zip
Yes, you need to clone the version on the branch I provided.
Thank you very much for your help. I look forward to hearing whether your test with the subset file succeeds.
I used the new version you updated yesterday, but I still get an error, although it seems to be a different one:
Sorry, I forgot to change one spot in the code; it should be fine now :)
I tried again, but I got an error:
Sorry about the back and forth. Was this run with the program from this patch? https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1 Based on the line number in the error, it looks like you are using the main branch. Alternatively, you can send me the input you are using; it is hard to debug with only subsampled reads.
I did reinstall it from here: https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1
You could limit your VCF, GTF, and BAM to the first 3 phase blocks on chr1. I think that should be sufficient. Thanks.
“you could limit your vcf, gtf and bam to the first 3 phaseblock on chr1. I think those could be sufficient.”
Hey, no worries! Here are the docs for bcftools view and samtools view: https://samtools.github.io/bcftools/bcftools.html#view https://www.htslib.org/doc/samtools-view.html
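One way to limit the inputs is to compute a single region string covering the first three chr1 phase blocks and pass it to both tools. A minimal sketch, assuming the phase-block GTF has one line per block and uses the standard GTF columns (seqname, start, end in columns 1, 4, 5); the function name first_blocks_region is hypothetical:

```python
from typing import Iterable, List, Tuple


def first_blocks_region(gtf_lines: Iterable[str], chrom: str = "chr1", n_blocks: int = 3) -> str:
    """Return a 'chrom:start-end' region spanning the first n_blocks phase blocks on chrom.

    Assumes one GTF line per phase block, using the standard GTF columns
    (1: seqname, 4: start, 5: end; 1-based, inclusive).
    """
    blocks: List[Tuple[int, int]] = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom:
            blocks.append((int(fields[3]), int(fields[4])))
    if not blocks:
        raise ValueError(f"no phase blocks found on {chrom}")
    first = sorted(blocks)[:n_blocks]  # order blocks by start position
    start = first[0][0]
    end = max(block_end for _, block_end in first)
    return f"{chrom}:{start}-{end}"
```

Calling it on an open file handle for sample.phased.gtf yields a region string such as the illustrative chr1:100-2600. That string can then be given to samtools view (e.g. samtools view -b -o subset.bam sample.whatshap.haplotagged.bam chr1:100-2600, which needs a BAM index) and to bcftools view -r on a bgzipped, indexed VCF; the GTF itself can be trimmed to the same first three chr1 lines.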
Thank you very much for your help.
Hey, sorry for the delay, but I still need a week or so to actually look into this issue. I suspect that some reads have only 6mA or 5hmC calls but no 5mC calls, which triggers the bug. I don't have much time to debug this right now, but I will look into it next week. Thanks!
A question in this vein: can MethPhaser process C+h tag information, or would that also cause a hang-up or poor phasing?
MethPhaser ignores C+h because it is not very accurate in some basecaller versions. As long as you have C+m calls, MethPhaser can do phasing.
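A read that carries both C+h and C+m calls still has usable 5mC information once the other codes are set aside. A minimal sketch of that selection, assuming the same per-read dictionary layout as pysam's AlignedSegment.modified_bases; the function name keep_5mc_only is hypothetical and this is not MethPhaser's own code:

```python
from typing import Dict, List, Tuple

# Assumed layout (mirrors pysam's AlignedSegment.modified_bases):
# {(canonical_base, strand, modification_code): [(read_position, ml_score), ...]}
ModKey = Tuple[str, int, str]
ModDict = Dict[ModKey, List[Tuple[int, int]]]


def keep_5mc_only(mods: ModDict) -> ModDict:
    """Keep only C+m (5mC) entries, dropping C+h (5hmC), A+a (6mA) and any other codes."""
    return {key: calls for key, calls in mods.items() if key[0] == "C" and key[2] == "m"}


if __name__ == "__main__":
    read_mods = {
        ("C", 0, "m"): [(3, 240), (17, 12)],
        ("C", 0, "h"): [(3, 10), (17, 5)],
        ("A", 0, "a"): [(7, 200)],
    }
    print(keep_5mc_only(read_mods))  # {('C', 0, 'm'): [(3, 240), (17, 12)]}
```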
Hi,
As long as I have your attention: I've noticed that MethPhaser works better in some use cases than in others. I'm assuming this most likely has to do with the underlying methylation heterozygosity between the alleles. Do you have any information on which regions tend to work better or worse?
Best regards,
Dvid
On Mon, 5 Aug 2024 at 18:04, Yilei Fu wrote:
MethPhaser ignores C+h because it is not very accurate in some basecaller versions. As long as you have C+m MethPhaser can do phasing.
Sorry, it depends. First, the landscape of human genome methylation has not been fully characterized yet. Answering this would require a large population-scale analysis, so I cannot tell you which regions usually have more methylation heterozygosity and which do not. Second, the input sample type also matters a lot. For example, our paper included a blood sample with poor-quality ONT reads, and the improvement there was modest because the SNP phasing was already poor. Happy to chat more if you like. Roughly, you can think of it this way: when you have a reasonably SNP-phased genome and the gaps between phase blocks are not too large, MethPhaser can come in and help.
Hi,
Long time no see. I have been testing other workflows recently, so I am very sorry for not following up sooner. May I ask whether the pipeline has run successfully on the data I provided?