KeyError: ('C', 1, 'm') #22

Open
Wshengquan opened this issue Jul 15, 2024 · 26 comments

Comments

@Wshengquan

Hello,
I got an error when I used MethPhaser:
/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/meth_phaser_parallel:235: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass (name,) instead of name to silence this warning.
phased_df_chr.get_group(chromosome).iterrows()
/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import require
Traceback (most recent call last):
File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 1471, in <module>
main(sys.argv[1:])
File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 1437, in main
) = get_assignment_max(
File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 898, in get_assignment_max
base_modification_list = get_base_modification_dictionary( # build the dictionary with snp phased reads
File "/share/home/yzwl_hanxs/anaconda3/envs/danbeixing/bin/methphasing", line 243, in get_base_modification_dictionary
for i in mm[methylation_identifier]: # Remora only output one type of score: c 1 m/c 0 m, but this part can be improved for other methlyation callers
KeyError: ('C', 1, 'm')
My run command is: meth_phaser_parallel -b sample.whatshap.haplotagged.bam -r ref.fa -g sample.phased.gtf -vc sample.haplotype.phased.VCF -o path/to/output
Why did this happen?
Any help in overcoming this is appreciated!

Best

@Fu-Yilei
Collaborator

Fu-Yilei commented Jul 15, 2024 via email

@Wshengquan
Author

Thanks for your reply. Here's one of my reads:
8e9d05a0-9447-4b61-9600-1d9e92384702 0 1 1 60 1513S328M2I1M1I255M2D501M4D2M1I41M1I232M2D841M3S * 0 0 GGGAACTCCTCCTTTTTTTTTCCTAAGAAATTATCTAAATAATAAATTTTGTTGAAGTGTTTGTC
TAGGATTTCAAGGTAACGGCCTAGAAGTCACCATCTAACCACTGTGGCAAATCCGTATGAGTGACATTAGAAGAAGTAATCTTTTGTCTATCATTTTTAATTTTATTAAGTTGAATACATAATTTTACAATTTTATAGTCGCTAATGACTAATTGGTTAATTATCCAGCCAATGAAACTAGTAGAGGGAGTTTAGGAAATGCAGAATAATTAGGCATTTCCTAGGACTTGTGTAAATTAGAATATGACGGTAGCATTTTTATGGTTGTAAAATCAGAGGGAAGGCATATATATAGCTGTTGTAGTTTTTGTTTTTGTTTTTGTTTTTTTTTTAATGGCTACACCCACGGCATGTGGAAGTGTCCCTGGTTCCTGGTTTCTGGGCCAGGGATTGAATCCAAGCCACAGCTGCAGCAATACCAGATCCTTTAACCCACCGCACTGGGCTGGTGATTGAACCCTGACCTACACAGCAACCCAAGCTGCTGCAGTTGAAGTCTTAATCCGCTGTGTCGCAGCAGGAATTCCCATATAGCAGTTTTTAAAACAGTTAAAACTATCTATTTTGTCAAACAGTCATTTTGATGAGACATATTTTATGAATTTTTTCGTATGTAACTAAATATCTGATATTAATACTTTAGAGTTGTTTGAAAGTAATTTTTTGCTTTATATCATGTAACCTAGTAACTGAAGCCATTGCATATATATAAATGGCAGTACATTAATATTGTTTCTGAGAGCCCATTGGAAACAGGAGGTCACTTCACTATCTGTTGGTATGCTTGTAATCTGATTTATTAATTCCTTCTGGAGGTTGAAAAGGGTTATCAGTTGATTTGCCTAAAAAAATCATTAATAAAATTTACAGTTAAAGAAAATTTTTGTAACTTCTGTCTTCACTTGTTGGAATGTGTGTGAAAGAACACACAGATCTTGGTCCAGCTGCCCTAGTAGCCACAGTTTACTTGTCGGCTCCCTGTCCAAGAGTACACATGCCACATGGCCATGGAAGGGAATCCTGTATAGAAACAATGACAGGTATCTGGGTGGCTGTCATTGTCTAATTCTCCTGACAGAATCAGCAGTCTGAGTGACTCATTAACCGTTGTCTCTGACTGTTTCAGCGATTTGCTGCTGTAATCATGAGAATAAGAGAAACCCGCAGACACCGCACTAAATATCAGCTCTGGGAAAATGGTGTGCACGGGAGCCAAGAGGTGGGTCTAAAAGGGTTCATTGTCCTAGTCTGTGCGTTAGGAAAGAAAAGGCTGTATGTCGTGGTTTCTAGGTATTAGGTTACTACTGTGATACATTGTCATTGTTGCTTTCTGTTAAGGTGGTTGATTTTCATCACTGCAGACAGCAGTAACTTCATTTTCTTAAAATCAGGCATGAGTAAGGATGTGTGTTATCATCTGATTTCCATATAGTTGAGCGTGATTATGTGCTTAATTTTTGTCATTTCTCACCCCTGCTCTTGAGAGCTTTTGTTGATAATGTTGTTATTGCTTTCATTCTGCTTTTATTTTGTAAGCCCTGCACTCATTCATCGCTGTACCCGAATATGAGGTAAGGAGTGGTAAAGAAAGACTGGACATAAAAGAGGAATTAGCATTTGCACTCTTCAGATATAAATGCCATCAGTATTTTCCTATTAAAATGAAGCTTGTTTTCATCTCAGTGGAAATCTGTGGCTAAAGTACAACAATAGTAATGATAATGGTGAGGCTGTTGTACTTCACATCTATAAAATCTTGCATCAATAATTTGATTAACCAGATTCCTTTGGGTAGGCCTACGTTTTCTGTCAGAGACACAGGAATACTTTATAAATAAAATTGTTAATGTCTGTTGATCTTTTTTCATTGGAAGAGGGTGACCAGTTTACCTTTTGAAAAAAAACTTTCCTAATTTGGGCTTTTTTTTTTTTCTCCTTTTTAGGGCTGTACCCATGGCATATGAAAGTTCCTGTGCTAAGGGTTGATCAGAGCTGCAGCT
GCCAGGTTACGCTACAGCAACACCAGATCAGGTTGTCTGTGGCCTTTGCCATAGCTTGGGGCAGCACCGGATCCTTAACCCACTGAGTGAGGCCAGGGATTGAACCTGCATCCTCCTGGATACTAGTTGGGTTCTTAACCTGCTGAGCCACAATGGGAACTCCTGGGCTTTTTATAAGTTATACGTTAAATAATTATTTTAGCTGTCTTTGAGTATGAATATCTCACTTTTTCTTTCCTTAGTGAAGAACAGTCCAGACTAGCAGCAAGAAAATATGCCAGAGTTGTACAGAAGTTGGGTTTTCCAGCTAAATTCTTGGACTTCAAGATTCAGAACATGGTGGGGAGCTGTGATGTGAAGTTCCCTATAAGGTTAGAAGGCCTTGTGCTTACCCACCAACAGTTCAGTAGGTAAGTCTGAAATGGATTGTGATTGCTTTTGGCAACAATTAATTTATAACCTATTTAAACACTGTTCATGATTTTTAAAAAACATGCAAAGTAATTGGTATATGAAATCAAATTATTTTGGTTTTTTCATCTTCAGGACCATAGCAGTGGCATATGGAAGTTTCCGGGCCAGGGGTCAAATCAGAGCTGCAACTGCCAACCTCCACCACGGCCACAGCAGTGCCAGGTCCCAGCTATGTCTGTGATTTACATCGCAGCTCAGGGCGAAACCAGATCCTTAACCCACTGAGCAGGGCCAGGGATCGAACCTGAATCCTCACTGATACAGTTTTGTTACCACTGAGCCACCATAGGAGCTCCCAAATGATTCACATATAGATGTTTTACTATTGAAATTTCTCCCATTCCTACCATTCTCACTGGTCTTCTACTTCTTAATGATACCCCAGACTCCCCTTTACTAGGATGAAATTGGTTCCCCTTCTGTATGTTTCATCGTTTCATTGCTCAATGAATACATTTCAAATGCATGAATTAGAAGTTCCTGTTATGGTGGATATGACATTTCTTTCTCTTTGTTTTCCCTGCTGCTCTCCTGGTGAAGACTTTAAAGGCAGGCTTGCCCATCTCTGCAGACCCTGCTTGCAGCATGCGCCTGGCGTGCTGTGCCCTTAATTGCACCAGGGCCTTTGCACATCCCGCTTTTATGCCTGGTGCTCTTGTTTCACTCTTCACCTAGGAAACTCCCACCGGCATTTCATGTATCCATCCAGGCATCTTTTCTTCAGAGAAGCCTGTCTTGAATTTTTCACTGGGCTAAACACACACACACCTAACTATTTTCAGTCTCTCTAACTAACAAAATGAGATGACTAGGATTAGAAAAAAACATACCAGTAGTGCATTTGGTCTGACAACACAGCGAAAGGTTCATTGTCTAACCAGTATCTTTTCTATCACTTTTGGTTAAGTCGCCTGACCTAGGTTAACACTTTGAGGATATTTCAGTTTAAGGAGATGAGATGTGAATGATTAGAAGGTAAACTGTGTGACTGTCATATCTTAGACATAAGTAATTCATTAGGCTCTTGTAAATCAGTGTACTTCACTTGTCCAGAGTGAAACTTGATGAGGGGCAGAGACTACAGAACATATTTATAAAGCTCTTGTTCTCATGTCTGGAAATTCAGATTCATTAGAAGTAAGAGTTCGTTGGCCTTCAGGTGCCAATTACAAGTCAGCTTGAG 
ISQKMJSFEB===:;AABFJECHEFHMHJNGQSQMJMMSHISJKMSCKSSSS9557JFJJEJFSFJLLGSSIHLIJHJSKIMSSSLKKSJPIILOSKLKIHNIHIKSQJSHSNOLIGNSLLJSKPJIIIO>LHA@BDNJLKIJKSMJSLLSSMMISMGJIILSOOKHSSLSRNSJHINMSSJSMIPLJSSJJSGJSKSIPIMSISISMKKIPOHKKHSIPJOHNNKKHKIMJLHMOJPJSKMIKQLHIHHSQIJHSNLKKHJNOSJJSMJSSJKKSLEOJIJHISSHPIJLMLSIIMSPQSJSIJSLIJSHJRKKJLIGE@88=;0///GNISJKSLFGISSISJJSNSKKHJKOJIF::9989ABCEKHEISLPSSSIHGKFJRGI@>;AEISSSSHGEESSSKOKIMSISKHJHHSSSKPSLGLISSJEHSQPLGLNKOSJIGECED@A;;:99:44459CEDBFJJHKJSGHFSLSJHSSSOSMJSMHSQSLGGMJJISNIIGLKPJSSSILKICEKIHJJSJHSKNSKKII<=<;8((((((///0@IFICB>;788956,&&&&&+,//3333336JSJKNKGGSDDDSINSMKHSKSSSLKJSJQKHG?<<=J;;/////;>.BHEHJLQJJNLGSMNIGKNSSJSISJSSISSQISHPMPSSINJSLSRSLJHISHSSISSKNSSISOSIHSISSISSQGJSSPRSHHNLJKKKSKPSSSIKCSFDEJGHOJIFMSJLJHSLISKSJSSISJSHKILHOLSPKHFFSLHSNSMKSKJJOSLSPS(((((GKKJOSLQHSSHJSISKSOMOKEH;;::;FSINPMLILSMOSSKRJRMSSMKIPSSLSISSSSGHGKJHMOJIMSHNHIHSONSSSMSSOKHSJFLNLSSOMSKHKKSSEEJKSSSSSSSSGJSKJSOSSMKSJMIFHSGSKSLSJNSKJSSHHMLIHRMMHSKHRGIJHSLLSSQHSLIISJB99:?GLEJESIPSMSSH>:;:;JRFOSLIOKFSKKKSSHOSJSIJS>==<:<9;;ACGMF@AEPKFHSKNSNERSSOJGKKSMSJHRKLHHNOHKKSSSPSJKKMIRJHKMLHSNSSSSSISSQS===>===MSQMNJNSPNSHGSGHGLLFGMLSFIILJNKSNRHMKKKKIIKKJKKJSLSJIIGLLIHD410,++)'&&'''%%%&&&&(((&&'(&'($$$%'(()3457@>.,,,,1&&&((-+++@BIRKSISIPSSMKHJISSJLE?@BBAGSKSJSPSKIPHSKSSSSNSSHHMSJGDCCJMSSSLNSSMLHKL
SMSJJMKSSKOMNSSHSSMMLSMJKSSISGSJOJOFIN>>==>SJSKSSKISSISSKNISSLHFSSSSRHGISIIGHGN@@CCBGOKGFHIKHKJISHRHJSSKSHSSKISIHRHKSKSGEJNMSJHSJNNSJNSGJHSJJSSHHSMOSSSHSSJSJHSPSLSGSKSNKNNGGHLGHSNPSJNPKJOSSGJNSSSKKHSLSSSLJSSILKSIGNSSSLISKSRHSILKDHIIPSSHSOSRSSMLMMOKLIHGNKQMJSOFEFLBFEPOFIKHLSHSKNGSSSECLHJJHLSSMSIKSSSSSSSSHHINMSJGEKIKSIJB><<<==:4433/00HSIIGEJHKJGSSSSJSSSISGLJSGIRLNILSMHHJGSLHIJSNINGSSMJJHHHJDDFHLNQIKLSKSMKKSSMOKMLJIMJIJSSHSMSOLSKSNSKSSSGOHIPSSHGISLSJKJSOLOJOGKNSMPKSSFSEQKJKISPJLOSSMHLJJSGIJC>@,,--.HGEGHGB80--,++(&'()((()BFADGR<<;;;<CECDMMSIJSSKSSKKSSHILSSQLSNA@AA@HHSNPIKLJSSSSHSKSLSFNKEGFMF:<20GJSFEHGKSKGNLSLSSLSLIJHQSISHMB76S:9989JSSSKFIJESKMSLHISHKSGS8GEIGSFSSGIEAA55=?@BJIKSQLRLGOHKHJIIHONHSSSNSLLJGSLKLGSSGME<;>100004100000EIELIJGCNIHMMHHMSSJHI?:98,,)((%%%$$%''&),-2367622658EIFCABB?8889ESNSKSLGFIJFDJFEGGJSLSSIQSHSFGHSIQSSJFDDCCESPSKMPSJILJKHLNSHJHMHKJIFHJ
FCDCDOPSIHOJJSMLJSSHKIIKIPGMIKSSM?>>?H;;<<<EFFGSQBCLIKFHLGSISISPJSRLSKSSQIHSSNJDMNLKSGHKFGISKSPSKILSKNSHSDHHSHA@@JSISSSGJOJRSQJHHKHSSISGLSHKSGISSJMSJGKNOSIISSNKSJSLKLIH@NRIENINESIJLJKHSNISSHSISEBFEFNSHMHSLOKHLJSOSIIIS
JSPIJIMSLSKKSILSPSQHKMSHMSKGKLKLIKKSSHHHIKGILIEQMLOSHSNGHFFHHPSDRKEMHLKSSSSSHMRSJGSSSJHQMSSMMSSJJHOKSLKJPSIQKFIJMJLSMKNSLLSIOSKSJSSSJJCF>;;@AASSMPSSJSSNNNSSSKHSHKSJSJRHPHLSKKHKHFCODJHHEGECB)&&&&*((112GOKIKHFGSSLHHKSSNISLI@MJSSFHFLSSLNLMJJSSJNS??>??SSKSQHIKLPSIGJSJBA642
222>??AFKFHSIILGQSHILSMLIJKJILJJOSHSSKIHHRGJSPSSFSGIJSL;:981023356AADCCCEHSHHOSJHQISLPKMFHH:4554448=@GNSNJMILHSKJKKMSMSNKIJISKJSKS<<1111364311BEEESNSSJSMLLSNHILSLJSJNJKJMJJHKOSLHIIFFEHEFLLSNG;40()*3767@ADCSSHSGSSPSHMSQJOIAFC?>>>>A33333;@>2?:9:69<?<GH56DBBDEGSLGPJSMSFSHRSJGSIGMSPSSIOJIIGSKSNKSKSHPSKLSSSSJSHIJSS?D-----SSNKLHMSFSILSHOSLSKIIJIIKSSSJISIHKLLSJISKPQISKKSLGAISKGFEB7D@?
??KIGHFAAJLSJJGSMFSPFMISKSJSMLIJKFKJSSSILMMHIGGSPSKKHSSRSJSLNKPOSLNSJONQIISRLISKNJHQIJGQSLIISNSHPSHSMSJKSGSOMRSOISSJSLSJGLKHSLSHIKSMSJSLSGKHJRKKGJKFKHIILJIB@90/42//1932222344444:;<CJFKSILLMFJSHIEE==CCSSJLHSGHIJSJLSJSSISSSSNJNGSKGIFSJSSIKPLKJSSPSJSHRSJISSJSSJSHSKIHMSSSQJHJSSGGSSHSKSRIIJKSJKME))*477>BBCJGSSPHKLSLKJPIKSSSSOHMSHKSGSISOSSSSPSGILFHIGEJ=<;;;KSPHSGNLSKGFGFIHSSJJSJMHRSSLMPKKKOSJHNHOMLSQNLLGKQSSSJFSHSISFNSQSIHJHSSQILNSSIHJISSIHFJHSISGNFGSLGKNOIKHQJSHSQSHRIISJNSGHSIGNJSSKLSSOJGSSLJOSLSISPHJHHJMIIKJHSMSLLNSJKSPHJSSSRISLKJSSHLMSJKSJILHJKSIJSFGDDCDBBCSIKSFFGDACHSSMJSSSKLSRKFKIJHGFDPQESSJHISSSLKJJIPKJJGSSJSGKHGF>=DJISSSSIJSKSKJQISMSH=;:98888
755.)&% qs:i:21 du:f:9.5696 ns:i:47848 ts:i:592 mx:i:3 ch:i:843 st:Z:2024-04-08T11:56:24.693+00:00 rn:i:5741 fn:Z:PAW37422_pass_b74e10a2_68ebe454_253.pod5 sm:f:100.808 sd:f
:28.356 sv:Z:quantile dx:i:0 RG:Z:68ebe454a63ea33cf655142d882767dfb3012d4a_dna_r10.4.1_e8.2_400bps_sup@v4.2.0 MN:i:3722 MM:Z:A+a.,0,17,2,6,3,0,0,3,12,0,12,2,13,1,9,5,0,3,12,8,0,0,0,0,0,0,0,0,2,7,0,0,0,2,0,0,3,0,0,0,2,9,63,0,0,0,3,0,0,47,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,1,0,0,6,2,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,4,0,3,1,1,0,17,0,0,0,50,0,1,23,39,0,0,43,1,3,0,0,0,0,4,0,0,0,0,0,0,0,0,4,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,1,12,1,14,0,0,0,3,4,0,0,0,0,0,0,0,3,2,0,3,6,0,39,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,2,1,0,0,1,0,0,2,2,8,0,17,4,0,0,0,12,14,0,1,0,0,0,4,0,0,0,1,0,0,0,1,2,5,0,0,1,33,12,0,0,12,0,1,41,1,0,6,1,1,4,8,0,0,0,0,0,0,0,0,0,0,0;C+h.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1,1,4,9,2,0,0,0,8,1,0,0,0,0,0;C+m.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,
0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1,1,4,9,2,0,0,0,8,1,0,0,0,0,0;
ML:B:C,14,13,53,12,35,21,16,14,112,191,38,66,13,12,12,39,111,24,13,22,80,72,66,148,40,35,28,14,28,35,40,29,16,13,26,251,29,241,106,70,12,171,107,16,21,12,12,16,18,12,12,17,15,12,17,22,13,14,22,25,46,18,71,17,24,17,23,17,48,20,25,15,20,15,78,20,139,100,80,74,146,146,112,125,167,117,108,87,25,51,12,38,46,12,15,13,12,111,18,21,20,19,14,12,31,26,30,41,59,39,17,12,13,56,77,45,31,14,53,29,222,132,113,74,21,97,14,14,18,17,22,25,34,16,68,14,210,38,14,25,16,159,12,13,102,13,15,17,18,34,126,38,22,33,25,19,39,93,47,76,15,21,15,39,14,15,14,32,21,35,43,62,49,51,64,36,36,15,21,62,45,255,255,29,42,38,40,17,227,32,37,29,15,226,91,21,156,42,85,112,128,14,132,14,88,206,12,14,13,16,15,15,40,38,53,51,53,63,30,26,87,12,159,14,14,13,90,21,12,14,24,63,22,13,14,12,20,112,133,55,43,108,31,138,13,16,17,59,29,182,18,15,8,0,3,8,3,1,1,3,4,2,5,5,16,8,2,9,28,40,33,25,33,18,30,125,7,51,6,2,2,2,2,1,4,5,4,4,3,2,4,8,15,100,9,8,12,12,147,19,13,30,6,7,5,2,9,16,15,12,9,6,5,4,2,3,3,11,5,6,6,3,14,255,30,10,7,22,5,8,11,6,6,4,6,8,10,6,7,6,5,5,5,3,4,0,0,7,11,9,7,0,5,2,29,27,71,47,3,7,2,2,1,37,78,28,7,8,17,18,1,11,9,4,11,7,9,5,2,1,2,3,3,2,2,2,6,3,44,9,23,39,1,3,9,7,18,6,7,8,7,4,21,8,13,7,6,6,4,47,6,7,5,9,15,19,34,22,11,11,0,5,4,6,13,130,57,230,22,11,11,21,9,12,21,27,86,5,5,7,8,10,10,14,16,5,7,5,4,7,7,9,7,8,3,3,4,11,5,7,1,24,5,4,8,7,7,9,49,7,6,6,4,3,4,2,8,2,5,5,8,11,12,3,10,6,17,11,2,12,3,2,2,5,5,2,6,4,20,24,9,16,13,10,11,13,12,14,4,12,4,2,2,25,28,3,4,6,3,4,2,6,11,10,13,11,10,10,3,3,7,7,3,4,2,0,1,3,12,1,3,2,15,6,4,10,13,8,14,11,9,4,4,9,15,14,8,10,7,3,4,4,5,20,15,16,4,18,1,8,8,11,12,5,4,6,3,3,20,8,7,8,13,4,3,11,7,3,5,13,66,13,2,3,3,7,7,9,12,4,3,1,3,2,106,92,5,5,17,10,2,39,21,8,10,9,5,63,7,16,255,13,24,12,37,254,252,22,12,13,13,240,13,254,16,36,40,40,39,52,50,24,24,20,79,19,13,14,12,12,254,16,12,13,14,15,13,14,22,9,7,18,17,14,240,36,235,17,225,12,16,18,12,22,38,16,22,17,13,16,16,13,17,16,22,22,18,28,12,14,0,6,12,23,46,18,22,20,18,19,12,21,16,18,15,20,31,13,18,15,13,18,13,255,16,17,18,200,255,19,18,43,65,137,90,2
5,37,253,12,254,55,38,37,18,12,15,26,254,244,12,12,21,17,20,14,42,254,12,14,14,14,12,12,14,12,27,246,13,11,254,12,15,19,100,18,16,247,22,13,26,12,13,14,13,12,13,14,16,17,13,14,27,36,42,44,29,33,255,14,25,232,85,94,110,8,70,26,29,59,31,4,234,51,44,18,18,21,28,27,28,26,27,17,20,15,14,17,19,17,14,20,13,15,15,24,14,16,254,12,15,12,16,18,16,19,50,13,22,21,15,13,14,12,12,45,18,16,20,27,13,252,20,19,41,29,14,12,14,13,16,21,21,12,24,19,224,55,35,26,30,34,26,31,24,28,13,212,251,22,31,229,76,246,28,23,19,20,12,24,36,23,23,244,23,25,17,17,21,20,18,16,13,17,14,16,47,254,14,16,11,12,12,21,24,16,24,26,23,16,15,58,26,23,20,28,26,16,14,13,17,31,29,29,251,79,254,21,16,20,28,20,19,22,23,28,11,12,248,15,25,12,12,12,21,14,20,14,189,10,14,13,24,28,14,16,21,15,13,254,22,253,15,2,13,16,30,19,12,29,234,16,24,29,22,22,30 NM:i:25 ms:i:4282 AS:i:4276 nn:i:0 de:f:0.00860507 tp:A:P cm:i:350 s1:i:1968 s2:i:0 MD:Z:169G154G1G154T72C23C0A4^TA501^TTGG3T77A55A38A98^TG119T721 rl:i:400 SA:Z:1,5438,-,2207S1514M1I,60,15;
Is the error caused by the methylation tag in my BAM file being recorded as MM instead of mm?

@Fu-Yilei
Collaborator

I see. The read you are showing here only has 6mA calls, but MethPhaser can only phase with 5mC calls.

@Fu-Yilei
Collaborator

MM:Z:A+a? is for 6mA calls. C+m? is for 5mC.
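
For reference, the modification types present in an MM tag can be listed with a few lines of Python. This is a minimal sketch that reads only the entry headers (base, strand, modification code) and ignores the skip counts; `list_mm_modifications` is a hypothetical helper name, not part of MethPhaser or pysam.

```python
def list_mm_modifications(mm_tag):
    """Return a (base, strand, code) tuple for each modification in an MM tag."""
    value = mm_tag
    # Drop the "MM:Z:" prefix if the whole SAM field was passed in.
    if value.startswith(("MM:Z:", "Mm:Z:")):
        value = value[5:]
    found = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # The header is everything before the first comma, e.g. "C+m." or "A+a?".
        header = entry.split(",", 1)[0].rstrip(".?")
        base, strand, codes = header[0], header[1], header[2:]
        if codes.isdigit():
            # Numeric (ChEBI) codes are kept whole.
            found.append((base, strand, codes))
        else:
            # Letter codes may be combined, e.g. "C+mh".
            for code in codes:
                found.append((base, strand, code))
    return found


tag = "MM:Z:A+a.,0,17,2;C+h.,9,0,2;C+m.,9,0,2;"
print(list_mm_modifications(tag))
# [('A', '+', 'a'), ('C', '+', 'h'), ('C', '+', 'm')]
```

Here `a` marks 6mA, `h` marks 5hmC and `m` marks 5mC, so a read is usable for 5mC phasing only if a `C` entry with code `m` is present.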

@Wshengquan
Author

MM:Z:A+a.,0,17,2,6,3,0,0,3,12,0,12,2,13,1,9,5,0,3,12,8,0,0,0,0,0,0,0,0,2,7,0,0,0,2,0,0,3,0,0,0,2,9,63,0,0,0,3,0,0,47,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,0,0,0,1,0,0,6,2,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,4,0,3,1,1,0,17,0,0,0,50,0,1,23,39,0,0,43,1,3,0,0,0,0,4,0,0,0,0,0,0,0,0,4,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,1,12,1,14,0,0,0,3,4,0,0,0,0,0,0,0,3,2,0,3,6,0,39,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,2,1,0,0,1,0,0,2,2,8,0,17,4,0,0,0,12,14,0,1,0,0,0,4,0,0,0,1,0,0,0,1,2,5,0,0,1,33,12,0,0,12,0,1,41,1,0,6,1,1,4,8,0,0,0,0,0,0,0,0,0,0,0;C+h.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1,1,4,9,2,0,0,0,8,1,0,0,0,0,0;C+m.,9,0,2,3,0,2,0,6,3,2,2,0,1,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,5,3,1,0,0,0,0,0,3,0,0,0,0,0,1,1,0,0,1,11,14,0,0,1,0,0,0,13,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,4,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,3,0,0,0,0,0,0,0,0,1,0,2,2,0,0,0,5,6,4,0,0,0,0,0,0,8,1,0,1,0,0,16,4,6,0,1,0,1,0,0,1,2,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,8,0,0,5,2,3,0,0,4,1,0,0,0,0,0,1,12,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,22,0,8,3,0,0,3,1,0,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,1,0,0,1,2,2,8,16,0,0,0,0,0,0,7,1
,1,4,9,2,0,0,0,8,1,0,0,0,0,0;
The MM tag contains not only A+a but also C+m.

@Fu-Yilei
Collaborator

Fu-Yilei commented Jul 16, 2024

I see. Yes, this is indeed a bug in MethPhaser. I suspect that some reads have only 6mA calls and no 5mC calls. I will try to apply a filter to skip those reads.
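
A filter of this kind could look roughly like the following. This is only a sketch, not the actual patch: it assumes `mm` is a dictionary keyed by `(base, strand, modification)` tuples, as the traceback suggests, and `five_mc_calls` is a hypothetical helper name.

```python
def five_mc_calls(mm, methylation_identifier=("C", 1, "m")):
    """Return the 5mC calls for a read, or an empty list if it has none."""
    # dict.get avoids the KeyError raised by mm[methylation_identifier]
    # when a read carries only other modification types (for example 6mA).
    return mm.get(methylation_identifier, [])


# Made-up example data: one read with 5mC calls, one with 6mA calls only.
reads = {
    "read_with_5mc": {("C", 1, "m"): [(10, 250), (42, 12)], ("A", 1, "a"): [(3, 200)]},
    "read_6ma_only": {("A", 1, "a"): [(7, 180)]},
}

for name, mm in reads.items():
    calls = five_mc_calls(mm)
    if not calls:
        print(f"skipping {name}: no 5mC calls")
        continue
    print(name, calls)
```

Returning an empty list lets the caller skip the read explicitly instead of crashing in the middle of a phase block.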

@Fu-Yilei
Collaborator

Fu-Yilei commented Jul 16, 2024

I've made the changes. Could you please try the MethPhaser version on this branch (https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1)? Alternatively, if you can provide an example BAM file, I can run the test myself.

@Wshengquan
Author

subset.zip
This is a subset of my data.
Do I need to reinstall MethPhaser?

@Fu-Yilei
Collaborator

Yeah, you need to clone the version from the branch I linked.

@Wshengquan
Author

Thank you very much for your help. I look forward to hearing whether your test with the subset file succeeds.

@Wshengquan
Author

I used the new version you changed yesterday. An error is still reported, although it seems to be a different one:
/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/meth_phaser_parallel:235: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas
phased_df_chr.get_group(chromosome).iterrows()
/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import require
Traceback (most recent call last):
File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 1477, in <module>
main(sys.argv[1:])
File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 1443, in main
) = get_assignment_max(
^^^^^^^^^^^^^^^^^^^
File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 910, in get_assignment_max
assignment_df = get_base_modification_list_snp_block(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 605, in get_base_modification_list_snp_block
for i in mm[methylation_identifier]:
~~^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: ('C', 1, 'm')

@Fu-Yilei
Collaborator

Sorry, I forgot to change one spot in the code. It should be fine now :)

@Wshengquan
Author

I tried again, but I got an error:
/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/meth_phaser_parallel:235: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass (name,) instead of name to silence this warning.
phased_df_chr.get_group(chromosome).iterrows()
/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import require
Traceback (most recent call last):
File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 1471, in <module>
main(sys.argv[1:])
File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 1437, in main
) = get_assignment_max(
^^^^^^^^^^^^^^^^^^^
File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 898, in get_assignment_max
base_modification_list = get_base_modification_dictionary( # build the dictionary with snp phased reads
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/share/home/yzwl_hanxs/app/dorado-0.7.2-linux-x64/bin/methphasing", line 243, in get_base_modification_dictionary
for i in mm[methylation_identifier]: # Remora only output one type of score: c 1 m/c 0 m, but this part can be improved for other methlyation callers
~~^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: ('C', 1, 'm')

@Fu-Yilei
Collaborator

Fu-Yilei commented Jul 18, 2024

Sorry about the back and forth. Was this run with the program from this patch? https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1 Based on the line numbers in the traceback, it looks like you are still using the main branch. Alternatively, you could send me the input you are using for this program; it is hard to debug with only subsampled reads.

@Wshengquan
Author

I did reinstall it from here: https://github.com/treangenlab/methphaser/tree/Fu-Yilei-patch-1
My input file is too large to upload via GitHub, even after compression. How should I send you my input file?

@Fu-Yilei
Collaborator

You could limit your VCF, GTF and BAM files to the first 3 phase blocks on chr1. I think those should be sufficient. Thanks.

@Wshengquan
Author

“you could limit your vcf, gtf and bam to the first 3 phaseblock on chr1. I think those could be sufficient.”
I'm sorry, how do I do this? This is my first time working with haplotype data, and I really do not know how to do it.

@Fu-Yilei
Collaborator

Hey, no worries!
For the GTF file, keep only the first 3 lines.
For the BAM file, use samtools view and specify the region chr1:0-x (x = the end position of the 3rd phase block).
For the VCF file, use bcftools view with the same region, chr1:0-x.

I have put links to the bcftools and samtools documentation here: https://samtools.github.io/bcftools/bcftools.html#view https://www.htslib.org/doc/samtools-view.html
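
The subsetting steps above can be sketched in Python as follows. This is an illustration under several assumptions: the GTF uses the standard tab-separated layout with the sequence name in column 1 and the end position in column 5, the BAM and VCF are indexed (and the VCF is bgzip-compressed) so that region queries work, and all file names are placeholders. `region_from_gtf_lines` and `subset_commands` are hypothetical helper names.

```python
def region_from_gtf_lines(lines, n_blocks=3):
    """Build a region string covering the first n_blocks phase blocks."""
    chrom, end, used = None, 0, 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if chrom is None:
            chrom = fields[0]
        elif fields[0] != chrom:
            break  # stay on the first chromosome
        end = max(end, int(fields[4]))  # column 5 holds the block end
        used += 1
        if used == n_blocks:
            break
    if chrom is None:
        raise ValueError("no phase blocks found")
    return f"{chrom}:1-{end}"


def subset_commands(region, bam, vcf):
    """Return the samtools and bcftools command lines as argument lists."""
    return [
        ["samtools", "view", "-b", "-o", "subset.bam", bam, region],
        ["bcftools", "view", "-r", region, "-o", "subset.vcf", vcf],
    ]


# Made-up phase blocks for illustration only.
gtf_lines = [
    "1\tPhasing\texon\t1000\t52000\t.\t+\t.\tgene_id \"b1\";\n",
    "1\tPhasing\texon\t60000\t180000\t.\t+\t.\tgene_id \"b2\";\n",
    "1\tPhasing\texon\t190000\t310000\t.\t+\t.\tgene_id \"b3\";\n",
    "1\tPhasing\texon\t320000\t400000\t.\t+\t.\tgene_id \"b4\";\n",
]
region = region_from_gtf_lines(gtf_lines)
for cmd in subset_commands(region, "sample.bam", "sample.vcf.gz"):
    print(" ".join(cmd))
```

The commands are only printed here; they could be passed to `subprocess.run` once the paths are checked. Note that the chromosome name must match the one in the files, which may be `1` rather than `chr1` for Ensembl references.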

@Wshengquan
Author

Thank you very much for your help
train.gz
I subset the files following your suggestions, except that I kept the whole first chromosome in the GTF file.
The reference genome I used was this version of the pig genome, downloaded from Ensembl:
%)AP1W2UV@FY~H1SD$Y42V4

@Fu-Yilei
Collaborator

Hey, sorry for the delay, but I still need a week or so to actually look into this issue. I suspect that some reads have only 6mA or 5hmC calls and no 5mC calls, which is what triggers the bug. I don't have much time to debug this right now, but I will look into it next week. Thanks!

@DHmeduni

DHmeduni commented Aug 5, 2024

A question in this vein: can MethPhaser process C+h tag info, or would this also cause a hang-up or poor phasing?

@Fu-Yilei
Collaborator

Fu-Yilei commented Aug 5, 2024

MethPhaser ignores C+h because it is not very accurate in some basecaller versions. As long as you have C+m MethPhaser can do phasing.

@DHmeduni

DHmeduni commented Aug 6, 2024 via email

@Fu-Yilei
Collaborator

Fu-Yilei commented Aug 6, 2024

Sorry, this depends. First, the landscape of human genome methylation is still not fully understood. Mapping it would require a large population-scale analysis, so I cannot tell you which regions usually have more heterozygosity and which do not. On the other hand, the input sample type also matters a lot. For example, our paper includes a blood sample with shitty ONT reads, and it shows that the improvement is not huge because the SNP phasing is already shitty. Happy to chat more if you like.

I think you can understand it this way: when you have a reasonably well SNP-phased genome and the gaps are not too large, MethPhaser can come in and help.

@DHmeduni

DHmeduni commented Aug 7, 2024

Hi,
Sure I have a bunch of questions, would love to learn more about the program and maybe some more insights you have.

@Wshengquan
Author

Long time no see. I have been testing other pipelines recently, so I am very sorry for not getting back to you sooner. May I ask whether this process has run successfully with the data I gave you?
