My use case is lifting over gnomAD v4.1 from GRCh38 to T2T-CHM13v2.0, and sometimes multiple GRCh38 variants resolve to the same T2T coordinate. I want to be able to process these duplicates (say, picking the one with the highest or lowest AF) rather than just taking the first one in the file.
Control how selecting duplicates works
It would be really useful to be able to choose which one to keep. You could do this by defining how to sort the duplicates and then taking the first, for instance keeping the one with the highest AF, then the highest AC, with `--rm-dup-sort=-AF,-AC` or `--rm-dup-sort=AF:desc,AC:desc`.
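To make the proposed semantics concrete, here is a minimal Python sketch of what a `--rm-dup-sort` option could do. `--rm-dup-sort` is the option proposed in this issue, not an existing bcftools flag; the functions `parse_sort_spec` and `pick_duplicates` are hypothetical, and records are plain dicts standing in for parsed VCF lines. It accepts both spec styles, groups records by CHROM/POS/REF/ALT, and keeps the record that sorts first; ties fall back to file order, which matches the current keep-the-first behavior.

```python
def parse_sort_spec(spec):
    """Parse '-AF,-AC' or 'AF:desc,AC:desc' into [(tag, descending), ...]."""
    keys = []
    for field in spec.split(","):
        field = field.strip()
        if ":" in field:
            name, order = field.split(":", 1)
            keys.append((name, order.lower() == "desc"))
        elif field.startswith("-"):
            keys.append((field[1:], True))
        else:
            keys.append((field, False))
    return keys


def pick_duplicates(records, spec):
    """Keep one record per (CHROM, POS, REF, ALT), chosen by the sort spec.

    Hypothetical helper; assumes every record has a numeric INFO value
    for each tag named in the spec.
    """
    keys = parse_sort_spec(spec)

    def sort_key(rec):
        # Negate descending keys so a single ascending comparison works.
        return tuple(-rec["INFO"][tag] if desc else rec["INFO"][tag]
                     for tag, desc in keys)

    groups = {}
    for rec in records:
        site = (rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"])
        groups.setdefault(site, []).append(rec)

    # min() returns the first of equal elements, so ties keep file order;
    # dicts preserve insertion order, so sites stay in their original order.
    return [min(group, key=sort_key) for group in groups.values()]
```

Encoding direction by negating values keeps the comparison to a single tuple sort; string-valued INFO fields would need a different approach.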
Merge functionality with duplicates
`merge` has `--info-rules`, which works on the same variant across different files. It would be nice to be able to apply this to the same variant within a single file; for instance, `norm --rm-dup --info-rules=BCFTOOLS_OLD_VARIANT:join` would have allowed a workaround for this issue.
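A rough Python sketch of what `--info-rules` applied within a single file might do. The function `collapse_duplicates` and the dict-based record layout are hypothetical; the rule names (`join`, `sum`, `min`, `max`) mirror a subset of those accepted by `bcftools merge --info-rules`, but this is an illustration of the idea, not bcftools code. INFO fields not named in a rule are taken from the first record at the site.

```python
def collapse_duplicates(records, rules):
    """Collapse records sharing (CHROM, POS, REF, ALT) into a single record.

    `rules` maps an INFO tag to a method, e.g.
    {"BCFTOOLS_OLD_VARIANT": "join", "AF": "max"} (hypothetical example).
    """
    combine = {
        "join": lambda vals: ",".join(str(v) for v in vals),
        "sum": sum,
        "min": min,
        "max": max,
    }

    groups = {}
    for rec in records:
        site = (rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"])
        groups.setdefault(site, []).append(rec)

    out = []
    for group in groups.values():
        # Start from a copy of the first record so the input is not mutated.
        merged = dict(group[0])
        merged["INFO"] = dict(group[0]["INFO"])
        for tag, method in rules.items():
            vals = [r["INFO"][tag] for r in group if tag in r["INFO"]]
            if vals:
                merged["INFO"][tag] = combine[method](vals)
        out.append(merged)
    return out
```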
Mark duplicates
Another way to solve this would be to mark duplicates rather than remove them, for instance a DUPLICATE flag.
Then I could select them out into a separate file and either:
- use the existing `merge` with `--info-rules` to bring them back together, or
- process this much smaller file in Python however I want, then merge it back (much quicker than processing ~100 GB of compressed VCF in Python).
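A minimal Python sketch of the mark-then-split workflow over plain-text VCF lines. The `DUPLICATE` flag name comes from the suggestion above; `mark_duplicates`, `split_by_flag`, and the header description text are hypothetical. It assumes the lifted-over file is coordinate-sorted (e.g. after `bcftools sort`), so duplicates always sit in the same CHROM/POS block and the file can be streamed rather than loaded into memory.

```python
from collections import Counter


def mark_duplicates(lines, flag="DUPLICATE"):
    """Yield VCF lines, adding `flag` to INFO for every record whose
    (CHROM, POS, REF, ALT) appears more than once. Assumes sorted input."""

    def flush(block):
        counts = Counter((f[0], f[1], f[3], f[4]) for f in block)
        for f in block:
            if counts[(f[0], f[1], f[3], f[4])] > 1:
                # INFO is column 8; "." means empty.
                f[7] = flag if f[7] == "." else f[7] + ";" + flag
            yield "\t".join(f)

    block = []  # records sharing the current CHROM/POS
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            yield line
            continue
        if line.startswith("#CHROM"):
            # Declare the new flag just before the column header line.
            yield (f"##INFO=<ID={flag},Number=0,Type=Flag,"
                   f'Description="Record shares CHROM/POS/REF/ALT with another record">')
            yield line
            continue
        fields = line.split("\t")
        if block and (fields[0], fields[1]) != (block[0][0], block[0][1]):
            yield from flush(block)
            block = []
        block.append(fields)
    if block:
        yield from flush(block)


def split_by_flag(lines, write_flagged, write_rest, flag="DUPLICATE"):
    """Send header lines to both writers, flagged records to
    `write_flagged`, and all other records to `write_rest`."""
    for line in lines:
        if line.startswith("#"):
            write_flagged(line + "\n")
            write_rest(line + "\n")
        elif flag in line.split("\t")[7].split(";"):
            write_flagged(line + "\n")
        else:
            write_rest(line + "\n")
```

Both functions stream, so `split_by_flag(mark_duplicates(src), dup_file.write, main_file.write)` can work on a large file opened in text mode (for example with `gzip.open(path, "rt")`), leaving only the small duplicates file for custom processing.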