I ran ./extract_duplicates to estimate the PCR duplication rate in our data:
./extract_duplicates --bam SRR1585519.dedup.merged.sorted.bam --VCF NA19099.het.vcf --mmq 20 > SRR1585519.dedup.hetreads
However, it returned:
PCR duplicates marked 0 total-reads 0 frac nan discarded 62152388
#clusters
total reads (PE=1) 0 unique-reads 0 duplicates:0, duplication rate nan
Using samtools rmdup and IGV, we observed many duplicate reads in our data, but the script does not seem to work on our files. What is the problem here?
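For reference, here is a minimal Python sketch of how the first summary line can be read. It assumes the "key value" layout shown in the output above, which is inferred from this single run and not from any extract_duplicates documentation; `parse_summary` and `explain` are hypothetical helper names. It only restates the arithmetic: with total-reads 0, frac is 0/0, which is why it prints nan.

```python
import re


def parse_summary(line: str) -> dict:
    """Parse the first summary line printed by extract_duplicates.

    Assumption: the fields appear as 'key value' pairs exactly as in the
    output pasted above; this layout is inferred, not documented.
    """
    pattern = r"(PCR duplicates marked|total-reads|frac|discarded)\s+(\S+)"
    # float() accepts the literal string "nan", so frac parses as NaN.
    return {key: float(value) for key, value in re.findall(pattern, line)}


def explain(fields: dict) -> str:
    """Describe why the duplication fraction is or is not defined."""
    total = fields["total-reads"]
    discarded = fields["discarded"]
    if total == 0:
        # Nothing was counted, so duplicates / total is 0/0 -> nan.
        return f"no reads were counted ({int(discarded)} discarded), so frac = 0/0 = nan"
    rate = fields["PCR duplicates marked"] / total
    return f"duplication rate {rate:.4f}"


if __name__ == "__main__":
    line = "PCR duplicates marked 0 total-reads 0 frac nan discarded 62152388"
    print(explain(parse_summary(line)))
```

Run on the line above, it reports that every read ended up in the discarded count rather than being evaluated for duplication.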