[Question]: FRiP score calculation #317

Closed
tamuanand opened this issue Jul 23, 2023 · 1 comment

tamuanand commented Jul 23, 2023

Hi

I posted this on Slack: https://nfcore.slack.com/archives/CE5EL6326/p1690082285144239

For the sake of completeness, and because some people might not be on Slack, I am also posting it here.

I think that the FRiP score calculation is incorrect, though I might be wrong. That's why I am posting here and asking for extra pairs of eyes.

FRiP score calculation

READS_IN_PEAKS=\$(intersectBed -a $bam -b $peak $args | awk -F '\t' '{sum += \$NF} END {print sum}')

I think that the -a and -b are switched in the nf-core code - see here for explanation: https://www.biostars.org/p/337872/#338646

  • If so, the BED file needs to be sorted, so the commands below might help.

I also found this blog post: https://yiweiniu.github.io/blog/2019/03/Calculate-FRiP-score/. If I use either of the two methods below, the value obtained for READS_IN_PEAKS is quite different from the one obtained with the nf-core command above.

READS_IN_PEAKS=\$(bedtools sort -i $peak | bedtools merge -i stdin | bedtools intersect -u -a $bam -b stdin -ubam | samtools view -c)

READS_IN_PEAKS=\$(bedtools sort -i $peak | bedtools merge -i stdin | bedtools intersect -c -a stdin -b $bam | awk '{ sum+=\$4 } END { print sum }')

# both give the same answer but quite different from the value above with nf-core code 
# author claims that the 1st command here (using samtools view -c at the end) is slightly faster
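
Whichever READS_IN_PEAKS value is used, it only becomes a FRiP score after it is divided by the number of mapped reads. Below is a minimal sketch of that last step in plain shell. The function name `frip` and the example counts are made up for illustration, and taking the denominator from `samtools view -c -F 4` is an assumption, not necessarily what the nf-core module does.

```shell
#!/usr/bin/env bash
# frip READS_IN_PEAKS MAPPED_READS
# Prints reads-in-peaks divided by mapped reads, to 4 decimal places.
# Prints "NA" when the mapped-read count is not positive.
frip() {
    awk -v r="$1" -v m="$2" 'BEGIN {
        if (m <= 0) { print "NA"; exit }
        printf "%.4f\n", r / m
    }'
}

# Hypothetical counts. In practice the denominator could come from
# something like: MAPPED=$(samtools view -c -F 4 sample.bam)   # assumption
frip 250000 1000000    # prints 0.2500
```

Comparing the FRiP values rather than the raw counts makes it easier to judge how much the choice of intersect command matters for a given sample.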

Thanks in advance.

Member

JoseEspinosa commented Jul 25, 2023

I am copying my answer from Slack here in case someone comes across this issue:
The nf-core pipeline passes the -f 0.20 argument in its command ("-f: Minimum overlap required as a fraction of A"). This makes our setting more stringent, so we recover fewer reads in peaks, as you pointed out in your comment. If you add this argument to your commands, you should recover the same value (note that for the second command you provided you will need to set -F 0.20 instead, since the BAM file is passed as -b there).
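
To make the effect of the overlap threshold concrete, here is a toy sketch in plain shell and awk. It does not call bedtools; it only mimics the "minimum overlap as a fraction of A" rule for a single read and a single peak. The function name `passes_min_overlap` and all coordinates are hypothetical.

```shell
#!/usr/bin/env bash
# passes_min_overlap READ_START READ_END PEAK_START PEAK_END MIN_FRAC
# Prints "keep" if the read/peak overlap covers at least MIN_FRAC of the
# read's own length, otherwise "drop". Coordinates are 0-based, half-open
# (BED style). This is an illustration, not bedtools' implementation.
passes_min_overlap() {
    awk -v rs="$1" -v re="$2" -v ps="$3" -v pe="$4" -v f="$5" 'BEGIN {
        start = (rs > ps) ? rs : ps              # start of the overlap
        end   = (re < pe) ? re : pe              # end of the overlap
        ov    = (end > start) ? end - start : 0  # overlap length in bp
        print ((ov > 0 && ov / (re - rs) >= f) ? "keep" : "drop")
    }'
}

# A 100 bp read overlapping a peak by 10 bp (10% of the read).
passes_min_overlap 100 200 190 500 0.20          # drop: 10/100 < 0.20
# The same read overlapping a peak by 50 bp (50% of the read).
passes_min_overlap 100 200 150 500 0.20          # keep: 50/100 >= 0.20
# With the bedtools default of 1E-9 (effectively 1 bp), the first read is kept.
passes_min_overlap 100 200 190 500 0.000000001   # keep
```

Reads that only graze the edge of a peak are counted by the default setting but dropped at 0.20, which is consistent with the lower READS_IN_PEAKS value reported for the nf-core command.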
