Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve fragment size handling for MACS2 peak calling #193

Closed
ppericard opened this issue Nov 2, 2020 · 9 comments
Closed

Improve fragment size handling for MACS2 peak calling #193

ppericard opened this issue Nov 2, 2020 · 9 comments
Labels
question Further information is requested

Comments

@ppericard
Copy link

ppericard commented Nov 2, 2020

Right now, the fragment size can be a main input parameter of the nf-core/chipseq pipeline. However, if this parameter is supplied, MACS2 callpeaks is still set-up to automatically guess the fragment size. If the IP signal is not strong enough, or the coverage is not very high, the detection algorithm can easily fail to detect or guess correctly the fragment size which leads to the peak-calling step crashing.

The solution I'm suggesting is: if the user inputs the fragment size then it could be passed to MACS2 callpeaks like this:

--nomodel --extsize $params.fragment_size

Or at least, the user should have the option to deactivate automatic detection in MACS2 callpeaks and pass the input fragment size

@drpatelh drpatelh added the question Further information is requested label Nov 2, 2020
@drpatelh
Copy link
Member

drpatelh commented Nov 2, 2020

Hi @ppericard, I intentionally avoided passing the fragment size to the peak calling because one of the main advantages of MACS2 is that it is able to calculate the shifting model itself. The fragment size is actually used for other processes such as creating the bigWig tracks and for some QC plots with deepTools where this information will not be known.

A better solution would be to let the user decide which parameters they would like to provide MACS2 in these instances, and that will be possible in the next release anyway since it will be written in DSL2.

@ppericard
Copy link
Author

Hi @drpatelh. I understand the logic. I'm just surprised no-one was stopped by this pb before. I have IP tracks with very clear signal but macs2 cannot identify correctly the shifting model. I have no idea why.
I ended up forking the repo and modifying the command line for macs2 callpeaks so I can perform the peak-calling "manually" by passing the fragment size.
I'll be watching closely for the next release since going to DSL2 will allow much more flexibility in terms of using all the tools parameters.

@drpatelh
Copy link
Member

drpatelh commented Nov 2, 2020

Yeah, sorry. Its a fine balance between providing a generic set of parameters that can be used to tune the behaviour of the pipeline, maintaining best practice and trying to factor in everyone's use cases. Defo should be easier in the next release.

@johannesnicolaus
Copy link

Hi @drpatelh, is it possible to reopen this issue?

I have also had problems with smaller genome that was workable by specifying --nomodel and --extsize

macs3-project/MACS#353

Now that the release is in DSL2, maybe it can be implemented? Here is what worked for me, but I haven't worked on the details, and I'm new to this so I don't know if this is optimal:
master...johannesnicolaus:nf-core-chipseq:develop

@drpatelh
Copy link
Member

Hi @johannesnicolaus ! Pinging @JoseEspinosa here because he has been responsible for bringing the pipeline to DSL2. Might be easier to have a single parameter that sets non-mandatory options for MACS2 e.g. --extra_macs2_callpeak_args . Could have similar options for other commonly changed tools. Just need to sanity check that providing some of these options in particular combinations still works.

Have you tried overriding these options via a custom config file? This is one of the benefits of using the DSL2 implementation.

@drpatelh drpatelh reopened this Apr 20, 2023
@johannesnicolaus
Copy link

Thanks for the reply! Do you mean to set these options via a custom config file? I haven't tried that yet because changing the modules.config and nextflow.config was more straightforward to me. Maybe I can try that as well.

@JoseEspinosa
Copy link
Member

Yes, I think more people are modifying macs2 parameters using a custom config. Actually, this slack exchange might be interesting for you: https://nfcore.slack.com/archives/CFP55NZ5G/p1652280017471059?thread_ts=1652184721.335959&cid=CFP55NZ5G

About implementing the --extra_macs2_callpeak_args parameter, at a first glance, I would be more in favor of using a custom config since the validation of the extra parameters could be a little bit complicated and also would need to keep them always updated with the last pipeline installed macs2 version 🤔

@johannesnicolaus
Copy link

Thanks! Seems good for now

@JoseEspinosa
Copy link
Member

Can be tuned from 2.0.0 release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants