This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Use snakemake for pub-ready figure generation #630

Open
jaclyn-taroni opened this issue Mar 14, 2020 · 6 comments

Comments

@jaclyn-taroni
Member

Originally suggested by @jashapiro.

With #613, we're using a large bash script to regenerate figures. Some kind of workflow management system would be better. snakemake is already on the project Docker container. The CCDL does not have bandwidth to implement this as part of our initial effort to get publication-ready figures together (#571), but we wanted to document this potential improvement.

@jashapiro
Member

Since I did do a bit of proof of concept work on this, I should link it here for anybody who might work on this in the future.

https://github.com/jashapiro/OpenPBTA-analysis/blob/jashapiro/snakemake-results/Snakefile

One thing to note in that implementation is that it uses scripts as inputs in an attempt to capture both changes in data files AND changes in analysis/figure generation code. But it might not catch all changes, especially in scripts called by the defined scripts.
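To make the caveat concrete, here is a minimal sketch of a timestamp-based staleness check. It is a simplification for illustration, not snakemake's actual implementation, and every file name in it (`data.tsv`, `plot.R`, `util.R`, `figure.png`) is hypothetical. It assumes a figure is regenerated only when one of its declared inputs is newer than the output.

```python
import os
import tempfile
from pathlib import Path


def is_stale(output, inputs):
    """Return True if `output` is missing or older than any declared input."""
    output = Path(output)
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime_ns
    return any(Path(p).stat().st_mtime_ns > out_mtime for p in inputs)


def demo():
    """Show which edits a declared-inputs check notices (hypothetical files)."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        data = tmp / "data.tsv"      # declared data input
        script = tmp / "plot.R"      # declared script input
        helper = tmp / "util.R"      # sourced by plot.R, but NOT declared
        figure = tmp / "figure.png"  # output
        for f in (data, script, helper, figure):
            f.touch()
        # Set explicit modification times so the comparison is deterministic.
        for f in (data, script, helper):
            os.utime(f, (1000, 1000))
        os.utime(figure, (2000, 2000))
        declared = [data, script]

        up_to_date = is_stale(figure, declared)

        # Editing the declared script makes the figure stale.
        os.utime(script, (3000, 3000))
        after_script_edit = is_stale(figure, declared)

        # Editing only the undeclared helper goes unnoticed.
        os.utime(script, (1000, 1000))
        os.utime(helper, (3000, 3000))
        after_helper_edit = is_stale(figure, declared)

        return up_to_date, after_script_edit, after_helper_edit
```

In this sketch, listing the script among the inputs catches edits to that script, while an edit to the helper it sources is missed unless the helper is declared as well.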

@sjspielman
Member

@jaclyn-taroni are we still interested in doing this? If so, it would be good to do in conjunction with/instead of #1261

@jaclyn-taroni
Member Author

Are we interested? Sure, but I think we'd need to estimate the amount of effort before I comment on whether or not we should do it.

@sjspielman
Member

My gut tells me the effort would be a little too high, but this depends on how much is in what @jashapiro had previously written up. The link seems to be long gone, and I can't find the branch here. @jashapiro, do you still have this locally?

@jashapiro
Member

In some previous cleaning, I must have removed that branch on GitHub, but I did have it locally, and it is now back up. Note that that file was set up to run all of the analysis: not just the figures, but also the analysis that generated the inputs to those figures. Doing just the figures should be a bit easier.

Whether it is worth it really depends on how much effort it is to go through all the figure scripts and figure out what all the inputs and outputs are, since (at least in the past) the scripts declare their input files internally, not with arguments. We probably don't want to be in a situation where an input file changes and does not result in the figure being correctly regenerated. (We can get around that with snakemake -F to force rerunning everything, but that is not ideal.)

@sjspielman
Member

Thanks for branch + context, @jashapiro!!

This point is key -

the scripts declare their input files internally, not with arguments

The figure generation script calls scripts of both kinds: some declare their inputs internally, and some take their inputs as arguments. We'd want this to be more consistent for a robust workflow, and it doesn't make sense to me to modify analysis module files specifically to work with a manuscript figure-generating workflow. This is especially true because many of the scripts called don't actually generate figures, but prepare the data that figures are generated from.

My sense now is that snakemake is not the move at this point, and we should stick with the existing (and soon-to-be reorganized! #1261) bash script.
