Integrate snakemake steps into overall pipeline #45

aappling-usgs · 2020-06-15T15:23:01Z

Currently, https://github.com/USGS-R/delaware-model-prep/blob/master/Snakefile downloads the geospatial fabric catchment attributes and their metadata, relates catchments to segments, subsets for UMN and aggregates for PSU.

Some issues to clean up:

- The snakemake file refers to files in a 10_spatial_data folder that doesn't exist in this repo.
- The file organization clashes with that of the rest of the repo.
- The snakemake pipeline is not connected to the scipiper pipeline.
- For sharing, a conda environment.yml would help.
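
The first issue above could be caught early with a quick check for Snakefile paths whose top-level folder is missing from the repo. This is only a minimal sketch: the function name `missing_top_level_folders`, the regular expression, and the approach of scanning for quoted relative paths are all assumptions for illustration, not part of the existing Snakefile or of snakemake itself.

```python
import re
from pathlib import Path

# Hypothetical pattern: quoted strings that look like relative paths,
# e.g. "10_spatial_data/out/sntemp_subset_ids.csv". Braces are allowed so
# that snakemake-style wildcards such as {name} do not break the match.
PATH_PATTERN = re.compile(r"""["']([\w.\-]+/[\w.\-/{}]+)["']""")


def missing_top_level_folders(snakefile_text, repo_root):
    """Return the sorted top-level folders referenced in the Snakefile text
    that do not exist as directories under repo_root."""
    root = Path(repo_root)
    # Keep only the first path component of every quoted path.
    folders = {path.split("/", 1)[0] for path in PATH_PATTERN.findall(snakefile_text)}
    return sorted(name for name in folders if not (root / name).is_dir())
```

Called as `missing_top_level_folders(Path("Snakefile").read_text(), ".")` from the repo root, it would list folders such as 10_spatial_data if they are referenced but absent.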

I think all of these issues are possible to fix while still using snakemake, but depending on how hard it is to get snakemake going, I'm also open to switching the orchestration of these steps to scipiper. I think we should aim to keep the actual functions in python rather than porting them to R, both to avoid the recoding time and to keep these functions easily maintainable by Jeff.

jsadler2 · 2021-10-26T20:45:07Z

This came up again here: USGS-R/drb-do-ml#6 (comment)

jsadler2 · 2021-10-27T22:08:49Z

@aappling-usgs - when you say

> The file organization clashes with that of the rest of the repo.

What do you mean? And is that something I should fix?

aappling-usgs · 2021-10-28T15:00:51Z

It's been a while since I wrote that, but what I see today is:

- Folder prefixes of 10 and 20 rather than 1-9ish: right now it's not obvious from the prefix number whether this step comes within or after steps 1-9, though I acknowledge that steps 1-9 don't get executed in exactly that order, either. Catchment attributes strike me as a 1_ or 1b_ sort of thing. So... not a big deal, but could be clearer.
- Some of the files in 10_spatial_data are probably products, or could be, of other steps in the pipeline. For example, 10_spatial_data/out/Segments_subset.shp and 10_spatial_data/out/sntemp_subset_ids.csv. You might already be planning for this when you make the 10_spatial_data files available in this repo, but for those segment lists, could we refer to existing files in the rest of the pipeline rather than pulling them from some less-traceable source such as Drive or S3?
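
To illustrate the prefix point, here is a small sketch of how a mix of one- and two-digit prefixes sorts. Only 1_network and 10_spatial_data are names from this thread; the other folder names and the `numeric_prefix` helper are made up for the example.

```python
import re


def numeric_prefix(folder_name):
    """Return the leading integer of a name like '10_spatial_data'.
    Names without a numeric prefix sort last."""
    match = re.match(r"(\d+)", folder_name)
    return int(match.group(1)) if match else float("inf")


# Hypothetical folder names, apart from 1_network and 10_spatial_data.
folders = [
    "1_network",
    "2_observations",
    "9_collaborator_data",
    "10_spatial_data",
    "20_catchment_attributes",
]

# Plain string sorting, as in many file browsers: "10_" lands before "1_".
lexical_order = sorted(folders)

# Sorting by the integer prefix puts 10 and 20 after step 9 instead.
numeric_order = sorted(folders, key=numeric_prefix)
```

The two orderings disagree about where 10 and 20 belong, which is the ambiguity that a 1_ or 1b_ prefix would avoid.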

jsadler2 · 2021-11-01T21:52:36Z

Okay. Thanks, @aappling-usgs. That's helpful.

What if I just moved this all into 1_network? What do you think of that idea @aappling-usgs, @limnoliver?
I will see how far I can get without anything in 10_spatial_data.

aappling-usgs · 2021-11-05T12:55:42Z

Working it all into 1_network sounds good to me.

aappling-usgs self-assigned this Jun 15, 2020

jsadler2 self-assigned this Oct 26, 2021
