This repository has been archived by the owner on Jun 1, 2023. It is now read-only.

Integrate snakemake steps into overall pipeline #45

Open
4 tasks
aappling-usgs opened this issue Jun 15, 2020 · 5 comments

@aappling-usgs
Member

aappling-usgs commented Jun 15, 2020

Currently, the Snakefile at https://github.com/USGS-R/delaware-model-prep/blob/master/Snakefile downloads the geospatial fabric catchment attributes and their metadata, relates catchments to segments, subsets the results for UMN, and aggregates them for PSU.

Some issues to clean up:

  • The snakemake file refers to files in a 10_spatial_data folder that doesn't exist in this repo.
  • The file organization clashes with that of the rest of the repo.
  • The snakemake pipeline is not connected to the scipiper pipeline.
  • For sharing, a conda environment.yml would help.

I think all of these issues can be fixed while still using snakemake, but depending on how hard it is to get snakemake going, I'm also open to switching the orchestration of these steps to scipiper. I think we should aim to keep the actual functions in Python rather than porting them to R, both to avoid the recoding time and to keep these functions easily maintainable by Jeff.
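As a rough illustration of two of the cleanup items above (inputs that don't exist in the repo, and a snakemake run that another orchestrator could trigger), here is a minimal Python sketch. The helper names `missing_inputs` and `snakemake_command` are hypothetical and are not part of this repo; `--snakefile` and `--cores` are standard snakemake command-line options. The sketch only builds the command and does not run it.

```python
from pathlib import Path
from typing import Iterable, List


def missing_inputs(paths: Iterable[str], repo_root: str = ".") -> List[str]:
    """Return the entries of `paths` that do not exist under `repo_root`.

    Hypothetical helper: useful for spotting Snakefile inputs (e.g. files
    under 10_spatial_data) that are not actually present in the repo.
    """
    root = Path(repo_root)
    return [p for p in paths if not (root / p).exists()]


def snakemake_command(
    targets: Iterable[str], snakefile: str = "Snakefile", cores: int = 1
) -> List[str]:
    """Build (but do not execute) a snakemake command line for `targets`.

    Hypothetical helper: the returned list could be handed to
    subprocess.run, or the equivalent command issued from another tool.
    """
    return ["snakemake", "--snakefile", snakefile, "--cores", str(cores), *targets]
```

One possible use, assuming the orchestration stays split between the two tools: a scipiper target in R could shell out to the command that `snakemake_command` produces (for example via `system2()`), so the Python functions stay in Python while scipiper still tracks the resulting files.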

aappling-usgs self-assigned this Jun 15, 2020
jsadler2 self-assigned this Oct 26, 2021
@jsadler2
Collaborator

This came up again here: USGS-R/drb-do-ml#6 (comment)

@jsadler2
Collaborator

@aappling-usgs - when you say "The file organization clashes with that of the rest of the repo," what do you mean? And is that something I should fix?

@aappling-usgs
Member Author

It's been a while since I wrote that, but what I see today is:

  • Folder prefixes of 10 and 20 rather than 1-9ish. Right now it's not obvious from the prefix number whether this step comes within or after steps 1-9, though I acknowledge that steps 1-9 don't get executed in exactly that order, either. Catchment attributes strike me as a 1_ or 1b_ sort of thing. So, not a big deal, but it could be clearer.
  • Some of the files in 10_spatial_data probably are, or could be, products of other steps in the pipeline, for example 10_spatial_data/out/Segments_subset.shp and 10_spatial_data/out/sntemp_subset_ids.csv. You might already be planning for this when you make the 10_spatial_data files available in this repo, but for those segment lists, could we refer to existing files in the rest of the pipeline rather than pulling them from some less-traceable source such as Drive or S3?

@jsadler2
Collaborator

jsadler2 commented Nov 1, 2021

Okay. Thanks, @aappling-usgs. That's helpful.

  • What if I just moved this all into 1_network? What do you think of that idea @aappling-usgs, @limnoliver?
  • I will see how far I can get without anything in 10_spatial_data.

@aappling-usgs
Member Author

Working it all into 1_network sounds good to me.
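
For anyone picking this up, a minimal sketch of what the move might look like for paths referenced in the Snakefile, assuming files keep their subfolder layout and only the top-level folder changes from 10_spatial_data to 1_network. The function name `remap_prefix` is hypothetical, and the actual target layout inside 1_network was not decided in this thread.

```python
from pathlib import PurePosixPath


def remap_prefix(
    path: str, old: str = "10_spatial_data", new: str = "1_network"
) -> str:
    """Swap the top-level folder of a repo-relative path from `old` to `new`.

    Hypothetical helper: paths that do not start with `old` are returned
    unchanged.
    """
    parts = PurePosixPath(path).parts
    if parts and parts[0] == old:
        return str(PurePosixPath(new, *parts[1:]))
    return path


print(remap_prefix("10_spatial_data/out/sntemp_subset_ids.csv"))
# prints 1_network/out/sntemp_subset_ids.csv
```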
