| Resource | Link |
| --- | --- |
| Read the Docs | http://kwdagger.readthedocs.io/en/latest/ |
| Gitlab (main) | https://gitlab.kitware.com/computer-vision/kwdagger |
| Github (mirror) | https://github.com/Kitware/kwdagger |
| Pypi | https://pypi.org/project/kwdagger |
KWDagger is a lightweight framework for defining bash-centric DAGs and running large parameter sweeps. It builds on top of cmd_queue and scriptconfig to provide:
- Reusable `kwdagger.pipeline.Pipeline` and `kwdagger.pipeline.ProcessNode` abstractions for wiring inputs / outputs together (see the sketch after this list).
- A scheduling CLI (`kwdagger.schedule`) that materializes pipeline definitions over a parameter grid and executes them via Slurm, tmux, or a serial backend.
- An aggregation CLI (`kwdagger.aggregate`) that loads job outputs, computes metrics, and optionally plots parameter/metric relationships.
- A self-contained demo pipeline in `kwdagger.demo.demodata` that is used in CI and serves as a reference implementation.
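Because a pipeline definition is an ordinary Python object, you can construct and inspect one directly. A minimal sketch using the demo pipeline (how much detail the printed representation shows depends on kwdagger internals):

```bash
# Construct the demo pipeline in-process and print it. This is the same
# factory function referenced by the scheduling example below; what the
# printed output contains depends on kwdagger internals.
python -c "
from kwdagger.demo.demodata import my_demo_pipeline
pipeline = my_demo_pipeline()
print(pipeline)
"
```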
Key modules and directories:

- `kwdagger/pipeline.py` – core pipeline and process node definitions, networkx graph construction, and configuration utilities.
- `kwdagger/schedule.py` – `ScheduleEvaluationConfig` CLI for expanding parameter grids into runnable jobs and dispatching them through cmd_queue backends.
- `kwdagger/aggregate.py` – `AggregateEvluationConfig` CLI for loading job outputs, computing parameter hash IDs, and generating text/plot reports.
- `kwdagger/demo/demodata.py` – end-to-end demo pipeline with prediction and evaluation stages plus CLI entry points for each node.
- `docs/` – Sphinx sources, including an example user module under `docs/source/manual/tutorials/twostage_pipeline`.
- `tests/` – unit and functional coverage for pipeline wiring, scheduler behavior, aggregation, and import sanity checks.
Run the demo pipeline locally to see the CLI workflow end-to-end:
```bash
TMP_DPATH=$(mktemp -d --suffix "-kwdagger-demo")
cd "$TMP_DPATH"
echo "demo" > input.txt
EVAL_DPATH=$PWD/pipeline_output

python -m kwdagger.schedule \
    --params="
        pipeline: 'kwdagger.demo.demodata.my_demo_pipeline()'
        matrix:
            stage1_predict.src_fpath:
                - input.txt
            stage1_predict.param1:
                - 123
            stage1_evaluate.workers: 2
    " \
    --root_dpath="${EVAL_DPATH}" \
    --backend=serial --skip_existing=1 --run=1
```
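Each `matrix` key accepts a scalar or a list of values, and the scheduler sweeps over the grid they define. As an illustrative (unverified) variation of the command above, listing two values for `stage1_predict.param1` should materialize one job per value:

```bash
# Illustrative variation: sweeping param1 over two values. With the
# parameter-grid semantics described above, this should expand into two
# stage1_predict jobs (plus the downstream stage1_evaluate job for each).
python -m kwdagger.schedule \
    --params="
        pipeline: 'kwdagger.demo.demodata.my_demo_pipeline()'
        matrix:
            stage1_predict.src_fpath:
                - input.txt
            stage1_predict.param1:
                - 123
                - 456
            stage1_evaluate.workers: 2
    " \
    --root_dpath="${EVAL_DPATH}" \
    --backend=serial --skip_existing=1 --run=1
```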
Then aggregate the completed runs:

```bash
python -m kwdagger.aggregate \
    --pipeline='kwdagger.demo.demodata.my_demo_pipeline()' \
    --target "
        - $EVAL_DPATH
    " \
    --output_dpath="$EVAL_DPATH/full_aggregate" \
    --eval_nodes="
        - stage1_evaluate
    " \
    --stdout_report="
        top_k: 10
        concise: 1
    "
```

The scheduler will generate per-node job directories with `invoke.sh` and `job_config.json` metadata. The aggregator then consolidates results, computes parameter hash IDs, and prints a concise report.
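A quick way to see what was generated is to search the output root for those files (assuming the layout described above):

```bash
# Locate the generated per-job run scripts and recorded configurations.
find "$EVAL_DPATH" \( -name 'invoke.sh' -o -name 'job_config.json' \)

# Inspect the recorded parameters of one job (-r skips running cat when
# nothing matched).
find "$EVAL_DPATH" -name 'job_config.json' | head -n 1 | xargs -r cat
```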
A novel graph-based symlink structure allows navigation of dependencies from within a node's result directory. The `.succ` folder holds symlinks to successors (i.e. results that depend on the current results), and `.pred` holds symlinks to the folders of results that the current folder depends on.
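For example, from inside any generated result directory you can list and resolve these links to walk the DAG on disk (a sketch, assuming the `.pred` / `.succ` layout described above):

```bash
# Run from within one of the result directories under "$EVAL_DPATH".
ls -l .pred           # symlinks to the results this folder was computed from
ls -l .succ           # symlinks to results computed from this folder
readlink -f .pred/*   # resolve the dependency links to their real paths
```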
For more in-depth information, see the tutorials under `docs/source/manual/tutorials`.

The main CLI entry points are:

- `python -m kwdagger.schedule` or `kwdagger schedule` – build and run a pipeline over a parameter matrix (see `kwdagger.schedule.ScheduleEvaluationConfig`).
- `python -m kwdagger.aggregate` or `kwdagger aggregate` – load completed runs and generate tabular and plotted summaries (`kwdagger.aggregate.AggregateEvluationConfig`).
- `python -m kwdagger` – a modal CLI that exposes the `schedule` and `aggregate` commands via `kwdagger.__main__.KWDaggerModal`.
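Each entry point is a scriptconfig-backed CLI, so `--help` prints its options; a quick way to explore the command surface (exact flags may vary by version):

```bash
# Top-level modal CLI and the individual commands.
python -m kwdagger --help
python -m kwdagger.schedule --help
python -m kwdagger.aggregate --help
```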