30 changes: 15 additions & 15 deletions .gitlab-ci.yml
Original file line number Diff line number Diff line change
@@ -117,7 +117,7 @@ stages:
- echo "INSTALL_EXTRAS=$INSTALL_EXTRAS"
- echo "UV_RESOLUTION=$UV_RESOLUTION"
- echo "MOD_VERSION=$MOD_VERSION"
- python -m uv pip install "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- python -m pip install --prefer-binary "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- echo "Install finished."
- echo "Creating test sandbox directory"
- export WORKSPACE_DNAME="sandbox"
@@ -187,7 +187,7 @@ stages:
- echo "INSTALL_EXTRAS=$INSTALL_EXTRAS"
- echo "UV_RESOLUTION=$UV_RESOLUTION"
- echo "MOD_VERSION=$MOD_VERSION"
- python -m uv pip install "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- python -m pip install --prefer-binary "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- echo "Install finished."
- echo "Creating test sandbox directory"
- export WORKSPACE_DNAME="sandbox"
@@ -257,7 +257,7 @@ stages:
- echo "INSTALL_EXTRAS=$INSTALL_EXTRAS"
- echo "UV_RESOLUTION=$UV_RESOLUTION"
- echo "MOD_VERSION=$MOD_VERSION"
- python -m uv pip install "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- python -m pip install --prefer-binary "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- echo "Install finished."
- echo "Creating test sandbox directory"
- export WORKSPACE_DNAME="sandbox"
@@ -327,7 +327,7 @@ stages:
- echo "INSTALL_EXTRAS=$INSTALL_EXTRAS"
- echo "UV_RESOLUTION=$UV_RESOLUTION"
- echo "MOD_VERSION=$MOD_VERSION"
- python -m uv pip install "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- python -m pip install --prefer-binary "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- echo "Install finished."
- echo "Creating test sandbox directory"
- export WORKSPACE_DNAME="sandbox"
@@ -353,10 +353,10 @@ stages:
- echo "pytest command finished, moving the coverage file to the repo root"
build/sdist:
<<: *build_sdist_template
image: python:3.14.0
image: python:3.14
test/sdist/minimal-loose/cp314-linux-x86_64:
<<: *test_minimal-loose_template
image: python:3.14.0
image: python:3.14
needs:
- build/sdist
build/cp310-linux-x86_64:
@@ -453,30 +453,30 @@ test/full-strict/cp313-linux-x86_64:
- build/cp313-linux-x86_64
build/cp314-linux-x86_64:
<<: *build_wheel_template
image: python:3.14.0
image: python:3.14
test/minimal-loose/cp314-linux-x86_64:
<<: *test_minimal-loose_template
image: python:3.14.0
image: python:3.14
needs:
- build/cp314-linux-x86_64
test/full-loose/cp314-linux-x86_64:
<<: *test_full-loose_template
image: python:3.14.0
image: python:3.14
needs:
- build/cp314-linux-x86_64
test/minimal-strict/cp314-linux-x86_64:
<<: *test_minimal-strict_template
image: python:3.14.0
image: python:3.14
needs:
- build/cp314-linux-x86_64
test/full-strict/cp314-linux-x86_64:
<<: *test_full-strict_template
image: python:3.14.0
image: python:3.14
needs:
- build/cp314-linux-x86_64
lint:
<<: *common_template
image: python:3.14.0
image: python:3.14
stage: lint
before_script:
- df -h
@@ -487,7 +487,7 @@ lint:
allow_failure: true
gpgsign/wheels:
<<: *common_template
image: python:3.14.0
image: python:3.14
stage: gpgsign
artifacts:
paths:
@@ -551,7 +551,7 @@ gpgsign/wheels:
artifacts: true
deploy/wheels:
<<: *common_template
image: python:3.14.0
image: python:3.14
stage: deploy
only:
refs:
@@ -673,4 +673,4 @@ deploy/wheels:
"$CI_API_V4_URL/projects/$CI_PROJECT_ID/releases"


# end
# end
15 changes: 15 additions & 0 deletions .readthedocs.yml
@@ -7,15 +7,30 @@

# Required
version: 2

build:
os: "ubuntu-24.04"
tools:
python: "3.13"

# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/source/conf.py

# Build documentation with MkDocs
#mkdocs:
# configuration: mkdocs.yml

# Optionally build your docs in additional formats such as PDF and ePub
formats: all

python:
install:
- requirements: requirements/docs.txt
- method: pip
path: .
#extra_requirements:
# - docs

#conda:
# environment: environment.yml
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -2,8 +2,15 @@
We [keep a changelog](https://keepachangelog.com/en/1.0.0/).
We aim to adhere to [semantic versioning](https://semver.org/spec/v2.0.0.html).

## Version 0.2.2 - Unreleased

### Added

* Support deriving ProcessNode IO/parameter groups from a scriptconfig schema via the new ``params`` class variable.

## Version 0.2.1 - Unreleased


### Changed

* YAML paths in grid values no longer auto-expand unless explicitly behind an `__include__` key. See docs for details.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -138,7 +138,7 @@ def visit_Assign(self, node):
return visitor.version

project = 'kwdagger'
copyright = '2025, Jon Crall'
copyright = '2026, Jon Crall'
author = 'Jon Crall'
modname = 'kwdagger'

119 changes: 119 additions & 0 deletions docs/source/manual/tutorials/scriptconfig_pipeline/README.rst
@@ -0,0 +1,119 @@
Scriptconfig Pipeline Tutorial
==============================

This tutorial mirrors the two-stage pipeline example, but it uses
``scriptconfig`` schemas to declare input/output paths and parameter groups.
The ``ProcessNode.params`` class variable automatically derives
``in_paths``, ``out_paths``, ``algo_params``, and ``perf_params`` from the
schema so your pipeline stays in sync with the CLI definitions.

Files in this tutorial
----------------------

* ``data/`` - two small JSONL datasets of movie and food reviews.
* ``example_user_module/cli`` - command line entry points for the prediction and
evaluation nodes (scriptconfig schemas live here).
* ``example_user_module/pipelines.py`` - pipeline wiring that uses
``ProcessNode.params`` to derive node IO/params.
* ``run_pipeline.sh`` - a copy/paste helper that runs the schedule and aggregate steps.
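
Each dataset is plain JSON Lines: one JSON object per line with ``text`` and
``label`` keys. A minimal, self-contained sketch of reading such records
(using an in-memory sample rather than the real files):

```python
import io
import json

# Two sample lines in the same shape as data/toy_reviews_food.jsonl.
sample = io.StringIO(
    '{"text": "I love the dessert menu!", "label": "positive"}\n'
    '{"text": "Soup was bland and arrived cold.", "label": "negative"}\n'
)

# JSONL: parse each line independently as a JSON object.
records = [json.loads(line) for line in sample]
print([r['label'] for r in records])  # ['positive', 'negative']
```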

How scriptconfig drives ProcessNode definitions
-----------------------------------------------

Each CLI class declares the node schema with tags:

* ``in_path`` / ``in``: input paths
* ``out_path`` / ``out``: output path templates (parameters whose defaults are non-empty are used as the templates)
* ``algo_param`` / ``algo``: algorithm parameters that affect outputs
* ``perf_param`` / ``perf``: execution-only parameters

The ``primary`` tag on an ``out_path`` marks which output signals completion.
``ProcessNode`` uses these tags to populate the appropriate groups unless you
explicitly override them on the node class.

Here is the schema for the prediction node:

.. code:: python

class KeywordSentimentPredictCLI(scfg.DataConfig):
src_fpath = scfg.Value(None, tags=['in_path'])
dst_fpath = scfg.Value('keyword_predictions.json', tags=['out_path', 'primary'])
dst_dpath = scfg.Value('.', tags=['out_path'])

keyword = scfg.Value('great', tags=['algo_param'])
case_sensitive = scfg.Value(False, tags=['algo_param'])
workers = scfg.Value(0, tags=['perf_param'])
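
To make the tag-to-group mapping concrete, here is an illustrative sketch of
the grouping logic in plain Python. This is not kwdagger's implementation; it
only shows how each parameter could land in one derived group based on its
tags:

```python
# Hypothetical tag table mirroring the schema above.
SCHEMA_TAGS = {
    'src_fpath': ['in_path'],
    'dst_fpath': ['out_path', 'primary'],
    'dst_dpath': ['out_path'],
    'keyword': ['algo_param'],
    'case_sensitive': ['algo_param'],
    'workers': ['perf_param'],
}

def derive_groups(schema_tags):
    """Partition parameters into the four groups by their tags."""
    groups = {'in_paths': [], 'out_paths': [], 'algo_params': [], 'perf_params': []}
    primary = None
    for name, tags in schema_tags.items():
        if 'in_path' in tags or 'in' in tags:
            groups['in_paths'].append(name)
        elif 'out_path' in tags or 'out' in tags:
            groups['out_paths'].append(name)
            if 'primary' in tags:
                primary = name  # this output signals completion
        elif 'algo_param' in tags or 'algo' in tags:
            groups['algo_params'].append(name)
        elif 'perf_param' in tags or 'perf' in tags:
            groups['perf_params'].append(name)
    return groups, primary

groups, primary = derive_groups(SCHEMA_TAGS)
print(groups)
print(primary)  # dst_fpath carries the 'primary' tag
```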

The pipeline nodes simply point ``params`` at these schemas:

.. code:: python

class KeywordSentimentPredict(kwdagger.ProcessNode):
name = 'keyword_sentiment_predict'
executable = f'python {EXAMPLE_DPATH}/cli/keyword_sentiment_predict.py'
params = KeywordSentimentPredictCLI

class SentimentEvaluate(kwdagger.ProcessNode):
name = 'sentiment_evaluate'
executable = f'python {EXAMPLE_DPATH}/cli/sentiment_evaluate.py'
params = SentimentEvaluateCLI
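
For context, a minimal sketch of the keyword-prediction idea these nodes wrap:
label a review positive when the keyword appears in its text. This is an
illustrative guess at the logic, not the tutorial's actual
``keyword_sentiment_predict.py``:

```python
def predict_sentiment(text, keyword='great', case_sensitive=False):
    # Hypothetical predictor core: positive iff the keyword occurs in the
    # text, with optional case folding (matching the schema's defaults).
    if not case_sensitive:
        text = text.lower()
        keyword = keyword.lower()
    return 'positive' if keyword in text else 'negative'

records = [
    {"text": "Great soundtrack and great pacing.", "label": "positive"},
    {"text": "Boring plot with terrible acting.", "label": "negative"},
]
preds = [predict_sentiment(r['text']) for r in records]
print(preds)  # ['positive', 'negative']
```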

Connecting the pipeline
-----------------------

The wiring is the same as the base tutorial: prediction outputs feed evaluation
inputs, and the labeled dataset feeds both nodes.

.. code:: python

nodes = {
'keyword_sentiment_predict': KeywordSentimentPredict(),
'sentiment_evaluate': SentimentEvaluate(),
}
nodes['keyword_sentiment_predict'].outputs['dst_fpath'].connect(
nodes['sentiment_evaluate'].inputs['pred_fpath']
)
nodes['keyword_sentiment_predict'].inputs['src_fpath'].connect(
nodes['sentiment_evaluate'].inputs['true_fpath']
)
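
The evaluation node presumably compares the predictions against the true
labels from the dataset. A hypothetical accuracy computation, not the
tutorial's actual ``sentiment_evaluate.py``:

```python
def evaluate(true_labels, pred_labels):
    # Fraction of predictions that match the ground-truth labels.
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)

acc = evaluate(['positive', 'negative', 'positive'],
               ['positive', 'negative', 'negative'])
print(acc)  # 2 of 3 correct
```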

Running the tutorial
--------------------

.. code:: bash

# From this folder (modify to where your copy is)
cd ~/code/kwdagger/docs/source/manual/tutorials/scriptconfig_pipeline/

# Set the PYTHONPATH so kwdagger can see the custom module in this directory
export PYTHONPATH=.

# Define where the results should be written
EVAL_DPATH=$PWD/results

kwdagger schedule \
--params="
pipeline: 'example_user_module.pipelines.my_sentiment_pipeline()'
matrix:
keyword_sentiment_predict.src_fpath:
- data/toy_reviews_movies.jsonl
- data/toy_reviews_food.jsonl
keyword_sentiment_predict.keyword:
- great
- boring
- love
sentiment_evaluate.workers: 0
" \
--root_dpath="${EVAL_DPATH}" \
--tmux_workers=2 --backend=serial --skip_existing=1 \
--run=1

Once jobs complete, aggregate with:

.. code:: bash

kwdagger aggregate \
--params="
pipeline: 'example_user_module.pipelines.my_sentiment_pipeline()'
root_dpath: ${EVAL_DPATH}
"
@@ -0,0 +1,12 @@
{"text": "The pizza was great and the service was friendly.", "label": "positive"}
{"text": "Soup was bland and arrived cold.", "label": "negative"}
{"text": "Great flavors but the wait was terrible.", "label": "negative"}
{"text": "I love the dessert menu!", "label": "positive"}
{"text": "Portions were small and the seating was cramped.", "label": "negative"}
{"text": "Great coffee and great ambiance.", "label": "positive"}
{"text": "Boring menu without vegetarian options.", "label": "negative"}
{"text": "I love the spicy noodles.", "label": "positive"}
{"text": "Service was boring but the food was great.", "label": "positive"}
{"text": "The salad was soggy and tasteless.", "label": "negative"}
{"text": "Great staff who love their customers.", "label": "positive"}
{"text": "Boring decor but I love the fresh bread.", "label": "positive"}
@@ -0,0 +1,17 @@
{"text": "Great soundtrack and great pacing.", "label": "positive"}
{"text": "Boring plot with terrible acting.", "label": "negative"}
{"text": "Great visuals but boring story.", "label": "negative"}
{"text": "I love this cast and the great humor.", "label": "positive"}
{"text": "The movie was boring and far too long.", "label": "negative"}
{"text": "A great ending and a great start.", "label": "positive"}
{"text": "Lovely cinematography and I love the score.", "label": "positive"}
{"text": "The jokes were boring and fell flat.", "label": "negative"}
{"text": "Great characters kept me engaged.", "label": "positive"}
{"text": "Love the worldbuilding even if the pacing was slow.", "label": "positive"}
{"text": "Action scenes were boring and predictable.", "label": "negative"}
{"text": "Great sequel with heart.", "label": "positive"}
{"text": "I love how the mystery unfolded.", "label": "positive"}
{"text": "Boring dialogue ruined the tension.", "label": "negative"}
{"text": "Great acting but boring editing.", "label": "negative"}
{"text": "The film was great fun for the whole family.", "label": "positive"}
{"text": "I love every minute of this movie.", "label": "positive"}