30 changes: 15 additions & 15 deletions .gitlab-ci.yml
Original file line number Diff line number Diff line change
@@ -117,7 +117,7 @@ stages:
- echo "INSTALL_EXTRAS=$INSTALL_EXTRAS"
- echo "UV_RESOLUTION=$UV_RESOLUTION"
- echo "MOD_VERSION=$MOD_VERSION"
- python -m uv pip install "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- python -m pip install --prefer-binary "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- echo "Install finished."
- echo "Creating test sandbox directory"
- export WORKSPACE_DNAME="sandbox"
@@ -187,7 +187,7 @@ stages:
- echo "INSTALL_EXTRAS=$INSTALL_EXTRAS"
- echo "UV_RESOLUTION=$UV_RESOLUTION"
- echo "MOD_VERSION=$MOD_VERSION"
- python -m uv pip install "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- python -m pip install --prefer-binary "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- echo "Install finished."
- echo "Creating test sandbox directory"
- export WORKSPACE_DNAME="sandbox"
@@ -257,7 +257,7 @@ stages:
- echo "INSTALL_EXTRAS=$INSTALL_EXTRAS"
- echo "UV_RESOLUTION=$UV_RESOLUTION"
- echo "MOD_VERSION=$MOD_VERSION"
- python -m uv pip install "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- python -m pip install --prefer-binary "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- echo "Install finished."
- echo "Creating test sandbox directory"
- export WORKSPACE_DNAME="sandbox"
@@ -327,7 +327,7 @@ stages:
- echo "INSTALL_EXTRAS=$INSTALL_EXTRAS"
- echo "UV_RESOLUTION=$UV_RESOLUTION"
- echo "MOD_VERSION=$MOD_VERSION"
- python -m uv pip install "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- python -m pip install --prefer-binary "kwdagger[$INSTALL_EXTRAS]==$MOD_VERSION" -f dist
- echo "Install finished."
- echo "Creating test sandbox directory"
- export WORKSPACE_DNAME="sandbox"
@@ -353,10 +353,10 @@ stages:
- echo "pytest command finished, moving the coverage file to the repo root"
build/sdist:
<<: *build_sdist_template
image: python:3.14.0
image: python:3.14
test/sdist/minimal-loose/cp314-linux-x86_64:
<<: *test_minimal-loose_template
image: python:3.14.0
image: python:3.14
needs:
- build/sdist
build/cp310-linux-x86_64:
@@ -453,30 +453,30 @@ test/full-strict/cp313-linux-x86_64:
- build/cp313-linux-x86_64
build/cp314-linux-x86_64:
<<: *build_wheel_template
image: python:3.14.0
image: python:3.14
test/minimal-loose/cp314-linux-x86_64:
<<: *test_minimal-loose_template
image: python:3.14.0
image: python:3.14
needs:
- build/cp314-linux-x86_64
test/full-loose/cp314-linux-x86_64:
<<: *test_full-loose_template
image: python:3.14.0
image: python:3.14
needs:
- build/cp314-linux-x86_64
test/minimal-strict/cp314-linux-x86_64:
<<: *test_minimal-strict_template
image: python:3.14.0
image: python:3.14
needs:
- build/cp314-linux-x86_64
test/full-strict/cp314-linux-x86_64:
<<: *test_full-strict_template
image: python:3.14.0
image: python:3.14
needs:
- build/cp314-linux-x86_64
lint:
<<: *common_template
image: python:3.14.0
image: python:3.14
stage: lint
before_script:
- df -h
@@ -487,7 +487,7 @@ lint:
allow_failure: true
gpgsign/wheels:
<<: *common_template
image: python:3.14.0
image: python:3.14
stage: gpgsign
artifacts:
paths:
@@ -551,7 +551,7 @@ gpgsign/wheels:
artifacts: true
deploy/wheels:
<<: *common_template
image: python:3.14.0
image: python:3.14
stage: deploy
only:
refs:
@@ -673,4 +673,4 @@ deploy/wheels:
"$CI_API_V4_URL/projects/$CI_PROJECT_ID/releases"


# end
# end
15 changes: 15 additions & 0 deletions .readthedocs.yml
@@ -7,15 +7,30 @@

# Required
version: 2

build:
os: "ubuntu-24.04"
tools:
python: "3.13"

# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/source/conf.py

# Build documentation with MkDocs
#mkdocs:
# configuration: mkdocs.yml

# Optionally build your docs in additional formats such as PDF and ePub
formats: all

python:
install:
- requirements: requirements/docs.txt
- method: pip
path: .
#extra_requirements:
# - docs

#conda:
# environment: environment.yml
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -2,8 +2,15 @@
We [keep a changelog](https://keepachangelog.com/en/1.0.0/).
We aim to adhere to [semantic versioning](https://semver.org/spec/v2.0.0.html).

## Version 0.2.2 - Unreleased

### Added

* Support deriving ProcessNode IO/parameter groups from a scriptconfig schema via the new ``params`` class variable.

## Version 0.2.1 - Unreleased


### Changed

* YAML paths in grid values no longer auto-expand unless explicitly behind an `__include__` key. See docs for details.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -138,7 +138,7 @@ def visit_Assign(self, node):
return visitor.version

project = 'kwdagger'
copyright = '2025, Jon Crall'
copyright = '2026, Jon Crall'
author = 'Jon Crall'
modname = 'kwdagger'

119 changes: 119 additions & 0 deletions docs/source/manual/tutorials/scriptconfig_pipeline/README.rst
@@ -0,0 +1,119 @@
Scriptconfig Pipeline Tutorial
==============================

This tutorial mirrors the two-stage pipeline example, but it uses
``scriptconfig`` schemas to declare input/output paths and parameter groups.
The ``ProcessNode.params`` class variable automatically derives
``in_paths``, ``out_paths``, ``algo_params``, and ``perf_params`` from the
schema so your pipeline stays in sync with the CLI definitions.

Files in this tutorial
----------------------

* ``data/`` - two small JSONL datasets of movie and food reviews.
* ``example_user_module/cli`` - command line entry points for the prediction and
evaluation nodes (scriptconfig schemas live here).
* ``example_user_module/pipelines.py`` - pipeline wiring that uses
``ProcessNode.params`` to derive node IO/params.
* ``run_pipeline.sh`` - a copy/paste helper that runs the schedule and aggregate steps.
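
Each dataset is plain JSON Lines: one JSON object per line with ``text`` and
``label`` keys. A minimal, self-contained sketch of reading such records
(using an in-memory sample rather than the real files):

```python
import io
import json

# Two sample lines in the same shape as data/toy_reviews_food.jsonl.
sample = io.StringIO(
    '{"text": "I love the dessert menu!", "label": "positive"}\n'
    '{"text": "Soup was bland and arrived cold.", "label": "negative"}\n'
)

# JSONL: parse each line independently as a JSON object.
records = [json.loads(line) for line in sample]
print([r['label'] for r in records])  # ['positive', 'negative']
```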

How scriptconfig drives ProcessNode definitions
-----------------------------------------------

Each CLI class declares the node schema with tags:

* ``in_path`` / ``in``: input paths
* ``out_path`` / ``out``: output path templates (parameters whose defaults are non-empty are used as the templates)
* ``algo_param`` / ``algo``: algorithm parameters that affect outputs
* ``perf_param`` / ``perf``: execution-only parameters

The ``primary`` tag on an ``out_path`` marks which output signals completion.
``ProcessNode`` uses these tags to populate the appropriate groups unless you
explicitly override them on the node class.

Here is the schema for the prediction node:

.. code:: python

class KeywordSentimentPredictCLI(scfg.DataConfig):
src_fpath = scfg.Value(None, tags=['in_path'])
dst_fpath = scfg.Value('keyword_predictions.json', tags=['out_path', 'primary'])
dst_dpath = scfg.Value('.', tags=['out_path'])

keyword = scfg.Value('great', tags=['algo_param'])
case_sensitive = scfg.Value(False, tags=['algo_param'])
workers = scfg.Value(0, tags=['perf_param'])
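
To make the tag-to-group mapping concrete, here is an illustrative sketch of
the grouping logic in plain Python. This is not kwdagger's implementation; it
only shows how each parameter could land in one derived group based on its
tags:

```python
# Hypothetical tag table mirroring the schema above.
SCHEMA_TAGS = {
    'src_fpath': ['in_path'],
    'dst_fpath': ['out_path', 'primary'],
    'dst_dpath': ['out_path'],
    'keyword': ['algo_param'],
    'case_sensitive': ['algo_param'],
    'workers': ['perf_param'],
}

def derive_groups(schema_tags):
    """Partition parameters into the four groups by their tags."""
    groups = {'in_paths': [], 'out_paths': [], 'algo_params': [], 'perf_params': []}
    primary = None
    for name, tags in schema_tags.items():
        if 'in_path' in tags or 'in' in tags:
            groups['in_paths'].append(name)
        elif 'out_path' in tags or 'out' in tags:
            groups['out_paths'].append(name)
            if 'primary' in tags:
                primary = name  # this output signals completion
        elif 'algo_param' in tags or 'algo' in tags:
            groups['algo_params'].append(name)
        elif 'perf_param' in tags or 'perf' in tags:
            groups['perf_params'].append(name)
    return groups, primary

groups, primary = derive_groups(SCHEMA_TAGS)
print(groups)
print(primary)  # dst_fpath carries the 'primary' tag
```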

The pipeline nodes simply point ``params`` at these schemas:

.. code:: python

class KeywordSentimentPredict(kwdagger.ProcessNode):
name = 'keyword_sentiment_predict'
executable = f'python {EXAMPLE_DPATH}/cli/keyword_sentiment_predict.py'
params = KeywordSentimentPredictCLI

class SentimentEvaluate(kwdagger.ProcessNode):
name = 'sentiment_evaluate'
executable = f'python {EXAMPLE_DPATH}/cli/sentiment_evaluate.py'
params = SentimentEvaluateCLI
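
For context, a minimal sketch of the keyword-prediction idea these nodes wrap:
label a review positive when the keyword appears in its text. This is an
illustrative guess at the logic, not the tutorial's actual
``keyword_sentiment_predict.py``:

```python
def predict_sentiment(text, keyword='great', case_sensitive=False):
    # Hypothetical predictor core: positive iff the keyword occurs in the
    # text, with optional case folding (matching the schema's defaults).
    if not case_sensitive:
        text = text.lower()
        keyword = keyword.lower()
    return 'positive' if keyword in text else 'negative'

records = [
    {"text": "Great soundtrack and great pacing.", "label": "positive"},
    {"text": "Boring plot with terrible acting.", "label": "negative"},
]
preds = [predict_sentiment(r['text']) for r in records]
print(preds)  # ['positive', 'negative']
```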

Connecting the pipeline
-----------------------

The wiring is the same as the base tutorial: prediction outputs feed evaluation
inputs, and the labeled dataset feeds both nodes.

.. code:: python

nodes = {
'keyword_sentiment_predict': KeywordSentimentPredict(),
'sentiment_evaluate': SentimentEvaluate(),
}
nodes['keyword_sentiment_predict'].outputs['dst_fpath'].connect(
nodes['sentiment_evaluate'].inputs['pred_fpath']
)
nodes['keyword_sentiment_predict'].inputs['src_fpath'].connect(
nodes['sentiment_evaluate'].inputs['true_fpath']
)
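
The evaluation node presumably compares the predictions against the true
labels from the dataset. A hypothetical accuracy computation, not the
tutorial's actual ``sentiment_evaluate.py``:

```python
def evaluate(true_labels, pred_labels):
    # Fraction of predictions that match the ground-truth labels.
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)

acc = evaluate(['positive', 'negative', 'positive'],
               ['positive', 'negative', 'negative'])
print(acc)  # 2 of 3 correct
```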

Running the tutorial
--------------------

.. code:: bash

# From this folder (modify to where your copy is)
cd ~/code/kwdagger/docs/source/manual/tutorials/scriptconfig_pipeline/

# Set the PYTHONPATH so kwdagger can see the custom module in this directory
export PYTHONPATH=.

# Define where the results should be written
EVAL_DPATH=$PWD/results

kwdagger schedule \
--params="
pipeline: 'example_user_module.pipelines.my_sentiment_pipeline()'
matrix:
keyword_sentiment_predict.src_fpath:
- data/toy_reviews_movies.jsonl
- data/toy_reviews_food.jsonl
keyword_sentiment_predict.keyword:
- great
- boring
- love
sentiment_evaluate.workers: 0
" \
--root_dpath="${EVAL_DPATH}" \
--tmux_workers=2 --backend=serial --skip_existing=1 \
--run=1

Once jobs complete, aggregate with:

.. code:: bash

kwdagger aggregate \
--params="
pipeline: 'example_user_module.pipelines.my_sentiment_pipeline()'
root_dpath: ${EVAL_DPATH}
"
@@ -0,0 +1,12 @@
{"text": "The pizza was great and the service was friendly.", "label": "positive"}
{"text": "Soup was bland and arrived cold.", "label": "negative"}
{"text": "Great flavors but the wait was terrible.", "label": "negative"}
{"text": "I love the dessert menu!", "label": "positive"}
{"text": "Portions were small and the seating was cramped.", "label": "negative"}
{"text": "Great coffee and great ambiance.", "label": "positive"}
{"text": "Boring menu without vegetarian options.", "label": "negative"}
{"text": "I love the spicy noodles.", "label": "positive"}
{"text": "Service was boring but the food was great.", "label": "positive"}
{"text": "The salad was soggy and tasteless.", "label": "negative"}
{"text": "Great staff who love their customers.", "label": "positive"}
{"text": "Boring decor but I love the fresh bread.", "label": "positive"}
@@ -0,0 +1,17 @@
{"text": "Great soundtrack and great pacing.", "label": "positive"}
{"text": "Boring plot with terrible acting.", "label": "negative"}
{"text": "Great visuals but boring story.", "label": "negative"}
{"text": "I love this cast and the great humor.", "label": "positive"}
{"text": "The movie was boring and far too long.", "label": "negative"}
{"text": "A great ending and a great start.", "label": "positive"}
{"text": "Lovely cinematography and I love the score.", "label": "positive"}
{"text": "The jokes were boring and fell flat.", "label": "negative"}
{"text": "Great characters kept me engaged.", "label": "positive"}
{"text": "Love the worldbuilding even if the pacing was slow.", "label": "positive"}
{"text": "Action scenes were boring and predictable.", "label": "negative"}
{"text": "Great sequel with heart.", "label": "positive"}
{"text": "I love how the mystery unfolded.", "label": "positive"}
{"text": "Boring dialogue ruined the tension.", "label": "negative"}
{"text": "Great acting but boring editing.", "label": "negative"}
{"text": "The film was great fun for the whole family.", "label": "positive"}
{"text": "I love every minute of this movie.", "label": "positive"}