
ci: add template validation workflow (Ajv + upstream schema)#1

Merged
mrkhachaturov merged 1 commit into main from ci/add-validation-workflow on Apr 15, 2026


@mrkhachaturov (Owner)

Summary

Adds a CI workflow that validates every bank template in this repo against the upstream HindSight schema on every PR and push. Before this, a contributor could merge a structurally broken template and nothing would catch it until HindClaw tried to import it at runtime.

Mirrors the upstream check-templates.mjs pattern from vectorize-io/hindsight#1065, adapted for this repo's flat layout.

Validation layers

  1. Schema validation (Ajv): compiles the upstream-generated bank-template-schema.json and validates each templates/*.json against it. The schema is authoritative: upstream regenerates it from BankTemplateManifest.model_json_schema() on every PR via verify-generated-files, so it always matches the live Pydantic model.
  2. Catalog integrity: templates.json has a templates array, every entry has id + manifest_file, IDs are unique within the catalog, every manifest_file ref points at an existing file, and there are no orphan files under templates/.
  3. HindClaw content guard: retain_extraction_mode ∈ {concise, verbose, custom, chunks}. Upstream's schema accepts any string; the runtime retain pipeline only honours those four values.
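The catalog-integrity layer (layer 2 above) is small enough to sketch in full. This is a minimal illustration, not the actual scripts/check-templates.mjs. It assumes templates.json has already been parsed, that the caller passes repo-relative paths of the files under templates/ (for example "templates/backend-python.json"), and that manifest_file values use the same repo-relative form. The name checkCatalog is hypothetical.

```javascript
// Hypothetical sketch of the catalog-integrity layer (layer 2).
// Assumes: `catalog` is the parsed templates.json, and `templateFiles` lists
// repo-relative paths of files under templates/, in the same form as the
// catalog's manifest_file values (e.g. "templates/backend-python.json").
function checkCatalog(catalog, templateFiles) {
  const errors = [];
  if (!catalog || !Array.isArray(catalog.templates)) {
    errors.push('templates.json: missing a "templates" array');
    return errors;
  }

  const onDisk = new Set(templateFiles);
  const seenIds = new Set();
  const referenced = new Set();

  catalog.templates.forEach((entry, i) => {
    const where = `templates.json /templates/${i}`;
    // Every entry needs both an id and a manifest_file.
    if (!entry || typeof entry.id !== "string" || typeof entry.manifest_file !== "string") {
      errors.push(`${where}: entry needs string "id" and "manifest_file"`);
      return;
    }
    // IDs must be unique within the catalog.
    if (seenIds.has(entry.id)) {
      errors.push(`${where}: duplicate id "${entry.id}"`);
    }
    seenIds.add(entry.id);
    // Every manifest_file reference must point at an existing file.
    referenced.add(entry.manifest_file);
    if (!onDisk.has(entry.manifest_file)) {
      errors.push(`${where}: manifest_file "${entry.manifest_file}" does not exist`);
    }
  });

  // Orphan detection: files under templates/ that no catalog entry references.
  for (const file of templateFiles) {
    if (!referenced.has(file)) {
      errors.push(`${file}: orphan file, not referenced from templates.json`);
    }
  }
  return errors;
}
```

Keeping the check free of filesystem access makes the orphan and dangling-reference cases easy to unit-test; a real script would build templateFiles from something like fs.readdirSync on templates/, filtered to .json files.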

Files

| File | Purpose |
|---|---|
| .github/workflows/validate.yml | Single job: setup-node + npm ci + curl the schema + node scripts/check-templates.mjs |
| scripts/check-templates.mjs | Validator adapted from upstream's script for the flat layout; adds orphan detection + content guards |
| package.json / package-lock.json | ajv ^8.17.1 as the sole dev dep, "private": true |
| .gitignore | /node_modules/ and /bank-template-schema.json (fetched fresh each CI run) |

Verified locally

All three committed templates (backend-python, fullstack-typescript, astromech-test) pass. Sanity-checked the negative paths:

  • Schema error: planted disposition_empathy = "five" → caught with "must be integer" at /bank/disposition_empathy ✓
  • Content guard: planted retain_extraction_mode = "verbatim" → caught with valid-values listing ✓
  • Orphan detection: dropped an unreferenced file in templates/ → caught with explicit error ✓
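The "must be integer" at /bank/disposition_empathy output above matches how an Ajv v8 error reads when its message is combined with its instancePath. A hedged sketch of such a formatter follows, assuming the errors come from validate.errors after ajv.compile(schema) and a failed validate(manifest) call. The function name and the output layout are illustrative, not taken from the real script.

```javascript
// Hypothetical formatter for Ajv v8 validation errors. Each Ajv ErrorObject
// has `keyword`, `instancePath` (a JSON Pointer, "" for the document root),
// `params`, and usually a human-readable `message`.
function formatAjvErrors(file, errors) {
  return (errors ?? []).map((err) => {
    const path = err.instancePath === "" ? "/" : err.instancePath;
    let detail = err.message ?? `failed "${err.keyword}"`;
    // additionalProperties errors name the offending key only in params.
    if (err.keyword === "additionalProperties" && err.params && err.params.additionalProperty) {
      detail += ` (${err.params.additionalProperty})`;
    }
    return `${file}: ${detail} at ${path}`;
  });
}
```

In a validator loop this could be used as `if (!validate(manifest)) errors.push(...formatAjvErrors(file, validate.errors))`. Constructing Ajv with `{ allErrors: true }` makes it report every violation in a manifest instead of stopping at the first.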

Test plan

  • Validator runs and reports all 3 templates valid
  • curl fetches bank-template-schema.json from upstream main successfully
  • npm ci resolves from the committed lockfile

Not included (deliberate)

  • No pinned-schema fallback: we fetch from upstream main fresh each run. If upstream is down, CI is down — acceptable for now given upstream's uptime. If this bites, add a committed schemas/bank-template-schema.json + a cron job that PRs updates on drift.
  • No Pydantic runtime validation: Ajv catches the same structural issues without requiring a hindsight-api-slim install. The runtime-only checks (mental-model ID regex, unique IDs, version validator) are covered by upstream's schema where the schema can express them, and the content guard covers retain_extraction_mode, the one field the schema does not constrain.
  • No generator: unlike upstream, this repo doesn't regenerate the schema; we only consume it. Drift detection happens upstream; we just trust it and use it.

🤖 Generated with Claude Code

Adds .github/workflows/validate.yml running on every PR and push to
main. Mirrors the upstream hindsight-docs/scripts/check-templates.mjs
pattern that PR #1065 wired into vectorize-io/hindsight's CI — fetch
bank-template-schema.json from upstream main, compile with Ajv, validate
every manifest referenced from templates.json.

Three validation layers:

1. Schema validation — Ajv compiles the upstream-generated JSON Schema
   (authoritative: regenerated from BankTemplateManifest.model_json_schema()
   on every upstream PR via verify-generated-files). Every
   templates/*.json is validated against it. Catches field type
   mismatches, unknown fields, missing required keys, invalid
   MentalModelTrigger / TagGroup shapes, and everything else the upstream
   Pydantic model enforces at parse time.

2. Catalog integrity — templates.json must have a templates array, every
   entry needs id + manifest_file, IDs unique within the catalog, every
   manifest_file ref points at an existing file, no orphan files under
   templates/ (drop-and-forget regression guard).

3. HindClaw content guard — retain_extraction_mode must be in
   {concise, verbose, custom, chunks}. Upstream's schema accepts any
   string here but the runtime retain pipeline only honours those four
   values; verbatim was dropped during the convergence refactor.
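The content guard in layer 3 exists because the schema types retain_extraction_mode as a plain string. Below is a minimal sketch of such a guard. This page does not say where the field sits inside a manifest, so the sketch (an assumption, not the real script) walks the whole parsed manifest, checks every key with that name, and treats an explicit null as unset. checkRetainExtractionMode is a hypothetical name.

```javascript
// Hypothetical sketch of the HindClaw content guard (layer 3).
// The schema accepts any string for retain_extraction_mode; the runtime retain
// pipeline only honours these four values.
const ALLOWED_RETAIN_MODES = new Set(["concise", "verbose", "custom", "chunks"]);

// Escape a key for use as a JSON Pointer segment (RFC 6901: "~" -> "~0", "/" -> "~1").
const escapePointer = (key) => key.replace(/~/g, "~0").replace(/\//g, "~1");

function checkRetainExtractionMode(manifest, file) {
  const errors = [];
  // Assumption: the field's location is not fixed here, so visit every key.
  const walk = (node, pointer) => {
    if (Array.isArray(node)) {
      node.forEach((child, i) => walk(child, `${pointer}/${i}`));
      return;
    }
    if (node === null || typeof node !== "object") return;
    for (const [key, value] of Object.entries(node)) {
      const childPointer = `${pointer}/${escapePointer(key)}`;
      // An explicit null is treated as "unset" (assumption) and skipped.
      if (key === "retain_extraction_mode" && value !== null && !ALLOWED_RETAIN_MODES.has(value)) {
        errors.push(
          `${file}: ${childPointer} is ${JSON.stringify(value)}; ` +
            `valid values: ${[...ALLOWED_RETAIN_MODES].join(", ")}`
        );
      }
      walk(value, childPointer);
    }
  };
  walk(manifest, "");
  return errors;
}
```

Walking the whole document is a trade-off: it cannot miss the field wherever a manifest nests it, at the cost of also checking any unrelated key that happens to share the name.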

Files:

- .github/workflows/validate.yml — single job, ~30 lines, node 22 +
  npm ci + curl the schema + node scripts/check-templates.mjs
- scripts/check-templates.mjs — adapted from upstream's check-templates.mjs
  for this repo's flat layout (templates.json + templates/ at repo root
  instead of hindsight-docs/src/data/). Adds the orphan detection and
  content guard layers upstream doesn't need.
- package.json + package-lock.json — ajv ^8.17.1 as the sole dev dep,
  "private": true so the package is not publishable
- .gitignore — /node_modules/ and /bank-template-schema.json (fetched
  fresh at CI time, never committed)

Verified locally against all three committed templates (backend-python,
fullstack-typescript, astromech-test): all pass. Sanity-checked negative
paths by planting a schema violation (disposition_empathy = "five"), a
retain_extraction_mode = "verbatim" value for the content guard, and an
orphan file; all three are caught with clear per-error output pointing
at the failing path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mrkhachaturov merged commit 63ae353 into main on Apr 15, 2026
1 check passed