
Fix: read attributes_total from the assessment*.json to set the max attribute expectations for schema validation#315

Closed
dbasunag wants to merge 1 commit into ambient-code:main from dbasunag:feature/leadership_board

Conversation

@dbasunag
Contributor

Description

Read attributes_total from the json file and use it to temporarily modify specs/001-agentready-scorer/contracts/assessment-schema.json for validating the report.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Related Issues

Fixes # #309
Relates to #

Changes Made

  • Reads attributes_total from the assessment*.json
  • Sets the max attribute total accordingly in specs/001-agentready-scorer/contracts/assessment-schema.json before validating the report

Testing

  • Unit tests pass (pytest)
  • Integration tests pass
  • Manual testing performed
  • No new warnings or errors

Checklist

  • My code follows the project's code style
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

@github-actions
Contributor

📈 Test Coverage Report

Branch Coverage
This PR 66.1%
Main 66.1%
Diff ✅ +0%

Coverage calculated from unit tests only

@github-actions
Contributor

AgentReady Code Review

PR: Fix: read attributes_total from the assessment*.json to set the max attribute expectations for schema validation
Reviewer: Claude Code (AgentReady Development Agent)
Fixes: #309


Summary

This PR reads attributes_total from the submitted JSON, validates it meets a minimum of 15, then temporarily mutates assessment-schema.json to match the submitted count before running validation, and attempts to restore it afterward.

The intent is correct — the hardcoded const: 25 schema rejects valid --exclude submissions. However, the implementation has several significant issues that should block merge.


🔴 Critical Issues

1. Schema is not restored on validation failure

# Temporarily modify schema to match submitted attributes count
cp "$SCHEMA_FILE" "$SCHEMA_FILE.backup"
jq ... "$SCHEMA_FILE" > "$SCHEMA_FILE.tmp" && mv ...

# Run validation with updated schema
agentready validate-report "$ASSESSMENT_FILE"   # if this exits non-zero...

# Restore original schema
mv "$SCHEMA_FILE.backup" "$SCHEMA_FILE"         # this line NEVER runs

GitHub Actions bash steps exit on non-zero by default. If agentready validate-report fails (which is the entire point of this step), the restore is skipped and the modified schema persists in the workspace for all subsequent steps.

Fix: Use a trap or move schema restoration to a separate step with if: always():

trap 'mv "$SCHEMA_FILE.backup" "$SCHEMA_FILE" 2>/dev/null || true' EXIT
cp "$SCHEMA_FILE" "$SCHEMA_FILE.backup"
jq ... "$SCHEMA_FILE" > "$SCHEMA_FILE.tmp" && mv "$SCHEMA_FILE.tmp" "$SCHEMA_FILE"
agentready validate-report "$ASSESSMENT_FILE"

2. Race condition on concurrent CI runs

The schema file is part of the checked-out repo. If two leaderboard PRs are validated simultaneously on different runners that share a cached workspace, they will corrupt each other's schema modifications. Schema files should never be mutated as part of transient validation logic.


🟠 High Issues

3. Allows non-comparable leaderboard submissions

The leaderboard's value depends on all scores being computed against the same attribute set. Allowing submissions with 15–24 attributes means a 90/100 score from 15 attributes is displayed alongside a 90/100 score from 25 attributes with no indication they're different. This makes rankings misleading.

Issue #309's Option A (clear rejection at submission time) maintains integrity:

TOTAL=$(jq -r '.attributes_total' "$CHANGED_FILE")
if [ "$TOTAL" -ne 25 ]; then
  echo "::error::Leaderboard requires all 25 attributes. Found $TOTAL. Remove --exclude flags."
  exit 1
fi

Competing fix: PR #312 (DRAFT) addresses the same issue by relaxing the schema to minimum: 10, maximum: 25 and adding regression tests. The two PRs should not both be merged — maintainers should decide whether the leaderboard accepts partial submissions before merging either fix.

4. Arbitrary minimum threshold of 15

The magic number 15 has no documented rationale. The current attribute total is 25. If flexible submissions are the right approach, what justifies 15 as the floor? The threshold should derive from a principled rule (e.g., "at least all Tier 1 + Tier 2 attributes must be assessed") or be documented in the PR description.


🟡 Medium Issues

5. Potential bash arithmetic injection

ATTRIBUTES_TOTAL: ${{ steps.extract.outputs.attributes_total }}
...
if (( ATTRIBUTES_TOTAL < 15 )); then

${{ ... }} is substituted at workflow template evaluation time, before the shell runs. If a malicious submission sets "attributes_total" to a non-numeric string, jq -r outputs it literally, and the arithmetic context (( ... )) would evaluate it as a shell expression. Safer pattern:

if ! [[ "$ATTRIBUTES_TOTAL" =~ ^[0-9]+$ ]]; then
  echo "::error::attributes_total is not a valid integer: $ATTRIBUTES_TOTAL"
  exit 1
fi
if (( ATTRIBUTES_TOTAL < 15 )); then

6. No tests added

The PR checklist items for unit tests and integration tests are unchecked. Any change to validation logic should have corresponding test coverage. This is especially important given the schema-mutation approach being novel and potentially fragile.


🟢 Positive Observations

  • The jq --arg usage for schema patching is correctly parameterized (not string-interpolated).
  • Extracting attributes_total into $GITHUB_OUTPUT for downstream steps is a reasonable pattern.
  • The ✅ Attributes total: echo provides useful CI visibility.

Recommendation: Request Changes

The schema-mutation approach is fragile and introduces a race condition. The broken restore-on-failure is a correctness bug that will leave a corrupted schema in the workspace on any invalid submission — which is the exact common case this PR is meant to handle.

Suggested path forward:

  1. Maintainers decide: should the leaderboard accept --exclude submissions? (score comparability question)
  2. If no: implement Issue #309 Option A — reject with a clear error (5-line change, no schema mutation)
  3. If yes: PR #312's schema relaxation is simpler and safer than dynamic mutation — promote it from DRAFT, verify the minimum bound is justified, and close this PR

@github-actions
Contributor

AgentReady Code Review — PR #315

Reviewer: Claude Code (review-agentready)
Files Changed: .github/workflows/leaderboard.yml (+30 lines)


Summary

This PR reads attributes_total from a user-submitted assessment JSON and uses it to temporarily mutate the validation schema before running agentready validate-report. The intent is to fix false-negative schema validation failures when repos have a different attribute count than the hardcoded const: 25 in the schema.


🔴 Critical Issues

1. Circular Validation — Schema Configured by Untrusted Input

The core problem: the submitted file's own attributes_total field is used to configure the validator that validates that same file. This is a trust inversion.

A malicious submitter can set attributes_total: 15, include only 15 findings (all passes), and the schema is updated to allow exactly 15 findings before validation runs. The minimum: 15 guard mitigates the most egregious abuse but doesn't fix the underlying problem — any value ≥ 15 is accepted without independent verification.

Correct fix: The valid range for attributes_total should be derived from the assessor registry (a trusted source within the repo), not from the untrusted submission.
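As a sketch of that direction, the expected count could be computed from a registry file checked into the repo; the registry path and the one-ID-per-line format below are illustrative assumptions, not the project's actual layout:

```shell
# Hypothetical: trust the repo's own assessor registry, not the submission.
# Registry format assumed here: one assessor ID per line.
registry=$(mktemp)
printf '%s\n' readme_quality test_coverage ci_cd_pipeline > "$registry"

expected_total=$(wc -l < "$registry")   # trusted bound derived from the repo
submitted_total=3   # in CI this would be: jq -r '.attributes_total' "$ASSESSMENT_FILE"

if [ "$submitted_total" -gt "$expected_total" ]; then
  verdict="::error::attributes_total ($submitted_total) exceeds registry count ($expected_total)"
else
  verdict="ok"
fi
rm -f "$registry"
```

Because the bound comes from a file the submitter cannot modify in the same PR, the trust inversion disappears: a submission can claim fewer attributes than the registry defines, but never more.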

2. Schema Not Restored on Validation Failure

agentready validate-report "$ASSESSMENT_FILE"

# Restore original schema
mv "$SCHEMA_FILE.backup" "$SCHEMA_FILE"

If agentready validate-report exits non-zero, the step fails and the mv restore never executes, leaving assessment-schema.json in a mutated state for the remainder of the job.

Fix: Use a trap to guarantee restoration regardless of exit code:

trap 'mv "$SCHEMA_FILE.backup" "$SCHEMA_FILE" 2>/dev/null || true' EXIT

3. No Upper Bound on attributes_total

if (( ATTRIBUTES_TOTAL < 15 )); then

There is no corresponding upper bound check. A submission with attributes_total: 9999 passes this guard and sets the schema's findings.maxItems to 9999.
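A combined guard, sketched here as a standalone function with illustrative bounds (the correct floor and ceiling are for the maintainers to fix), would close both gaps:

```shell
# Validate attributes_total as a bounded integer before it reaches jq or
# the schema. The 15/25 bounds are illustrative, mirroring the PR's floor
# and the current assessor count.
check_total() {
  local total="$1"
  if ! [[ "$total" =~ ^[0-9]+$ ]]; then
    echo "::error::attributes_total is not a valid integer: $total"
    return 1
  fi
  if (( total < 15 || total > 25 )); then
    echo "::error::attributes_total out of range [15, 25]: $total"
    return 1
  fi
  echo "ok: $total"
}
```

With this in place, attributes_total: 9999 fails fast before any schema mutation, and non-numeric strings such as "null" are rejected before they ever reach an arithmetic context.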


🟠 Major Issues

4. jq -r Returns String "null" When Field Is Missing

If attributes_total is absent from the JSON, jq -r outputs the literal string "null". The downstream arithmetic expansion:

(( ATTRIBUTES_TOTAL < 15 ))

...then treats null as an (unset) shell variable that expands to 0, so the minimum check fires, but with the confusing message "Invalid attributes_total: null" rather than a clear statement that the field is missing.

Fix: Add an explicit null/empty check before the arithmetic comparison:

if [ "$ATTRIBUTES_TOTAL" = "null" ] || [ -z "$ATTRIBUTES_TOTAL" ]; then
  echo "::error::attributes_total is missing from the submission"
  exit 1
fi

🟡 Minor Issues

5. Hardcoded Minimum of 15 Is Unexplained

The schema previously enforced const: 25. The PR introduces minimum: 15 with no comment or documentation. Add a comment explaining the rationale (minimum supported assessor count, historical floor, etc.).

6. Root Cause Not Fully Addressed

PR #312 already addresses issue #309 by relaxing the schema for exclusions. This PR tackles a related symptom — assessments generated with a non-25 attribute count. The correct long-term fix is to update the schema permanently to use minimum/maximum bounds rather than const: 25, removing the need for runtime mutation entirely.


AgentReady Attribute Compliance

| Attribute | Status | Notes |
| --- | --- | --- |
| Security — Input Validation | ❌ Fail | User-controlled value shapes the validator |
| Error Handling / Graceful Degradation | ❌ Fail | Schema not restored on failure |
| CI/CD Workflow Quality | ⚠️ Partial | Happy path works; brittle on error paths |

Recommended Fix

Update assessment-schema.json directly — replace "const": 25 with a range, and update the findings bounds to match:

"attributes_total": {
  "type": "integer",
  "minimum": 15,
  "maximum": 50
},
"findings": {
  "type": "array",
  "minItems": 1,
  "maxItems": 50,
  ...
}

This makes the schema permanently flexible without runtime mutation from untrusted input and eliminates all issues above.


Overall verdict: ❌ Request changes — the approach introduces a trust inversion that undermines the validation's security guarantee. The schema should be made permanently flexible rather than dynamically shaped by the input being validated.

@github-actions
Contributor

AgentReady Code Review

PR: #315 - Fix: read attributes_total from assessment*.json to set max attribute expectations for schema validation
Reviewer: Claude (AgentReady Specialist)
Date: 2026-02-23


Summary

This PR modifies .github/workflows/leaderboard.yml to dynamically adjust the JSON schema before running agentready validate-report, using the attributes_total field from the submitted assessment JSON. The intent is to support submissions with fewer than 25 attributes (e.g., repos using --exclude). The approach is functional but fragile, and it directly conflicts with the simpler approach taken in the currently open PR #312.


Score Impact Analysis

| Attribute | Impact |
| --- | --- |
| ci_cd_pipeline | Neutral (workflow change, not an assessed attribute) |
| security_scanning | Neutral |
| test_coverage | Slight negative (no tests added for workflow change) |

Note: This PR only modifies a GitHub Actions workflow; no source assessors are affected.


Security Review

Status: ⚠️ Minor concern

Expression Injection: The ATTRIBUTES_TOTAL value originates from a user-submitted JSON file and flows through ${{ steps.extract.outputs.attributes_total }} into an env: block. Using env: rather than inline ${{ }} expressions in the run: block is correct practice and prevents expression injection. No injection vulnerability.
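For reference, the safe shape looks roughly like the following; the step name and output id are assumptions, not a quote of the actual workflow:

```yaml
- name: Validate attribute count
  env:
    # Untrusted value enters the script as an environment variable;
    # no ${{ }} interpolation appears inside the run: body itself.
    ATTRIBUTES_TOTAL: ${{ steps.extract.outputs.attributes_total }}
  run: |
    if ! [[ "$ATTRIBUTES_TOTAL" =~ ^[0-9]+$ ]]; then
      echo "::error::attributes_total is not a valid integer"
      exit 1
    fi
```

Because the expression is expanded only in the env: block, a malicious value becomes inert data in a shell variable rather than code in the script template.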

No Upper Bound on ATTRIBUTES_TOTAL (medium): Only a minimum of 15 is validated. A submission with attributes_total: 9999 would pass the minimum check and set findings.minItems = 9999 — the submission would then fail validation (since no real assessment has 9999 findings), but this wastes CI resources and could produce confusing error messages. Add a maximum check:

if (( ATTRIBUTES_TOTAL > 50 )); then
  echo "::error::Invalid attributes_total: $ATTRIBUTES_TOTAL (maximum: 50)"
  exit 1
fi

Code Quality

Status: ⚠️ Needs attention

1. Conflicting with open PR #312 (blocking): Both this PR and PR #312 fix issue #309, but via different approaches:

| | PR #312 | PR #315 (this PR) |
| --- | --- | --- |
| Approach | Relax schema statically (minimum: 10, maximum: 25) | Dynamically rewrite schema from submission value |
| Complexity | Low | High |
| Fragility | Low | Medium |
| Includes tests | Yes | No |

PR #312's approach (static range in schema) is simpler, safer, and already includes regression tests. This PR should be evaluated against #312 and one approach chosen. Consider closing this PR in favor of #312, or defining which use case each serves.

2. Backup/restore does not use trap (medium): GitHub Actions run: steps default to bash -e (with -o pipefail added when shell: bash is set explicitly). If agentready validate-report "$ASSESSMENT_FILE" exits non-zero, execution halts and the restore line (mv "$SCHEMA_FILE.backup" "$SCHEMA_FILE") never runs. The schema file is left in a modified state for the remainder of the job.

On ephemeral runners this doesn't affect persistence, but subsequent steps in the same job see a corrupted schema. A trap is the correct fix:

trap 'mv "$SCHEMA_FILE.backup" "$SCHEMA_FILE" 2>/dev/null || true' EXIT

3. In-place schema mutation pattern (low): Modifying a tracked source file in a CI job is a code smell. Prefer writing a temp file:

TEMP_SCHEMA=$(mktemp)
jq --arg total "$ATTRIBUTES_TOTAL" '...' "$SCHEMA_FILE" > "$TEMP_SCHEMA"
agentready validate-report "$ASSESSMENT_FILE" --schema "$TEMP_SCHEMA"
rm -f "$TEMP_SCHEMA"

This requires validate-report to accept a --schema flag, which may warrant a separate feature request.

4. Hard-coded minimum of 15 (low): The value 15 appears without comment. This should have an inline comment, e.g.:

# Minimum 15 attributes required (AgentReady has 25+ assessors; 15 is a reasonable floor for partial runs)
if (( ATTRIBUTES_TOTAL < 15 )); then

5. jq type safety (low): If attributes_total is missing from the JSON, jq -r '.attributes_total' outputs the string "null". In bash arithmetic, (( null < 15 )) treats null as a variable name (resolves to 0), so 0 < 15 is true and the minimum check correctly triggers an error. This works, but the error message reads "Invalid attributes_total: null" which could confuse submitters. A clearer pre-check would help:

if [ "$ATTRIBUTES_TOTAL" = "null" ] || [ -z "$ATTRIBUTES_TOTAL" ]; then
  echo "::error::attributes_total field is missing from assessment JSON"
  exit 1
fi

Test Coverage

Status: ⚠️ Insufficient

The PR checklist shows unit and integration tests were not run or added. No new tests are included for the minimum threshold logic or the dynamic schema modification. PR #312 includes regression tests for the same feature — consider whether those tests cover this approach, or add dedicated tests.


Documentation

Status: ✅ No action required

The change is confined to a GitHub Actions workflow. No user-facing documentation update is needed.


Required Changes

  1. Resolve conflict with PR #312 — both fix issue #309 via different approaches. One should be closed or merged first. If this approach is preferred, close #312; if #312 is preferred, close this PR.
  2. Add upper bound for ATTRIBUTES_TOTAL — prevent arbitrarily large values from reaching jq and setting absurd schema constraints.

Suggested Improvements

  1. Add trap 'mv "$SCHEMA_FILE.backup" "$SCHEMA_FILE" 2>/dev/null || true' EXIT before the schema modification for reliable cleanup.
  2. Add a comment explaining the minimum threshold of 15 and its rationale.
  3. Add a null/missing check for ATTRIBUTES_TOTAL before arithmetic to produce a clearer error message.
  4. Consider adding regression tests similar to those in PR #312.
  5. Run actionlint on the workflow before merging (per CLAUDE.md requirements — actionlint was not available in the current environment to verify).

AgentReady Compliance Notes

  • ci_cd_pipeline: This change improves leaderboard submission robustness, but the mutation-and-restore pattern reduces pipeline reliability. A static schema fix (PR #312's approach) would be a cleaner improvement.
  • test_coverage: The PR adds workflow logic without any corresponding tests; this is a regression against the project's >80% coverage target for new code.
  • security_practices: Expression injection is correctly avoided via env: variables. The ATTRIBUTES_TOTAL value from user-submitted JSON is handled safely in arithmetic and jq --arg contexts.

Review generated by AgentReady Specialist | AgentReady

@github-actions
Contributor

AgentReady Code Review — PR #315

Fix: read attributes_total from assessment*.json for schema validation
Fixes #309 | .github/workflows/leaderboard.yml


Summary

This takes a CI-level approach to the same issue fixed by PR #312 — dynamically patching the schema to match the submitted attributes_total. The core intent is good (enforcing findings.count == attributes_total), but the implementation has critical reliability and concurrency issues.


Issues

Critical

No cleanup trap — schema file left modified on failure

If agentready validate-report exits non-zero, the mv "$SCHEMA_FILE.backup" "$SCHEMA_FILE" restore step never runs. The schema file is permanently modified in the repo checkout. Subsequent steps in the same or other jobs would use the corrupted schema.

Fix:

trap "mv '$SCHEMA_FILE.backup' '$SCHEMA_FILE' 2>/dev/null || true" EXIT
cp "$SCHEMA_FILE" "$SCHEMA_FILE.backup"

Race condition — parallel PR validations corrupt the schema

If two leaderboard submissions trigger simultaneously, both jobs modify the same $SCHEMA_FILE at the same time. The last writer wins, and validation results become non-deterministic. This is a correctness bug for the leaderboard.

Fix: avoid modifying the checked-in file entirely. Pass the expected count as a flag to agentready validate-report, or use a temp directory copy:

TEMP_SCHEMA=$(mktemp)
jq --arg total "$ATTRIBUTES_TOTAL" '...' "$SCHEMA_FILE" > "$TEMP_SCHEMA"
agentready validate-report --schema "$TEMP_SCHEMA" "$ASSESSMENT_FILE"
rm "$TEMP_SCHEMA"

Medium

Minimum threshold inconsistency with PR #312

This PR uses minimum: 15; PR #312 uses minimum: 10. These must be reconciled before either is merged.

jq silently fails if schema structure has changed

The jq filter assumes .properties.attributes_total.const exists. After PR #312 is merged, const no longer exists and the filter silently adds it back, creating a schema that is inconsistent with the source. The filter should use conditional logic or target a known stable path.
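A more defensive filter would patch the field only when the expected shape is present; the following is a sketch against a stand-in schema, not the workflow's actual jq filter:

```shell
# Create a stand-in schema, then patch const only if it exists; a schema
# without const passes through unchanged instead of gaining a stale field.
schema=$(mktemp)
printf '%s' '{"properties":{"attributes_total":{"const":25}}}' > "$schema"

patched=$(jq -c --argjson total 20 '
  if .properties.attributes_total | has("const")
  then .properties.attributes_total.const = $total
  else . end
' "$schema")
rm -f "$schema"
```

The else . end branch is what makes the filter safe against a schema that has already been migrated to minimum/maximum bounds.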

findings.minItems == findings.maxItems == attributes_total is overly strict

Findings with "skipped" status are still included in the findings array. Setting minItems == maxItems == total means that if any finding is omitted from the output, validation fails. The original minItems: 25, maxItems: 25 had the same strictness, but it is worth confirming this reflects actual CLI output behavior.
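Whatever bounds the schema ends up with, the count consistency itself can be cross-checked directly, outside the schema (a sketch assuming jq and the field names visible in this PR):

```shell
# Stand-in report: two findings, attributes_total declares 2.
report=$(mktemp)
printf '%s' '{"attributes_total":2,"findings":[{"id":"a"},{"id":"b"}]}' > "$report"

# Emits "true" when the findings array length matches the declared total.
consistent=$(jq '.attributes_total == (.findings | length)' "$report")
rm -f "$report"
```

A one-line check like this enforces the invariant without any minItems/maxItems gymnastics in the schema.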

Minor

PR description has malformed issue reference

Fixes # #309 — the stray # before the issue reference breaks GitHub's close-keyword auto-linking. Use Fixes #309.

Testing checklist is entirely unchecked

No unit tests, integration tests, or manual testing are confirmed. Changes to CI validation logic should have at least manual testing evidence.


What's Good

  • Correctly extracts attributes_total and propagates it through GitHub Actions outputs
  • Intent to enforce findings.count == attributes_total is the right consistency check missing from PR #312
  • Validates minimum threshold before attempting schema patching

Attribute Impact

| Attribute | Impact |
| --- | --- |
| ci_cd_pipeline | Negative — introduces reliability risk in the validation pipeline |
| contract_driven_development | Neutral — addresses the issue but in a fragile way |

Verdict: Do not merge in current form. The race condition and missing cleanup trap are blocking issues. Recommend instead extending agentready validate-report to accept --expected-total N flag and validating the count there, avoiding schema file mutation entirely. Coordinate with PR #312 on which approach to use.

@dbasunag
Contributor Author

closing in favor of #312

@dbasunag dbasunag closed this Feb 23, 2026