feat(evaluator): parity sync, Claude SDK adapter, kiro per-stage gates, shared simulator #235
Open
harmjeff wants to merge 37 commits into awslabs:main from harmjeff:fix/evaluator-update
37 commits
0d04c82 feat(evaluator): parity sync + Claude SDK adapter with interactive si…
3bbc71a fix(evaluator): support podman as docker fallback, skip sandbox check…
2875de9 fix(evaluator): container runtime detection + deterministic run folde…
872c088 fix(evaluator): add aidlc-cli-harness to root workspace dependencies
1213909 fix(evaluator): runner Mode 1, sdk adapter bugs, orchestrator script …
b25fecf fix(sdk-adapter): add post-run test evaluation (stage 2)
e0e08e9 feat(evaluator): inject OpenAPI contract into all human-analog agents
35fc0be feat(evaluator): shared HumanSimulator + kiro-cli human analog review
ebbbc1d fix(evaluator): wire simulator model from config into CLI adapter runs
b13a0b6 feat(evaluator): Strands swarm uses shared HumanSimulator for all modes
ed75bbc fix(evaluator): separate executor and simulator token tracking for fu…
34a85ca fix(sdk-adapter): import _get_container_cli from shared.sandbox
6501afc fix(kiro-adapter): add post-run test evaluation (stage 2)
45f6b4e fix(sandbox): pin base image to Python 3.13 instead of 3.14
6fe814b fix(kiro-adapter): interactive stdin mode for genuine simulator gate …
703414b fix(kiro-adapter): replace select.select with thread-based reader
025108a fix(kiro-adapter): check construction docs after every chunk, not jus…
56677d6 fix(kiro-adapter): kill process on completion instead of /quit + hand…
89f5ea4 feat(kiro-adapter): print kiro agent turns to stderr
9bada2f fix(kiro-adapter): fix output filter — accumulate chars into lines, t…
c7a7d09 fix(kiro-adapter): two-phase --resume approach for genuine simulator …
f43e6a2 feat(kiro-adapter): per-stage simulator gates (4 review points)
54a6ba9 refactor(cli-harness): plugin adapter registration + shared HumanSimu…
5b2eb4d docs(evaluator): add CLI adapter developer guide to ARCHITECTURE.md
8e5de73 fix(security): inline nosemgrep suppressions for subprocess audit fin…
a8c12bf fix(security): add gitleaks allowlist for credential scrubber test fi…
a603804 fix(docs): fix MD060 table alignment in ARCHITECTURE.md + gitleaks al…
e00b731 fix(security): use fully-qualified semgrep rule IDs for suppressions
b1cab54 fix(security): move nosemgrep to same line as subprocess call
3e20c0a docs: add CLAUDE.md with scan commands and semgrep suppression guidance
2c41fb4 Revert "docs: add CLAUDE.md with scan commands and semgrep suppressio…
c0922ea fix(security): fix nosemgrep suppression format for subprocess calls
14d96cd fix(security): test inline nosemgrep comment style in run_git_compare.py
1c8722a Merge branch 'main' into fix/evaluator-update
631ebf3 fix(security): update nosemgrep suppression in run_git_compare.py
2c24b67 fix(security): test full semgrep rule ID suppression in run_git_compa…
28ea9b6 fix(security): update subprocess suppression in run_git_compare.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,8 @@ |
| | | # Gitleaks configuration for aidlc-evaluator |
| | | # Suppress false positives from test fixtures that intentionally contain fake credentials. |
| | | |
| | | [allowlist] |
| | | description = "Fake credentials used in test_credential_scrubber.py test fixtures" |
| | | paths = [ |
| | | "packages/shared/tests/test_credential_scrubber.py", |
| | | ] |
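
A note on how the `paths` entry above is likely to behave: as far as I understand gitleaks, allowlist `paths` entries are regular expressions matched against the file path, not literal strings. The sketch below is not gitleaks code; it is a minimal Python approximation under that assumption, and `is_allowlisted` is a hypothetical helper name. It shows that the unescaped `.` in the entry also matches other characters, which is harmless here but worth knowing when reviewing allowlists.

```python
import re

# Entry copied from the allowlist in this diff.
ALLOWLIST_PATHS = ["packages/shared/tests/test_credential_scrubber.py"]


def is_allowlisted(path: str, patterns=ALLOWLIST_PATHS) -> bool:
    """Hypothetical helper: True if any pattern matches somewhere in the path.

    Assumption: entries are treated as regexes (approximated with re.search).
    """
    return any(re.search(pattern, path) is not None for pattern in patterns)


# The test fixture itself is suppressed.
print(is_allowlisted("packages/shared/tests/test_credential_scrubber.py"))
# A source file outside the tests directory is still scanned.
print(is_allowlisted("packages/shared/src/credential_scrubber.py"))
# The unescaped "." matches any character, so this path is also suppressed.
print(is_allowlisted("packages/shared/tests/test_credential_scrubber_py"))
```

If an exact match is wanted, escaping the dot (`\.`) and anchoring the pattern would tighten it; whether that is worth doing for a single test fixture is a judgment call.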
Was there something that caused the 3.14 -> 3.13 change?