
[codex] Add agent loop eval smoke suite #988

Draft
mimeding wants to merge 8 commits into osaurus-ai:main from mimeding:codex/evals-agentloop-smoke

Contributor

mimeding commented Apr 30, 2026

Summary

Adds a model-free AgentLoop eval domain and smoke suite for todo, complete, and clarify helper contracts.

This PR is intentionally stacked on top of #986, which already includes #985 beneath it. #985 fixes the current origin/main FluidAudio/PocketTTS build break, and #986 carries the currently green development-plan and CI-readiness base. Once #985 and #986 merge, this PR should reduce to the eval smoke-suite commit.

Changes

  • Add agent_loop eval expectations and runner path for pure-data cases.
  • Add smoke fixtures for todo checklist parsing, complete summary validation, and clarify option parsing/deduping.
  • Skip preflight plugin/model setup for pure-data suites so they run without a live model/server.
  • Document AgentLoop suite usage in Packages/OsaurusEvals/README.md.

Validation

  • git diff --check
  • jq empty Packages/OsaurusEvals/Suites/AgentLoop/*.json
  • make evals EVALS_SUITE=Packages/OsaurusEvals/Suites/AgentLoop
  • swift build --package-path Packages/OsaurusCore
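
The four validation commands above could be chained into one helper that stops at the first failure. This is only a sketch: the function name `validate_agentloop_suite` and the `DRY_RUN` switch are hypothetical and are not part of the repository, and the commands are assumed to run from the repository root.

```shell
# Sketch only: validate_agentloop_suite and DRY_RUN are hypothetical names.
# Runs the PR's validation commands in order and stops at the first failure.
validate_agentloop_suite() {
  suite_dir="Packages/OsaurusEvals/Suites/AgentLoop"
  for step in \
    "git diff --check" \
    "jq empty ${suite_dir}/*.json" \
    "make evals EVALS_SUITE=${suite_dir}" \
    "swift build --package-path Packages/OsaurusCore"
  do
    echo "==> ${step}"
    # With DRY_RUN=1 the steps are listed but not executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
      continue
    fi
    # eval lets the shell expand the *.json glob when the step runs.
    if ! eval "${step}"; then
      echo "FAILED: ${step}" >&2
      return 1
    fi
  done
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "dry run: no commands executed"
  else
    echo "all validation steps passed"
  fi
}
```

Calling `DRY_RUN=1 validate_agentloop_suite` prints the steps without running them; calling it without `DRY_RUN` runs each command and returns non-zero as soon as one fails.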

Dependency

Depends on #986, which depends on #985.

mimeding force-pushed the codex/evals-agentloop-smoke branch from a5cb884 to 2af0f9c on April 30, 2026 18:29
mimeding marked this pull request as ready for review on May 1, 2026 02:52
Contributor Author

mimeding commented May 1, 2026

Author-side cleanup complete; this PR is now ready for review.

Verification:

  • GitHub checks are green: test-core, test-cli, swiftlint, shellcheck, update_release_draft, and pr-clean-gate.
  • scripts/ci/check-pr-clean.sh osaurus-ai/osaurus 988 reports clean.
  • Local verification passed for both Packages/OsaurusCore and the standalone Packages/OsaurusEvals build.

The previous failing run belonged to the stale DerivedData/EventSource dependency failure class and was not caused by the eval smoke-suite changes. The current head includes the same CI hardening and FluidAudio 0.14.3 lock alignment as the other green recovery branches.

mimeding force-pushed the codex/evals-agentloop-smoke branch from 1d8948e to c34e8dc on May 1, 2026 16:15
mimeding marked this pull request as draft on May 10, 2026 14:57
