feat: Add agent/evaluate/agent-framework as sub-skill to microsoft foundry skill #914

kuojianlu wants to merge 3 commits into microsoft:main from
Conversation
…undry Add agent/evaluate/agent-framework as sub-skill to microsoft foundry skill, which covers:
- Clarify metrics and dataset
- Configure judge model
- Generate evaluation code
- Run-fix loop to resolve code issues
- Generate evaluation.md
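To make the covered workflow concrete, here is a minimal, hypothetical sketch of the kind of pytest evaluation code the skill might generate. The evaluator function, dataset shape, and `fake_agent` stub are all illustrative assumptions, not the actual pytest-agent-evals plugin API.

```python
# Hypothetical sketch of a generated evaluation test; names and dataset
# shape are illustrative, not the real pytest-agent-evals API.

def exact_match_evaluator(response: str, expected: str) -> dict:
    """Code-based evaluator: score 1.0 when the response matches exactly."""
    score = 1.0 if response.strip() == expected.strip() else 0.0
    return {"metric": "exact_match", "score": score}

# A tiny evaluation dataset: each row pairs an input with a reference answer.
DATASET = [
    {"query": "2 + 2", "expected": "4"},
    {"query": "capital of France", "expected": "Paris"},
]

def fake_agent(query: str) -> str:
    """Stand-in for the deployed agent under evaluation."""
    return {"2 + 2": "4", "capital of France": "Paris"}[query]

def test_agent_exact_match():
    # The run-fix loop iterates until every dataset row passes its evaluator.
    for row in DATASET:
        result = exact_match_evaluator(fake_agent(row["query"]), row["expected"])
        assert result["score"] == 1.0
```

A test file in this shape is discoverable by pytest and, by extension, by the VS Code Test Explorer integration the skill describes.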
tests/microsoft-foundry/foundry-agent/evaluate/agent-framework/integration.test.ts
```ts
const agentMetadata = await run({
  prompt: "Evaluate my Foundry agent built with Microsoft Agent Framework using pytest evaluators.",
  shouldEarlyTerminate: terminateOnCreate,
});
```

Check failure — Code scanning / CodeQL: Invocation of non-function (Error, test)
```ts
const agentMetadata = await run({
  prompt: "Add a custom evaluator to assess my agent's task completion using pytest-agent-evals.",
  shouldEarlyTerminate: terminateOnCreate,
});
```

Check failure — Code scanning / CodeQL: Invocation of non-function (Error, test)
Pull request overview
This draft PR adds a new evaluate/agent-framework sub-skill to the microsoft-foundry skill for evaluating AI agents built with Microsoft Agent Framework using the pytest-agent-evals plugin. The skill provides comprehensive guidance for setting up agent evaluation workflows, including metric selection, dataset creation, judge model configuration, and VS Code Test Explorer integration.
Changes:
- Adds new agent-framework evaluation sub-skill with comprehensive workflow documentation
- Includes three reference files documenting code examples, built-in evaluators, and custom evaluators
- Adds comprehensive test coverage (unit, trigger, and integration tests)
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| plugin/skills/microsoft-foundry/foundry-agent/evaluate/agent-framework/SKILL.md | Main skill file with evaluation workflow, judge model configuration, and error handling |
| plugin/skills/microsoft-foundry/foundry-agent/evaluate/agent-framework/references/code-example.md | Complete pytest-agent-evals code example with key concepts and generation guidelines |
| plugin/skills/microsoft-foundry/foundry-agent/evaluate/agent-framework/references/built-in-evaluators.md | Catalog of 15+ built-in evaluators for agents, general purpose, RAG, and similarity metrics |
| plugin/skills/microsoft-foundry/foundry-agent/evaluate/agent-framework/references/custom-evaluators.md | Patterns for custom prompt-based (LLM judge) and code-based evaluators |
| plugin/skills/microsoft-foundry/SKILL.md | Adds evaluate sub-skill entry to parent skill's sub-skill table and workflow mapping |
| tests/microsoft-foundry/foundry-agent/evaluate/agent-framework/unit.test.ts | Unit tests validating skill metadata, content sections, and reference files |
| tests/microsoft-foundry/foundry-agent/evaluate/agent-framework/triggers.test.ts | Trigger tests with positive/negative prompts and snapshot validation |
| tests/microsoft-foundry/foundry-agent/evaluate/agent-framework/integration.test.ts | Integration tests measuring skill invocation rates for evaluation prompts |
| tests/microsoft-foundry/foundry-agent/evaluate/agent-framework/__snapshots__/triggers.test.ts.snap | Jest snapshot capturing expected keywords and description format |
```yaml
description: |
  Evaluate AI agents and workflows built with Microsoft Agent Framework using pytest-agent-evals plugin. Supports built-in and custom evaluators with VS Code Test Explorer integration.
  USE FOR: evaluate agent, test agent, assess agent, agent evaluation, pytest evaluation, measure agent performance, agent quality, add evaluator, evaluation dataset, judge model.
  DO NOT USE FOR: creating agents (use agent/create), deploying agents (use agent/deploy), evaluating non-Agent-Framework agents.
```
The frontmatter description uses a multiline literal block format (description: |) which is incompatible with skills.sh tooling. According to repository conventions (.github/skills/sensei/references/SCORING.md:140), descriptions MUST use inline double-quoted strings instead. Convert this to: description: "Evaluate AI agents and workflows built with Microsoft Agent Framework using pytest-agent-evals plugin. Supports built-in and custom evaluators with VS Code Test Explorer integration. USE FOR: evaluate agent, test agent, assess agent, agent evaluation, pytest evaluation, measure agent performance, agent quality, add evaluator, evaluation dataset, judge model. DO NOT USE FOR: creating agents (use agent/create), deploying agents (use agent/deploy), evaluating non-Agent-Framework agents."
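The reviewer's suggested replacement, rendered as frontmatter (the description text is verbatim from the comment above):

```yaml
---
description: "Evaluate AI agents and workflows built with Microsoft Agent Framework using pytest-agent-evals plugin. Supports built-in and custom evaluators with VS Code Test Explorer integration. USE FOR: evaluate agent, test agent, assess agent, agent evaluation, pytest evaluation, measure agent performance, agent quality, add evaluator, evaluation dataset, judge model. DO NOT USE FOR: creating agents (use agent/create), deploying agents (use agent/deploy), evaluating non-Agent-Framework agents."
---
```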
Add agent/evaluate/agent-framework as a sub-skill to the microsoft-foundry skill.
Important: This needs to be updated and merged after #824