feat(skill): optimize code exec prompt and add e2e benchmark #1056
Open
fang-tech wants to merge 6 commits into agentscope-ai:main
…truction option
- AgentSkillPromptProvider: add DEFAULT_CODE_EXECUTION_INSTRUCTION template and conditional append logic when code execution is enabled with an upload dir
- AgentSkillPromptProvider: expose setCodeExecutionEnable/setUploadDir/setCodeExecutionInstruction
- SkillBox.CodeExecutionBuilder: add codeExecutionInstruction() builder method
- SkillBox: wire prompt provider on enable() and after uploadSkillFiles()
- SkillBox: remove deprecated no-arg constructor
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ecall dimensions
Skills under e2e-skills/ are designed specifically for benchmarking:
- data-transform vs data-report: semantic discrimination (similar domain, different intent)
- image-resize: distractor resistance (must not fire for non-image tasks)
- log-parser: fuzzy trigger (indirect problem description should still match)
- git-changelog: code execution recall (LLM must reference deployed script path)
SkillE2ETest collects pass/fail per scenario without asserting individually; @AfterAll asserts overall recall rate >= 75% and prints a summary table.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
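The collect-then-assert pattern described in this commit can be sketched roughly as follows. This is an illustration only, not the code in `SkillE2ETest`: the class `RecallTracker` and its method names are hypothetical, and the JUnit wiring (calling `record` from each parameterized test and `assertRecallAtLeast(0.75)` from an `@AfterAll` method) is assumed rather than shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects per-scenario results and asserts on the aggregate.
final class RecallTracker {
    // Insertion-ordered so the summary lists scenarios in the order they ran.
    private final Map<String, Boolean> results = new LinkedHashMap<>();

    // Called by each scenario instead of asserting individually.
    synchronized void record(String scenario, boolean passed) {
        results.put(scenario, passed);
    }

    // Fraction of recorded scenarios that passed; 0.0 when nothing was recorded.
    synchronized double recallRate() {
        if (results.isEmpty()) {
            return 0.0;
        }
        long passed = results.values().stream().filter(Boolean::booleanValue).count();
        return (double) passed / results.size();
    }

    // Renders a small Markdown-style summary table.
    synchronized String summaryTable() {
        StringBuilder sb = new StringBuilder("| Scenario | Result |\n|---|---|\n");
        results.forEach((name, ok) ->
                sb.append("| ").append(name).append(" | ")
                  .append(ok ? "PASS" : "FAIL").append(" |\n"));
        return sb.toString();
    }

    // Intended to be called once after all scenarios have run.
    synchronized void assertRecallAtLeast(double threshold) {
        double rate = recallRate();
        if (rate < threshold) {
            throw new AssertionError(
                    "Recall rate " + rate + " is below threshold " + threshold
                            + "\n" + summaryTable());
        }
    }
}
```

Asserting on the aggregate rather than per scenario tolerates an occasional miss from a non-deterministic LLM while still failing the run when recall drops below the threshold.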
Pull request overview
This PR enhances the Skill system prompt to better trigger actual code execution (via execute_shell_command with absolute paths), adds an E2E benchmark suite for skill recall/code-exec recall, removes deprecated skill-loading utilities, and updates EN/ZH docs accordingly.
Changes:
- Append an optional “Code Execution” section to the skill system prompt when code execution is enabled, with a new `codeExecutionInstruction(...)` customization hook.
- Add `SkillE2ETest` benchmark suite + benchmark skills under `src/test/resources/e2e-skills/`.
- Remove deprecated `SkillBox()` no-arg constructor and deprecated `JarSkillRepositoryAdapter`; update documentation.
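As a rough illustration of the first change, the conditional append could look like the sketch below. Everything here is an assumption for illustration: `CodeExecPromptSketch`, `buildPrompt`, and the placeholder template text are hypothetical and are not taken from `AgentSkillPromptProvider`, whose actual template and method signatures are not shown on this page.

```java
import java.nio.file.Path;

// Hypothetical sketch; not the actual AgentSkillPromptProvider implementation.
final class CodeExecPromptSketch {
    // Placeholder wording only; the real DEFAULT_CODE_EXECUTION_INSTRUCTION differs.
    static final String DEFAULT_INSTRUCTION =
            "## Code Execution\n"
                    + "Skill files are deployed under %s. Run scripts with "
                    + "execute_shell_command using absolute paths.\n";

    static String buildPrompt(
            String skillPrompt,
            boolean codeExecutionEnabled,
            Path uploadDir,
            String customInstruction) {
        // Append nothing unless code execution is enabled and an upload dir is known.
        if (!codeExecutionEnabled || uploadDir == null) {
            return skillPrompt;
        }
        // Fall back to the default template when no custom instruction is given.
        String template =
                (customInstruction == null || customInstruction.isBlank())
                        ? DEFAULT_INSTRUCTION
                        : customInstruction;
        String dir = uploadDir.toAbsolutePath().normalize().toString();
        // Simplification: replaces every %s occurrence, with no upper bound.
        return skillPrompt + "\n" + template.replace("%s", dir);
    }
}
```

Using `String.replace` rather than `String.format` avoids a format exception if a custom template contains other `%` sequences; whether the real provider makes the same choice is not stated here.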
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| docs/zh/task/agent-skill.md | Updates ZH documentation for code execution builder options and custom skill prompts. |
| docs/en/task/agent-skill.md | Updates EN documentation for code execution builder options and custom skill prompts. |
| docs/_toc.yml | Adjusts docs navigation entry for the ZH “agent-skill” page. |
| agentscope-core/src/test/resources/e2e-skills/log-parser/scripts/extract_errors.py | Adds benchmark script for log parsing/code execution recall scenarios. |
| agentscope-core/src/test/resources/e2e-skills/log-parser/SKILL.md | Adds benchmark skill metadata/doc for log-parser. |
| agentscope-core/src/test/resources/e2e-skills/image-resize/scripts/resize.py | Adds benchmark script for distractor resistance scenarios. |
| agentscope-core/src/test/resources/e2e-skills/image-resize/SKILL.md | Adds benchmark skill metadata/doc for image-resize. |
| agentscope-core/src/test/resources/e2e-skills/git-changelog/scripts/generate_changelog.py | Adds benchmark script for code execution recall scenarios. |
| agentscope-core/src/test/resources/e2e-skills/git-changelog/SKILL.md | Adds benchmark skill metadata/doc for git-changelog. |
| agentscope-core/src/test/resources/e2e-skills/data-transform/scripts/json_to_csv.py | Adds benchmark script for semantic discrimination scenarios. |
| agentscope-core/src/test/resources/e2e-skills/data-transform/scripts/csv_to_json.py | Adds benchmark script for semantic discrimination scenarios. |
| agentscope-core/src/test/resources/e2e-skills/data-transform/SKILL.md | Adds benchmark skill metadata/doc for data-transform. |
| agentscope-core/src/test/resources/e2e-skills/data-report/scripts/summarize.py | Adds benchmark script for semantic discrimination scenarios. |
| agentscope-core/src/test/resources/e2e-skills/data-report/SKILL.md | Adds benchmark skill metadata/doc for data-report. |
| agentscope-core/src/test/java/io/agentscope/core/skill/SkillBoxTest.java | Adds integration tests ensuring prompt injection behavior when code execution is enabled. |
| agentscope-core/src/test/java/io/agentscope/core/skill/AgentSkillPromptProviderTest.java | Adds unit tests for code execution prompt section behavior and customization. |
| agentscope-core/src/test/java/io/agentscope/core/e2e/SkillE2ETest.java | Adds concurrent-gated E2E benchmark tests for skill recall + code execution recall. |
| agentscope-core/src/main/java/io/agentscope/core/skill/util/JarSkillRepositoryAdapter.java | Removes deprecated adapter (superseded by ClasspathSkillRepository). |
| agentscope-core/src/main/java/io/agentscope/core/skill/SkillBox.java | Removes deprecated no-arg constructor; wires code-exec prompt settings from builder. |
| agentscope-core/src/main/java/io/agentscope/core/skill/AgentSkillPromptProvider.java | Implements conditional code execution section injection + customization plumbing. |
…chmark scripts
- Fix AgentSkillPromptProviderTest: update assertion from "existing scripts" to "existing Python script" to match actual prompt template text
- Add Apache 2.0 license headers to all 6 e2e benchmark Python scripts
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
AgentScope-Java Version
1.0.11-SNAPSHOT
Description
fix #1050, fix #1039, fix #952
- Code execution prompt injection: `AgentSkillPromptProvider` now appends a `## Code Execution` section to the skill system prompt when code execution is enabled, instructing the LLM to use `execute_shell_command` with absolute script paths rather than describing or rewriting commands inline. The prompt is wired automatically via `SkillBox` on `enable()` and after `uploadSkillFiles()`.
- `codeExecutionInstruction()` builder option: `SkillBox.CodeExecutionBuilder` exposes a new `.codeExecutionInstruction(template)` method for customizing the injected prompt section. `%s` placeholders are substituted with the `uploadDir` absolute path (up to 5 times).
- Remove deprecated `SkillBox()` no-arg constructor and `JarSkillRepositoryAdapter` class (superseded by `ClasspathSkillRepository`).
- E2E benchmark suite (`SkillE2ETest`): 5 parameterized tests across 4 recall dimensions using purpose-built benchmark skills. Tests record pass/fail without individual assertions; `@AfterAll` asserts overall recall rate ≥ 75% and prints a summary table. Benchmark skills:
  - `data-transform` / `data-report`: semantic discrimination
  - `image-resize`: distractor resistance
  - `log-parser`: fuzzy trigger
  - `git-changelog`: code execution recall
- Docs (EN + ZH): add configuration reference for code execution builder options and a new "Custom Skill Prompts" section documenting `instruction`, `template`, and `codeExecutionInstruction`.
Checklist
Please check the following items before code is ready to be reviewed.
- `mvn spotless:apply`
- `mvn test`
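The Description states that `%s` placeholders in a custom template are substituted with the `uploadDir` absolute path up to 5 times. One way such a bounded substitution might work is sketched below. The class `PlaceholderSketch` and its `fill` method are hypothetical, and leaving a sixth or later `%s` untouched is an assumption of this sketch; the PR text does not say how extra placeholders are handled.

```java
// Hypothetical sketch of a bounded %s substitution; not the PR's actual code.
final class PlaceholderSketch {
    // Upper bound taken from the PR description ("up to 5 times").
    static final int MAX_PLACEHOLDERS = 5;

    static String fill(String template, String uploadDir) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        for (int i = 0; i < MAX_PLACEHOLDERS; i++) {
            int idx = template.indexOf("%s", from);
            if (idx < 0) {
                break; // no more placeholders
            }
            // Copy the text before the placeholder, then the directory.
            out.append(template, from, idx).append(uploadDir);
            from = idx + 2; // skip past "%s"
        }
        // Assumption: anything after the fifth placeholder is copied verbatim.
        return out.append(template.substring(from)).toString();
    }
}
```

A template such as `"Scripts live in %s; run them from %s"` would have both placeholders replaced by the same directory, so a custom instruction can mention the path more than once without extra arguments.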