feat(skill): optimize code exec prompt and add e2e benchmark by fang-tech · Pull Request #1056 · agentscope-ai/agentscope-java

fang-tech · 2026-03-28T11:52:44Z

AgentScope-Java Version

1.0.11-SNAPSHOT

Description

fix #1050, fix #1039, fix #952

- Code execution prompt injection: `AgentSkillPromptProvider` now appends a
  `## Code Execution` section to the skill system prompt when code execution is enabled,
  instructing the LLM to call `execute_shell_command` with absolute script paths rather than
  describing or rewriting commands inline. The prompt settings are wired automatically by
  `SkillBox` on `enable()` and after `uploadSkillFiles()`.
- `codeExecutionInstruction()` builder option: `SkillBox.CodeExecutionBuilder` exposes a
  new `.codeExecutionInstruction(template)` method for customizing the injected prompt section.
  Each `%s` placeholder in the template is replaced with the absolute path of `uploadDir`
  (up to 5 placeholders are supported).
- Remove the deprecated `SkillBox()` no-arg constructor and the deprecated `JarSkillRepositoryAdapter`
  class (superseded by `ClasspathSkillRepository`).
- E2E benchmark suite (`SkillE2ETest`): 5 parameterized tests across 4 recall dimensions,
  using purpose-built benchmark skills. The tests record pass/fail without individual assertions;
  an `@AfterAll` method asserts an overall recall rate ≥ 75% and prints a summary table. Benchmark skills:
  - `data-transform` / `data-report`: semantic discrimination
  - `image-resize`: distractor resistance
  - `log-parser`: fuzzy trigger
  - `git-changelog`: code execution recall
- Docs (EN + ZH): add a configuration reference for the code execution builder options and a new
  "Custom Skill Prompts" section documenting `instruction`, `template`, and
  `codeExecutionInstruction`.
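
The `%s` rule for `codeExecutionInstruction(template)` can be illustrated with a minimal sketch. It assumes the substitution is done with `String.format`, passing the absolute upload directory five times; `InstructionTemplates.expand` is a hypothetical helper written for this illustration, not part of the AgentScope API.

```java
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper illustrating "%s is replaced by the absolute uploadDir, up to 5 times".
// Assumption: the real implementation may differ; this only models the documented behavior.
final class InstructionTemplates {

    private static final int MAX_PLACEHOLDERS = 5;

    private InstructionTemplates() {}

    static String expand(String template, Path uploadDir) {
        String dir = uploadDir.toAbsolutePath().toString();
        // Supply the same path for every possible placeholder position.
        Object[] args = new Object[MAX_PLACEHOLDERS];
        Arrays.fill(args, dir);
        // String.format ignores surplus arguments, so templates with 0 to 5 "%s" work.
        // A sixth "%s" throws MissingFormatArgumentException, and a literal percent
        // sign in the template must be written as "%%".
        return String.format(template, args);
    }
}
```

In this sketch, `expand("Scripts are under %s.", Path.of("/tmp/skills"))` returns `"Scripts are under /tmp/skills."` on a POSIX system. Relying on `String.format` keeps the helper tiny, at the cost of requiring `%%` for any literal percent sign in a custom template.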

Checklist

Please check the following items before code is ready to be reviewed.

- Code has been formatted with `mvn spotless:apply`
- All tests are passing (`mvn test`)
- Javadoc comments are complete and follow project conventions
- Related documentation has been updated (e.g. links, examples, etc.)
- Code is ready for review

…truction option
- AgentSkillPromptProvider: add `DEFAULT_CODE_EXECUTION_INSTRUCTION` template and conditional append logic when code execution is enabled with an upload dir
- AgentSkillPromptProvider: expose `setCodeExecutionEnable`/`setUploadDir`/`setCodeExecutionInstruction`
- SkillBox.CodeExecutionBuilder: add `codeExecutionInstruction()` builder method
- SkillBox: wire prompt provider on `enable()` and after `uploadSkillFiles()`
- SkillBox: remove deprecated no-arg constructor

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
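
The conditional append described in this commit can be sketched as a small provider-like class. The setter names mirror the ones listed above, but their parameter types, the `buildPrompt` method, and the default template text are assumptions for illustration; the real `DEFAULT_CODE_EXECUTION_INSTRUCTION` wording is not reproduced here.

```java
import java.nio.file.Path;

// Hypothetical sketch, not the real AgentSkillPromptProvider API.
final class CodeExecPromptSketch {

    // Placeholder default; the actual DEFAULT_CODE_EXECUTION_INSTRUCTION text differs.
    static final String DEFAULT_INSTRUCTION =
            "## Code Execution\n"
                    + "Skill scripts are deployed under %s. Run them with execute_shell_command "
                    + "using absolute paths instead of describing the commands.\n";

    private boolean codeExecutionEnabled;
    private Path uploadDir;
    private String instruction; // null means "use the default template"

    void setCodeExecutionEnable(boolean enabled) {
        this.codeExecutionEnabled = enabled;
    }

    void setUploadDir(Path uploadDir) {
        this.uploadDir = uploadDir;
    }

    void setCodeExecutionInstruction(String instruction) {
        this.instruction = instruction;
    }

    String buildPrompt(String skillPrompt) {
        // Gate: append nothing unless code execution is on AND an upload dir is known.
        if (!codeExecutionEnabled || uploadDir == null) {
            return skillPrompt;
        }
        String template = instruction != null ? instruction : DEFAULT_INSTRUCTION;
        String dir = uploadDir.toAbsolutePath().toString();
        // Substitute the absolute path for up to five "%s" placeholders.
        return skillPrompt + "\n" + String.format(template, dir, dir, dir, dir, dir);
    }
}
```

Gating on both the flag and the upload directory keeps the section out of the prompt whenever there is no deployed script path to point the LLM at.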


…ecall dimensions

Skills under `e2e-skills/` are designed specifically for benchmarking:
- `data-transform` vs `data-report`: semantic discrimination (similar domain, different intent)
- `image-resize`: distractor resistance (must not fire for non-image tasks)
- `log-parser`: fuzzy trigger (an indirect problem description should still match)
- `git-changelog`: code execution recall (the LLM must reference the deployed script path)

`SkillE2ETest` collects pass/fail per scenario without asserting individually; `@AfterAll` asserts an overall recall rate >= 75% and prints a summary table.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
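
The record-then-assert-in-aggregate pattern can be sketched in plain Java without JUnit. `RecallTally` is a hypothetical helper, not the class in this PR; it assumes recall is simply passed scenarios divided by total scenarios, and the summary layout is illustrative. In a JUnit 5 test class, `record(...)` would be called from each parameterized test and `assertThreshold(0.75)` from a static `@AfterAll` method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collect per-scenario outcomes, assert only on the aggregate.
final class RecallTally {

    record Result(String dimension, String scenario, boolean passed) {}

    private final List<Result> results = new ArrayList<>();

    void record(String dimension, String scenario, boolean passed) {
        results.add(new Result(dimension, scenario, passed));
    }

    // Recall = passed scenarios / total scenarios (0.0 when nothing was recorded).
    double recallRate() {
        if (results.isEmpty()) {
            return 0.0;
        }
        long passed = results.stream().filter(Result::passed).count();
        return (double) passed / results.size();
    }

    String summaryTable() {
        Map<String, int[]> byDimension = new LinkedHashMap<>(); // dimension -> {passed, total}
        for (Result r : results) {
            int[] counts = byDimension.computeIfAbsent(r.dimension(), k -> new int[2]);
            if (r.passed()) {
                counts[0]++;
            }
            counts[1]++;
        }
        StringBuilder sb = new StringBuilder(String.format("%-28s %7s%n", "dimension", "passed"));
        byDimension.forEach((dim, c) -> sb.append(String.format("%-28s %3d/%-3d%n", dim, c[0], c[1])));
        sb.append(String.format("%-28s %6.1f%%%n", "overall recall", recallRate() * 100));
        return sb.toString();
    }

    // Print the table, then fail only if the overall rate is below the threshold.
    void assertThreshold(double threshold) {
        System.out.print(summaryTable());
        if (recallRate() < threshold) {
            throw new AssertionError(
                    String.format("recall %.2f below threshold %.2f", recallRate(), threshold));
        }
    }
}
```

Deferring the assertion means a single nondeterministic LLM response does not fail the build by itself; only a drop in the overall rate below the threshold does. For example, 3 passes out of 4 scenarios gives exactly 0.75 and passes the `>= 75%` check.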

Copilot

Pull request overview

This PR enhances the Skill system prompt to better trigger actual code execution (via execute_shell_command with absolute paths), adds an E2E benchmark suite for skill recall/code-exec recall, removes deprecated skill-loading utilities, and updates EN/ZH docs accordingly.

Changes:

- Append an optional “Code Execution” section to the skill system prompt when code execution is enabled, with a new `codeExecutionInstruction(...)` customization hook.
- Add `SkillE2ETest` benchmark suite + benchmark skills under `src/test/resources/e2e-skills/`.
- Remove deprecated `SkillBox()` no-arg constructor and deprecated `JarSkillRepositoryAdapter`; update documentation.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.


| File | Description |
| --- | --- |
| docs/zh/task/agent-skill.md | Updates ZH documentation for code execution builder options and custom skill prompts. |
| docs/en/task/agent-skill.md | Updates EN documentation for code execution builder options and custom skill prompts. |
| docs/_toc.yml | Adjusts docs navigation entry for the ZH “agent-skill” page. |
| agentscope-core/src/test/resources/e2e-skills/log-parser/scripts/extract_errors.py | Adds benchmark script for log parsing/code execution recall scenarios. |
| agentscope-core/src/test/resources/e2e-skills/log-parser/SKILL.md | Adds benchmark skill metadata/doc for `log-parser`. |
| agentscope-core/src/test/resources/e2e-skills/image-resize/scripts/resize.py | Adds benchmark script for distractor resistance scenarios. |
| agentscope-core/src/test/resources/e2e-skills/image-resize/SKILL.md | Adds benchmark skill metadata/doc for `image-resize`. |
| agentscope-core/src/test/resources/e2e-skills/git-changelog/scripts/generate_changelog.py | Adds benchmark script for code execution recall scenarios. |
| agentscope-core/src/test/resources/e2e-skills/git-changelog/SKILL.md | Adds benchmark skill metadata/doc for `git-changelog`. |
| agentscope-core/src/test/resources/e2e-skills/data-transform/scripts/json_to_csv.py | Adds benchmark script for semantic discrimination scenarios. |
| agentscope-core/src/test/resources/e2e-skills/data-transform/scripts/csv_to_json.py | Adds benchmark script for semantic discrimination scenarios. |
| agentscope-core/src/test/resources/e2e-skills/data-transform/SKILL.md | Adds benchmark skill metadata/doc for `data-transform`. |
| agentscope-core/src/test/resources/e2e-skills/data-report/scripts/summarize.py | Adds benchmark script for semantic discrimination scenarios. |
| agentscope-core/src/test/resources/e2e-skills/data-report/SKILL.md | Adds benchmark skill metadata/doc for `data-report`. |
| agentscope-core/src/test/java/io/agentscope/core/skill/SkillBoxTest.java | Adds integration tests ensuring prompt injection behavior when code execution is enabled. |
| agentscope-core/src/test/java/io/agentscope/core/skill/AgentSkillPromptProviderTest.java | Adds unit tests for code execution prompt section behavior and customization. |
| agentscope-core/src/test/java/io/agentscope/core/e2e/SkillE2ETest.java | Adds concurrent-gated E2E benchmark tests for skill recall + code execution recall. |
| agentscope-core/src/main/java/io/agentscope/core/skill/util/JarSkillRepositoryAdapter.java | Removes deprecated adapter (superseded by `ClasspathSkillRepository`). |
| agentscope-core/src/main/java/io/agentscope/core/skill/SkillBox.java | Removes deprecated no-arg constructor; wires code-exec prompt settings from builder. |
| agentscope-core/src/main/java/io/agentscope/core/skill/AgentSkillPromptProvider.java | Implements conditional code execution section injection + customization plumbing. |


…chmark scripts
- Fix AgentSkillPromptProviderTest: update assertion from "existing scripts" to "existing Python script" to match the actual prompt template text
- Add Apache 2.0 license headers to all 6 e2e benchmark Python scripts

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-03-28T14:04:47Z

Codecov Report

❌ Patch coverage is 85.71429% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...gentscope/core/skill/AgentSkillPromptProvider.java | 85.71% | 0 Missing and 2 partials ⚠️ |
| ...c/main/java/io/agentscope/core/skill/SkillBox.java | 85.71% | 0 Missing and 1 partial ⚠️ |
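
Assuming Codecov's standard coverage ratio, hits / (hits + partials + misses), partially covered lines (lines with some branches never executed) count against the patch just like misses. Working back from the reported figures, with h the number of fully covered patch lines and 3 non-hit lines:

$$
\text{coverage} = \frac{h}{h + p + m}, \qquad p + m = 3, \qquad \frac{h}{h + 3} = 0.8571429 \approx \frac{6}{7} \;\Rightarrow\; 7h = 6h + 18 \;\Rightarrow\; h = 18
$$

So the patch has 21 tracked lines, 18 of them fully hit. The per-file rows are consistent with this: 12 of 14 lines for `AgentSkillPromptProvider.java` and 6 of 7 for `SkillBox.java` both give 85.71%, and 14 + 7 = 21.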


fang-tech and others added 2 commits March 28, 2026 19:45

fang-tech requested review from a team and Copilot March 28, 2026 11:52

Copilot started reviewing on behalf of fang-tech March 28, 2026 11:53

Copilot AI reviewed Mar 28, 2026


fang-tech and others added 4 commits March 28, 2026 20:18

- apply copilot suggestions (1a13dca)
- apply copilot suggestions (e69294d)

fang-tech wants to merge 6 commits into agentscope-ai:main from fang-tech:feat/skill-code-exec-prompt-e2e-benchmark
