Merged

60 commits
bebbd42
add function type check
pan-x-c Dec 26, 2025
ed5a85a
add agentscope tuner v1 interface
pan-x-c Dec 26, 2025
81d8d62
update tuner interface
pan-x-c Dec 26, 2025
f451bcc
add missing file
pan-x-c Dec 26, 2025
fea6b96
fix pre-commit
pan-x-c Dec 26, 2025
bebf343
finish agentscope tune v1 interface
pan-x-c Dec 29, 2025
59b8deb
fix readme
pan-x-c Dec 29, 2025
657f7ec
rename to tuner
pan-x-c Dec 29, 2025
a8c6198
fix example
pan-x-c Dec 29, 2025
435cac4
update readme
pan-x-c Dec 30, 2025
8a1e03b
fix readme
pan-x-c Dec 30, 2025
3326838
fix eval tasksets
pan-x-c Dec 30, 2025
1a4526b
use tuner model
pan-x-c Dec 30, 2025
ceca3b3
fix pre-commit
pan-x-c Dec 30, 2025
789de86
update readme
pan-x-c Dec 30, 2025
68417bc
fix model type
pan-x-c Dec 30, 2025
f22eef0
refactor structure
pan-x-c Jan 4, 2026
8321195
fix doc
pan-x-c Jan 4, 2026
ea0f7c5
update comments
pan-x-c Jan 5, 2026
65b404f
fix comments
pan-x-c Jan 5, 2026
a338b38
add function type check
pan-x-c Jan 5, 2026
e2396a1
add unittests
pan-x-c Jan 5, 2026
55e7e4f
fix pre-commit
pan-x-c Jan 5, 2026
f2e7322
fix missing eval workflow args
pan-x-c Jan 5, 2026
11231b9
fix workflow args
pan-x-c Jan 5, 2026
7837814
fix reponse signature
pan-x-c Jan 5, 2026
423f9dc
fix comments
pan-x-c Jan 5, 2026
d20b08d
auto setup cluster on dlc
pan-x-c Jan 6, 2026
baaf0a7
fix dlc setup
pan-x-c Jan 6, 2026
b36805c
add tuner tutorial
pan-x-c Jan 8, 2026
68bec87
add tuner in tutorial
pan-x-c Jan 8, 2026
a631d53
add chinese doc
pan-x-c Jan 8, 2026
cf426cd
fix docs
pan-x-c Jan 8, 2026
907924a
clean code
pan-x-c Jan 8, 2026
4e8c373
add reward curve
pan-x-c Jan 8, 2026
6d0963c
fix comments
pan-x-c Jan 8, 2026
f9ceb44
fix missing packages
pan-x-c Jan 9, 2026
ed40269
move doc from training to tuner
pan-x-c Jan 9, 2026
631ec36
add tips for metrics
pan-x-c Jan 9, 2026
db37537
add tips
pan-x-c Jan 9, 2026
58643e6
fix chinese doc
pan-x-c Jan 9, 2026
9ebc8ed
fix en doc
pan-x-c Jan 9, 2026
4251087
add links to samples
pan-x-c Jan 9, 2026
ec83c38
fix comments
pan-x-c Jan 9, 2026
64fb8b5
fix comments
pan-x-c Jan 9, 2026
1f0429f
fix algorithm doc
pan-x-c Jan 9, 2026
3163b94
fix comments
pan-x-c Jan 9, 2026
99fe36d
fix type doc
pan-x-c Jan 9, 2026
5970355
fix comments
pan-x-c Jan 9, 2026
b274cd8
fix eval workflow_args
pan-x-c Jan 12, 2026
f2e27ed
Merge branch 'main' into feature/tuner_enhance
pan-x-c Jan 12, 2026
03e7733
fix dependencies
pan-x-c Jan 13, 2026
ef1461a
fix comments
pan-x-c Jan 13, 2026
6ff7f85
rename modules
pan-x-c Jan 13, 2026
4b2d9e9
fix pre-commit
pan-x-c Jan 13, 2026
3652692
remove template yaml
pan-x-c Jan 13, 2026
1a5d036
fix conflict
pan-x-c Jan 13, 2026
a873ace
fix config
pan-x-c Jan 13, 2026
ad23a5d
fix pre-commit
pan-x-c Jan 13, 2026
0dc0f01
fix pre-commit
pan-x-c Jan 14, 2026
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -195,7 +195,7 @@ Now our examples are organized into subdirectories based on their type:
- `examples/game/` for game-related examples
- `examples/evaluation/` for evaluation scripts
- `examples/workflows/` for workflow demonstrations
- `examples/training/` for training-related examples
- `examples/tuner/` for tuning-related examples

An example structure could be:

2 changes: 1 addition & 1 deletion CONTRIBUTING_zh.md
@@ -189,7 +189,7 @@ examples/
- `examples/functionality/` for showcasing specific core features of AgentScope
- `examples/evaluation/` for evaluation
- `examples/workflows/` for workflow demonstrations
- `examples/training/` for training-related examples
- `examples/tuner/` for tuning-related examples

An example structure could be:

6 changes: 3 additions & 3 deletions README.md
@@ -59,7 +59,7 @@
- **[2025-12]** AgentScope supports [TTS(Text-to-Speech)](https://doc.agentscope.io/tutorial/task_tts.html) now! Check our [example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/tts) and [tutorial](https://doc.agentscope.io/tutorial/task_tts.html) for more details.
- **[2025-11]** AgentScope supports [Anthropic Agent Skill](https://claude.com/blog/skills) now! Check our [example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/agent_skill) and [tutorial](https://doc.agentscope.io/tutorial/task_agent_skill.html) for more details.
- **[2025-11]** AgentScope open-sources [Alias-Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/alias) for diverse real-world tasks and [Data-Juicer Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/data_juicer_agent) for data processing.
- **[2025-11]** AgentScope supports [Agentic RL](https://github.com/agentscope-ai/agentscope/tree/main/examples/training/react_agent) via integrating [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) library.
- **[2025-11]** AgentScope supports [Agentic RL](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent) via integrating [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) library.
- **[2025-11]** AgentScope integrates [ReMe](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/long_term_memory/reme) for enhanced long-term memory.
- **[2025-11]** AgentScope launches [agentscope-samples](https://github.com/agentscope-ai/agentscope-samples) repository and upgrades [agentscope-runtime](https://github.com/agentscope-ai/agentscope-runtime) with Docker/K8s deployment and VNC-powered GUI sandboxes.
- **[2025-11]** [Contributing Guide](./CONTRIBUTING.md) is online now! Welcome to contribute to AgentScope.
@@ -411,8 +411,8 @@ as_studio
- [Multi-agent Concurrent](https://github.com/agentscope-ai/agentscope/tree/main/examples/workflows/multiagent_concurrent)
- Evaluation
- [ACEBench](https://github.com/agentscope-ai/agentscope/tree/main/examples/evaluation/ace_bench)
- Training
- [Reinforcement learning (RL) with Trinity-RFT](https://github.com/agentscope-ai/agentscope/tree/main/examples/training/react_agent)
- Tuner
- [Tune ReAct Agent](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent)


## 🤝 Contributing
6 changes: 3 additions & 3 deletions README_zh.md
@@ -59,7 +59,7 @@
- **[2025-12]** AgentScope now supports [TTS (Text-to-Speech) models](https://doc.agentscope.io/zh_CN/tutorial/task_tts.html)! Check the [example]() and [tutorial](https://doc.agentscope.io/zh_CN/tutorial/task_tts.html) for more details.
- **[2025-11]** AgentScope now supports [Anthropic Agent Skill](https://claude.com/blog/skills)! Check the [example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/agent_skill) and [tutorial](https://doc.agentscope.io/zh_CN/tutorial/task_agent_skill.html) for more details.
- **[2025-11]** AgentScope open-sources [Alias-Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/alias) for handling diverse real-world tasks, and [Data-Juicer Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/data_juicer_agent) for natural-language-driven data processing.
- **[2025-11]** AgentScope supports [Agentic RL](https://github.com/agentscope-ai/agentscope/tree/main/examples/training/react_agent) by integrating [Trinity-RFT](https://github.com/modelscope/Trinity-RFT).
- **[2025-11]** AgentScope supports [Agentic RL](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent) by integrating [Trinity-RFT](https://github.com/modelscope/Trinity-RFT).
- **[2025-11]** AgentScope integrates [ReMe](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/long_term_memory/reme) to enhance long-term memory capabilities.
- **[2025-11]** AgentScope launches the [agentscope-samples](https://github.com/agentscope-ai/agentscope-samples) sample repository and upgrades [agentscope-runtime](https://github.com/agentscope-ai/agentscope-runtime) to support Docker/K8s deployment and VNC-powered GUI sandboxes.
- **[2025-11]** The [Contributing Guide](./CONTRIBUTING.md) has been updated. Contributions to AgentScope are welcome!
@@ -412,8 +412,8 @@ as_studio
- [Multi-agent Concurrent](https://github.com/agentscope-ai/agentscope/tree/main/examples/workflows/multiagent_concurrent)
- Evaluation
- [ACEBench](https://github.com/agentscope-ai/agentscope/tree/main/examples/evaluation/ace_bench)
- Training
- [Reinforcement learning (RL) with Trinity-RFT](https://github.com/agentscope-ai/agentscope/tree/main/examples/training/react_agent)
- Tuner
- [Tune ReAct Agent](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent)


## 🤝 Contributing
1 change: 1 addition & 0 deletions docs/tutorial/en/index.rst
@@ -70,6 +70,7 @@ Welcome to AgentScope's documentation!
tutorial/task_eval
tutorial/task_embedding
tutorial/task_tts
tutorial/task_tuner

.. toctree::
:maxdepth: 1
247 changes: 247 additions & 0 deletions docs/tutorial/en/src/task_tuner.py
@@ -0,0 +1,247 @@
# -*- coding: utf-8 -*-
"""
.. _tuner:

Tuner
=================

AgentScope provides the ``tuner`` module for training agent applications using reinforcement learning (RL).
This tutorial walks you through using the ``tuner`` module to improve agent performance on specific tasks, including:

- Introducing the core components of the ``tuner`` module
- Demonstrating the key code required for the tuning workflow
- Showing how to configure and run the tuning process

Main Components
~~~~~~~~~~~~~~~~~~~
The ``tuner`` module introduces three core components essential for RL-based agent training:

- **Task Dataset**: A collection of tasks for training and evaluating the agent.
- **Workflow Function**: Encapsulates the agent's logic to be tuned.
- **Judge Function**: Evaluates the agent's performance on tasks and provides reward signals for tuning.

In addition, ``tuner`` provides several configuration classes for customizing the tuning process, including:

- **TunerModelConfig**: Model configurations for tuning purposes.
- **AlgorithmConfig**: Specifies the RL algorithm (e.g., GRPO, PPO) and its parameters.

Implementation
~~~~~~~~~~~~~~~~~~~
This section demonstrates how to use ``tuner`` to train a simple math agent.

Task Dataset
--------------------
The task dataset contains tasks for training and evaluating your agent.

Your dataset should follow the Hugging Face `datasets <https://huggingface.co/docs/datasets/quickstart>`_ format, which can be loaded with ``datasets.load_dataset``. For example:

.. code-block:: text

my_dataset/
├── train.jsonl # training samples
└── test.jsonl # evaluation samples

Suppose your ``train.jsonl`` contains:

.. code-block:: json

{"question": "What is 2 + 2?", "answer": "4"}
{"question": "What is 4 + 4?", "answer": "8"}

Before starting tuning, you can verify that your dataset is loaded correctly with:

.. code-block:: python

from agentscope.tuner import DatasetConfig

dataset = DatasetConfig(path="my_dataset", split="train")
dataset.preview(n=2)
# Output the first two samples to verify correct loading
# [
# {
# "question": "What is 2 + 2?",
# "answer": "4"
# },
# {
# "question": "What is 4 + 4?",
# "answer": "8"
# }
# ]

Workflow Function
--------------------
The workflow function defines how the agent interacts with the environment and makes decisions. All workflow functions should follow the input/output signature defined in ``agentscope.tuner.WorkflowType``.

Below is an example workflow function using a ReAct agent to answer math questions:
"""

from typing import Dict, Optional
from agentscope.agent import ReActAgent
from agentscope.formatter import OpenAIChatFormatter
from agentscope.message import Msg
from agentscope.model import ChatModelBase
from agentscope.tuner import WorkflowOutput


async def example_workflow_function(
task: Dict,
model: ChatModelBase,
auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
) -> WorkflowOutput:
"""An example workflow function for tuning.

Args:
task (`Dict`): The task information.
model (`ChatModelBase`): The chat model used by the agent.
auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): Additional
chat models, generally used to simulate the behavior of other
non-training agents in multi-agent scenarios.

Returns:
`WorkflowOutput`: The output generated by the workflow.
"""
agent = ReActAgent(
name="react_agent",
sys_prompt="You are a helpful math problem solving agent.",
model=model,
formatter=OpenAIChatFormatter(),
)

response = await agent.reply(
msg=Msg(
"user",
task["question"],
role="user",
), # extract question from task
)

return WorkflowOutput( # return the response
response=response,
)


# %%
# You can directly run this workflow function with a task dictionary and a ``DashScopeChatModel`` / ``OpenAIChatModel`` to test its correctness before formal training. For example:

import asyncio
import os
from agentscope.model import DashScopeChatModel

task = {"question": "What is 123 plus 456?", "answer": "579"}
model = DashScopeChatModel(
model_name="qwen-max",
api_key=os.environ["DASHSCOPE_API_KEY"],
)
workflow_output = asyncio.run(example_workflow_function(task, model))
assert isinstance(
workflow_output.response,
Msg,
), "In this example, the response should be a Msg instance."
print("\nWorkflow response:", workflow_output.response.get_text_content())

# %%
#
# Judge Function
# --------------------
# The judge function evaluates the agent's performance on a given task and provides a reward signal for tuning.
# All judge functions should follow the input/output signature defined in ``agentscope.tuner.JudgeType``.
# Below is a simple judge function that compares the agent's response with the ground truth answer:

from typing import Any
from agentscope.tuner import JudgeOutput


async def example_judge_function(
task: Dict,
response: Any,
auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
) -> JudgeOutput:
"""A very simple judge function only for demonstration.

Args:
task (`Dict`): The task information.
response (`Any`): The response field from the WorkflowOutput.
auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): Additional
chat models for LLM-as-a-Judge purposes.

Returns:
`JudgeOutput`: The reward assigned by the judge.
"""
ground_truth = task["answer"]
reward = 1.0 if ground_truth in response.get_text_content() else 0.0
return JudgeOutput(reward=reward)


judge_output = asyncio.run(
example_judge_function(
task,
workflow_output.response,
),
)
print(f"Judge reward: {judge_output.reward}")

# %%
# Like the workflow function, the judge function can be tested locally as shown above to verify its logic before formal training.
#
# .. tip::
# You can leverage existing `MetricBase <https://github.com/agentscope-ai/agentscope/blob/main/src/agentscope/evaluate/_metric_base.py>`_ implementations in your judge function to compute more sophisticated metrics and combine them into a composite reward.
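#
#    For instance, a composite reward might combine answer correctness with a small brevity bonus. The sketch below is plain Python for illustration and does not depend on any particular ``MetricBase`` API:
#
#    .. code-block:: python
#
#        async def composite_judge(task, response, auxiliary_models=None):
#            text = response.get_text_content() or ""
#            correctness = 1.0 if task["answer"] in text else 0.0
#            brevity_bonus = 0.2 if len(text) < 500 else 0.0  # hypothetical extra signal
#            return JudgeOutput(reward=correctness + brevity_bonus)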
#
# Configuration and Running
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Finally, you can configure and run the tuning process using the ``tuner`` module.
# Before starting, ensure that `Trinity-RFT <https://github.com/modelscope/Trinity-RFT>`_ is installed in your environment, as it is required for tuning.
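#
# For instance (assuming Trinity-RFT is published on PyPI as ``trinity-rft``; check the Trinity-RFT repository for the authoritative install instructions):
#
# .. code-block:: bash
#
#    pip install trinity-rft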
#
# Below is an example of configuring and starting the tuning process:
#
# .. note::
#    This example is for demonstration only. For a complete runnable example, see `Tune ReAct Agent <https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent>`_.
#
# .. code-block:: python
#
# from agentscope.tuner import tune, AlgorithmConfig, DatasetConfig, TunerModelConfig
# # your workflow / judge function here...
#
# if __name__ == "__main__":
# dataset = DatasetConfig(path="my_dataset", split="train")
# model = TunerModelConfig(model_path="Qwen/Qwen3-0.6B", max_model_len=16384)
# algorithm = AlgorithmConfig(
# algorithm_type="multi_step_grpo",
# group_size=8,
# batch_size=32,
# learning_rate=1e-6,
# )
# tune(
# workflow_func=example_workflow_function,
# judge_func=example_judge_function,
# model=model,
# train_dataset=dataset,
# algorithm=algorithm,
# )
#
# Here, ``DatasetConfig`` configures the training dataset, ``TunerModelConfig`` sets the parameters for the trainable model, and ``AlgorithmConfig`` specifies the reinforcement learning algorithm and its hyperparameters.
#
# .. tip::
# The ``tune`` function is based on `Trinity-RFT <https://github.com/modelscope/Trinity-RFT>`_ and internally converts input parameters to a YAML configuration.
# Advanced users can skip the ``model``, ``train_dataset``, and ``algorithm`` arguments and instead provide a YAML config file path via the ``config_path`` argument.
# Using a configuration file is recommended for fine-grained control and to leverage advanced Trinity-RFT features. See the Trinity-RFT `Configuration Guide <https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html>`_ for more options.
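#
#    A minimal sketch of the config-file route (``my_config.yaml`` is a hypothetical path to your Trinity-RFT YAML config):
#
#    .. code-block:: python
#
#        tune(
#            workflow_func=example_workflow_function,
#            judge_func=example_judge_function,
#            config_path="my_config.yaml",
#        )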
#
# Save the above code as ``main.py`` and run it with:
#
# .. code-block:: bash
#
# ray start --head
# python main.py
#
# Checkpoints and logs are automatically saved to the ``checkpoints/AgentScope`` directory under your workspace, with each run in a timestamped sub-directory. TensorBoard logs can be found in ``monitor/tensorboard`` within the checkpoint directory.
#
# .. code-block:: text
#
# your_workspace/
# └── checkpoints/
# └──AgentScope/
# └── Experiment-20260104185355/ # each run saved in a sub-directory with timestamp
# ├── monitor/
# │ └── tensorboard/ # tensorboard logs
# └── global_step_x/ # saved model checkpoints at step x
#
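# To inspect the reward curve and other metrics, you can point TensorBoard at that directory (assuming TensorBoard is installed in your environment):
#
# .. code-block:: bash
#
#    tensorboard --logdir checkpoints/AgentScope/Experiment-20260104185355/monitor/tensorboard
#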
# .. tip::
# For more tuning examples, refer to the `tuner directory <https://github.com/agentscope-ai/agentscope-samples/tree/main/tuner>`_ of the AgentScope-Samples repository.
1 change: 1 addition & 0 deletions docs/tutorial/zh_CN/index.rst
@@ -71,6 +71,7 @@ Welcome to AgentScope's documentation!
tutorial/task_eval
tutorial/task_embedding
tutorial/task_tts
tutorial/task_tuner

.. toctree::
:maxdepth: 1