diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index ba8f4b686b..08efd91167 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -195,7 +195,7 @@ Now our examples are organized into subdirectories based on their type: - `examples/game/` for game-related examples - `examples/evaluation/` for evaluation scripts - `examples/workflows/` for workflow demonstrations -- `examples/training/` for training-related examples +- `examples/tuner/` for tuning-related examples An example structure could be: diff --git a/CONTRIBUTING_zh.md b/CONTRIBUTING_zh.md index 5e52727fdf..53fcfa4b5f 100644 --- a/CONTRIBUTING_zh.md +++ b/CONTRIBUTING_zh.md @@ -189,7 +189,7 @@ examples/ - `examples/functionality/` 用于展示 AgentScope 的特定基础功能 - `examples/evaluation/` 用于评估 - `examples/workflows/` 用于工作流演示 -- `examples/training/` 用于训练相关示例 +- `examples/tuner/` 用于微调相关示例 示例结构如下: diff --git a/README.md b/README.md index 4067a7affa..b315d2bf69 100644 --- a/README.md +++ b/README.md @@ -59,7 +59,7 @@ - **[2025-12]** AgentScope supports [TTS(Text-to-Speech)](https://doc.agentscope.io/tutorial/task_tts.html) now! Check our [example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/tts) and [tutorial](https://doc.agentscope.io/tutorial/task_tts.html) for more details. - **[2025-11]** AgentScope supports [Anthropic Agent Skill](https://claude.com/blog/skills) now! Check our [example](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/agent_skill) and [tutorial](https://doc.agentscope.io/tutorial/task_agent_skill.html) for more details. - **[2025-11]** AgentScope open-sources [Alias-Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/alias) for diverse real-world tasks and [Data-Juicer Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/data_juicer_agent) for data processing. -- **[2025-11]** AgentScope supports [Agentic RL](https://github.com/agentscope-ai/agentscope/tree/main/examples/training/react_agent) via integrating [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) library. +- **[2025-11]** AgentScope supports [Agentic RL](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent) via integrating [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) library. - **[2025-11]** AgentScope integrates [ReMe](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/long_term_memory/reme) for enhanced long-term memory. - **[2025-11]** AgentScope launches [agentscope-samples](https://github.com/agentscope-ai/agentscope-samples) repository and upgrades [agentscope-runtime](https://github.com/agentscope-ai/agentscope-runtime) with Docker/K8s deployment and VNC-powered GUI sandboxes. - **[2025-11]** [Contributing Guide](./CONTRIBUTING.md) is online now! Welcome to contribute to AgentScope. 
@@ -411,8 +411,8 @@ as_studio - [Multi-agent Concurrent](https://github.com/agentscope-ai/agentscope/tree/main/examples/workflows/multiagent_concurrent) - Evaluation - [ACEBench](https://github.com/agentscope-ai/agentscope/tree/main/examples/evaluation/ace_bench) - - Training - - [Reinforcement learning (RL) with Trinity-RFT](https://github.com/agentscope-ai/agentscope/tree/main/examples/training/react_agent) + - Tuner + - [Tune ReAct Agent](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent) ## 🤝 Contributing diff --git a/README_zh.md b/README_zh.md index c3f8d1f6b4..34a71a5d41 100644 --- a/README_zh.md +++ b/README_zh.md @@ -59,7 +59,7 @@ - **[2025-12]** AgentScope 已支持 [TTS(Text-to-Speech) 模型](https://doc.agentscope.io/zh_CN/tutorial/task_tts.html) !欢迎查看 [样例]() 和 [教程](https://doc.agentscope.io/zh_CN/tutorial/task_tts.html) 了解更多详情。 - **[2025-11]** AgentScope 已支持 [Anthropic Agent Skill](https://claude.com/blog/skills) !欢迎查看 [样例](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/agent_skill) 和 [教程](https://doc.agentscope.io/zh_CN/tutorial/task_agent_skill.html) 了解更多详情。 - **[2025-11]** AgentScope 开源 [Alias-Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/alias) 用于处理多样化的真实任务,以及 [Data-Juicer Agent](https://github.com/agentscope-ai/agentscope-samples/tree/main/data_juicer_agent) 用于自然语言驱动的数据处理。 -- **[2025-11]** AgentScope 通过集成 [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) 实现对 [Agentic RL](https://github.com/agentscope-ai/agentscope/tree/main/examples/training/react_agent) 的支持。 +- **[2025-11]** AgentScope 通过集成 [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) 实现对 [Agentic RL](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent) 的支持。 - **[2025-11]** AgentScope 集成 [ReMe](https://github.com/agentscope-ai/agentscope/tree/main/examples/functionality/long_term_memory/reme) 增强长期记忆能力。 - **[2025-11]** AgentScope 推出 [agentscope-samples](https://github.com/agentscope-ai/agentscope-samples) 样例库,并升级 [agentscope-runtime](https://github.com/agentscope-ai/agentscope-runtime) 支持 Docker/K8s 部署和 VNC 驱动的图形化沙盒。 - **[2025-11]** [Contributing Guide](./CONTRIBUTING.md) 已更新,欢迎贡献到 AgentScope! @@ -412,8 +412,8 @@ as_studio - [多智能体并发](https://github.com/agentscope-ai/agentscope/tree/main/examples/workflows/multiagent_concurrent) - 评测 - [ACEBench](https://github.com/agentscope-ai/agentscope/tree/main/examples/evaluation/ace_bench) - - 训练 - - [使用 Trinity-RFT 进行强化学习训练](https://github.com/agentscope-ai/agentscope/tree/main/examples/training/react_agent) + - 微调 + - [微调 ReAct 智能体](https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent) ## 🤝 贡献 diff --git a/docs/tutorial/en/index.rst b/docs/tutorial/en/index.rst index cab201aba0..332af8fa7f 100644 --- a/docs/tutorial/en/index.rst +++ b/docs/tutorial/en/index.rst @@ -70,6 +70,7 @@ Welcome to AgentScope's documentation! tutorial/task_eval tutorial/task_embedding tutorial/task_tts + tutorial/task_tuner .. toctree:: :maxdepth: 1 diff --git a/docs/tutorial/en/src/task_tuner.py b/docs/tutorial/en/src/task_tuner.py new file mode 100644 index 0000000000..50713df5e3 --- /dev/null +++ b/docs/tutorial/en/src/task_tuner.py @@ -0,0 +1,247 @@ +# -*- coding: utf-8 -*- +""" +.. _tuner: + +Tuner +================= + +AgentScope provides the ``tuner`` module for training agent applications using reinforcement learning (RL). 
+This tutorial will guide you through how to leverage the ``tuner`` module to improve agent performance on specific tasks, including: + +- Introducing the core components of the ``tuner`` module +- Demonstrating the key code required for the tuning workflow +- Showing how to configure and run the tuning process + +Main Components +~~~~~~~~~~~~~~~~~~~ +The ``tuner`` module introduces three core components essential for RL-based agent training: + +- **Task Dataset**: A collection of tasks for training and evaluating the agent. +- **Workflow Function**: Encapsulates the agent's logic to be tuned. +- **Judge Function**: Evaluates the agent's performance on tasks and provides reward signals for tuning. + +In addition, ``tuner`` provides several configuration classes for customizing the tuning process, including: + +- **TunerModelConfig**: Model configurations for tuning purposes. +- **AlgorithmConfig**: Specifies the RL algorithm (e.g., GRPO, PPO) and its parameters. + +Implementation +~~~~~~~~~~~~~~~~~~~ +This section demonstrates how to use ``tuner`` to train a simple math agent. + +Task Dataset +-------------------- +The task dataset contains tasks for training and evaluating your agent. + +You dataset should follow the Huggingface `datasets `_ format, which can be loaded with ``datasets.load_dataset``. For example: + +.. code-block:: text + + my_dataset/ + ├── train.jsonl # training samples + └── test.jsonl # evaluation samples + +Suppose your `train.jsonl` contains: + +.. code-block:: json + + {"question": "What is 2 + 2?", "answer": "4"} + {"question": "What is 4 + 4?", "answer": "8"} + +Before starting tuning, you can verify that your dataset is loaded correctly with: + +.. code-block:: python + + from agentscope.tuner import DatasetConfig + + dataset = DatasetConfig(path="my_dataset", split="train") + dataset.preview(n=2) + # Output the first two samples to verify correct loading + # [ + # { + # "question": "What is 2 + 2?", + # "answer": "4" + # }, + # { + # "question": "What is 4 + 4?", + # "answer": "8" + # } + # ] + +Workflow Function +-------------------- +The workflow function defines how the agent interacts with the environment and makes decisions. All workflow functions should follow the input/output signature defined in ``agentscope.tuner.WorkflowType``. + +Below is an example workflow function using a ReAct agent to answer math questions: +""" + +from typing import Dict, Optional +from agentscope.agent import ReActAgent +from agentscope.formatter import OpenAIChatFormatter +from agentscope.message import Msg +from agentscope.model import ChatModelBase +from agentscope.tuner import WorkflowOutput + + +async def example_workflow_function( + task: Dict, + model: ChatModelBase, + auxiliary_models: Optional[Dict[str, ChatModelBase]] = None, +) -> WorkflowOutput: + """An example workflow function for tuning. + + Args: + task (`Dict`): The task information. + model (`ChatModelBase`): The chat model used by the agent. + auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): Additional + chat models, generally used to simulate the behavior of other + non-training agents in multi-agent scenarios. + + Returns: + `WorkflowOutput`: The output generated by the workflow. 
+ """ + agent = ReActAgent( + name="react_agent", + sys_prompt="You are a helpful math problem solving agent.", + model=model, + formatter=OpenAIChatFormatter(), + ) + + response = await agent.reply( + msg=Msg( + "user", + task["question"], + role="user", + ), # extract question from task + ) + + return WorkflowOutput( # return the response + response=response, + ) + + +# %% +# You can directly run this workflow function with a task dictionary and a ``DashScopeChatModel`` / ``OpenAIChatModel`` to test its correctness before formal training. For example: + +import asyncio +import os +from agentscope.model import DashScopeChatModel + +task = {"question": "What is 123 plus 456?", "answer": "579"} +model = DashScopeChatModel( + model_name="qwen-max", + api_key=os.environ["DASHSCOPE_API_KEY"], +) +workflow_output = asyncio.run(example_workflow_function(task, model)) +assert isinstance( + workflow_output.response, + Msg, +), "In this example, the response should be a Msg instance." +print("\nWorkflow response:", workflow_output.response.get_text_content()) + +# %% +# +# Judge Function +# -------------------- +# The judge function evaluates the agent's performance on a given task and provides a reward signal for tuning. +# All judge functions should follow the input/output signature defined in ``agentscope.tuner.JudgeType``. +# Below is a simple judge function that compares the agent's response with the ground truth answer: + +from typing import Any +from agentscope.tuner import JudgeOutput + + +async def example_judge_function( + task: Dict, + response: Any, + auxiliary_models: Optional[Dict[str, ChatModelBase]] = None, +) -> JudgeOutput: + """A very simple judge function only for demonstration. + + Args: + task (`Dict`): The task information. + response (`Any`): The response field from the WorkflowOutput. + auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): Additional + chat models for LLM-as-a-Judge purpose. + Returns: + `JudgeOutput`: The reward assigned by the judge. + """ + ground_truth = task["answer"] + reward = 1.0 if ground_truth in response.get_text_content() else 0.0 + return JudgeOutput(reward=reward) + + +judge_output = asyncio.run( + example_judge_function( + task, + workflow_output.response, + ), +) +print(f"Judge reward: {judge_output.reward}") + +# %% +# The judge function can also be locally tested in the same way as shown above before formal training to ensure its logic is correct. +# +# .. tip:: +# You can leverage existing `MetricBase `_ implementations in your judge function to compute more sophisticated metrics and combine them into a composite reward. +# +# Configuration and Running +# ~~~~~~~~~~~~~~~ +# Finally, you can configure and run the tuning process using the ``tuner`` module. +# Before starting, ensure that `Trinity-RFT `_ is installed in your environment, as it is required for tuning. +# +# Below is an example of configuring and starting the tuning process: +# +# .. note:: +# This example is for demonstration only. For a complete runnable example, see `Tune ReActAgent `_ +# +# .. code-block:: python +# +# from agentscope.tuner import tune, AlgorithmConfig, DatasetConfig, TunerModelConfig +# # your workflow / judge function here... 
+# +# if __name__ == "__main__": +# dataset = DatasetConfig(path="my_dataset", split="train") +# model = TunerModelConfig(model_path="Qwen/Qwen3-0.6B", max_model_len=16384) +# algorithm = AlgorithmConfig( +# algorithm_type="multi_step_grpo", +# group_size=8, +# batch_size=32, +# learning_rate=1e-6, +# ) +# tune( +# workflow_func=example_workflow_function, +# judge_func=example_judge_function, +# model=model, +# train_dataset=dataset, +# algorithm=algorithm, +# ) +# +# Here, ``DatasetConfig`` configures the training dataset, ``TunerModelConfig`` sets the parameters for the trainable model, and ``AlgorithmConfig`` specifies the reinforcement learning algorithm and its hyperparameters. +# +# .. tip:: +# The ``tune`` function is based on `Trinity-RFT `_ and internally converts input parameters to a YAML configuration. +# Advanced users can skip the ``model``, ``train_dataset``, and ``algorithm`` arguments and instead provide a YAML config file path via the ``config_path`` argument. +# Using a configuration file is recommended for fine-grained control and to leverage advanced Trinity-RFT features. See the Trinity-RFT `Configuration Guide `_ for more options. +# +# Save the above code as ``main.py`` and run it with: +# +# .. code-block:: bash +# +# ray start --head +# python main.py +# +# Checkpoints and logs are automatically saved to the ``checkpoints/AgentScope`` directory under your workspace, with each run in a timestamped sub-directory. Tensorboard logs can be found in ``monitor/tensorboard`` within the checkpoint directory. +# +# .. code-block:: text +# +# your_workspace/ +# └── checkpoints/ +# └──AgentScope/ +# └── Experiment-20260104185355/ # each run saved in a sub-directory with timestamp +# ├── monitor/ +# │ └── tensorboard/ # tensorboard logs +# └── global_step_x/ # saved model checkpoints at step x +# +# .. tip:: +# For more tuning examples, refer to the `tuner directory `_ of the AgentScope-Samples repository. diff --git a/docs/tutorial/zh_CN/index.rst b/docs/tutorial/zh_CN/index.rst index 7e8683facb..335377b15e 100644 --- a/docs/tutorial/zh_CN/index.rst +++ b/docs/tutorial/zh_CN/index.rst @@ -71,6 +71,7 @@ Welcome to AgentScope's documentation! tutorial/task_eval tutorial/task_embedding tutorial/task_tts + tutorial/task_tuner .. toctree:: :maxdepth: 1 diff --git a/docs/tutorial/zh_CN/src/task_tuner.py b/docs/tutorial/zh_CN/src/task_tuner.py new file mode 100644 index 0000000000..9fbc98f076 --- /dev/null +++ b/docs/tutorial/zh_CN/src/task_tuner.py @@ -0,0 +1,249 @@ +# -*- coding: utf-8 -*- +""" +.. _tuner: + +Tuner +================= + +AgentScope 提供了 ``tuner`` 模块,用于通过强化学习(RL)训练智能体应用。 +本教程将带你系统了解如何利用 ``tuner`` 提升智能体在特定任务上的表现,包括: + +- 介绍 ``tuner`` 的核心组件 +- 演示调优流程所需的关键代码实现 +- 展示调优流程的配置与运行方法 + +主要组件 +~~~~~~~~~~~~~~~~~~~ +``tuner`` 模块为智能体训练工作流引入了三大核心组件: + +- **任务数据集**:用于训练和评估智能体的任务集合。 +- **工作流函数**:封装被调优智能体应用的决策逻辑。 +- **评判函数**:评估智能体在特定任务上的表现,并为调优过程提供奖励信号。 + +此外,``tuner`` 还提供了若干用于自定义调优流程的配置类,包括: + +- **TunerModelConfig**:用于指定被调优模型的相关配置。 +- **AlgorithmConfig**:用于指定强化学习算法(如 GRPO、PPO 等)及其参数。 + +实现流程 +~~~~~~~~~~~~~~~~~~~ +本节以一个简单的数学智能体为例,演示如何用 ``tuner`` 进行训练。 + +任务数据集 +-------------------- +任务数据集包含用于训练和评估的任务集合。 + +``tuner`` 的任务数据集采用 Huggingface `datasets `_ 格式,并通过 ``datasets.load_dataset`` 加载。例如: + +.. code-block:: text + + my_dataset/ + ├── train.jsonl # 训练样本 + └── test.jsonl # 测试样本 + +假设 `train.jsonl` 内容如下: + +.. code-block:: json + + {"question": "2 + 2 等于多少?", "answer": "4"} + {"question": "4 + 4 等于多少?", "answer": "8"} + +在开始调优前,你可以用如下方法来确定你的数据集能够被正确加载: + +.. 
code-block:: python + + from agentscope.tuner import DatasetConfig + + dataset = DatasetConfig(path="my_dataset", split="train") + dataset.preview(n=2) + # 输出前两个样本以验证数据集加载正确 + # [ + # { + # "question": "2 + 2 等于多少?", + # "answer": "4" + # }, + # { + # "question": "4 + 4 等于多少?", + # "answer": "8" + # } + # ] + +工作流函数 +-------------------- +工作流函数定义了智能体与环境的交互方式和决策过程。所有工作流函数需遵循 ``agentscope.tuner.WorkflowType`` 的输入/输出签名。 + +以下是一个用 ReAct 智能体回答数学问题的简单工作流函数示例: +""" + +from typing import Dict, Optional +from agentscope.agent import ReActAgent +from agentscope.formatter import OpenAIChatFormatter +from agentscope.message import Msg +from agentscope.model import ChatModelBase +from agentscope.tuner import WorkflowOutput + + +async def example_workflow_function( + task: Dict, + model: ChatModelBase, + auxiliary_models: Optional[Dict[str, ChatModelBase]] = None, +) -> WorkflowOutput: + """一个用于调优的工作流函数示例。 + + Args: + task (`Dict`): 任务信息。 + model (`ChatModelBase`): 智能体使用的对话模型。 + auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): + 用于辅助的额外对话模型,一般用于多智能体场景下模拟其他非训练智能体的行为。 + + Returns: + `WorkflowOutput`: 工作流生成的输出。 + """ + agent = ReActAgent( + name="react_agent", + sys_prompt="你是一个善于解决数学问题的智能体。", + model=model, + formatter=OpenAIChatFormatter(), + ) + + response = await agent.reply( + msg=Msg( + "user", + task["question"], + role="user", + ), # 从任务中提取问题 + ) + + return WorkflowOutput( # 返回响应结果 + response=response, + ) + + +# %% +# 你可以直接用任务字典和日常调试使用的 ``DashScopeChatModel`` / ``OpenAIChatModel`` 运行此工作流函数,从而在正式训练前测试其流程的正确性。例如: + +import asyncio +import os +from agentscope.model import DashScopeChatModel + +task = {"question": "123 加 456 等于多少?", "answer": "579"} +model = DashScopeChatModel( + model_name="qwen-max", + api_key=os.environ["DASHSCOPE_API_KEY"], +) +workflow_output = asyncio.run(example_workflow_function(task, model)) +assert isinstance( + workflow_output.response, + Msg, +), "在此示例中,响应应为 Msg 实例。" +print("\n工作流响应:", workflow_output.response.get_text_content()) + +# %% +# +# 评判函数 +# -------------------- +# 评判函数用于评估智能体在特定任务上的表现,并为调优过程提供奖励信号。 +# 所有评判函数需遵循 ``agentscope.tuner.JudgeType`` 的输入/输出签名。 +# 下面是一个简单的评判函数示例,通过比较智能体响应与标准答案给出奖励: + +from typing import Any +from agentscope.tuner import JudgeOutput + + +async def example_judge_function( + task: Dict, + response: Any, + auxiliary_models: Optional[Dict[str, ChatModelBase]] = None, +) -> JudgeOutput: + """仅用于演示的简单评判函数。 + + Args: + task (`Dict`): 任务信息。 + response (`Any`): WorkflowOutput 的响应字段。 + auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): + 用于 LLM-as-a-Judge 的辅助模型。 + Returns: + `JudgeOutput`: 评判函数分配的奖励。 + """ + ground_truth = task["answer"] + reward = 1.0 if ground_truth in response.get_text_content() else 0.0 + return JudgeOutput(reward=reward) + + +# 本地测试函数的正确性: +judge_output = asyncio.run( + example_judge_function( + task, + workflow_output.response, + ), +) +print(f"评判奖励: {judge_output.reward}") + +# %% +# 评判函数同样可以按照上述案例中展示的方式在正式训练前进行本地测试,以确保其逻辑正确。 +# +# .. tip:: +# 你可以在评判函数中利用已有的 `MetricBase `_ 实现,计算更复杂的指标,并将其组合为复合奖励。 +# +# 配置并运行 +# ~~~~~~~~~~~~~~~ +# 最后,你可以用 ``tuner`` 模块配置并运行调优流程。 +# 在开始调优前,请确保环境已安装 `Trinity-RFT `_,这是 ``tuner`` 的依赖。 +# +# 下面是调优流程的配置与启动示例: +# +# .. note:: +# 此示例仅供演示。完整可运行示例请参考 `Tune ReActAgent `_ +# +# .. code-block:: python +# +# from agentscope.tuner import tune, AlgorithmConfig, DatasetConfig, TunerModelConfig +# # 你的工作流 / 评判函数 ... 
+# +# if __name__ == "__main__": +# dataset = DatasetConfig(path="my_dataset", split="train") +# model = TunerModelConfig(model_path="Qwen/Qwen3-0.6B", max_model_len=16384) +# algorithm = AlgorithmConfig( +# algorithm_type="multi_step_grpo", +# group_size=8, +# batch_size=32, +# learning_rate=1e-6, +# ) +# tune( +# workflow_func=example_workflow_function, +# judge_func=example_judge_function, +# model=model, +# train_dataset=dataset, +# algorithm=algorithm, +# ) +# +# 这里用 ``DatasetConfig`` 配置训练数据集,用 ``TunerModelConfig`` 配置可训练模型相关参数,用 ``AlgorithmConfig`` 指定强化学习算法及其超参数。 +# +# .. tip:: +# ``tune`` 函数基于 `Trinity-RFT `_ 实现,内部会将输入参数转换为 YAML 配置。 +# 高级用户可忽略 ``model``、``train_dataset``、``algorithm`` 参数,直接通过 ``config_path`` 指定 YAML 配置文件。 +# 推荐使用配置文件方式以便更细粒度地控制训练过程,充分利用 Trinity-RFT 的高级功能。 +# 你可参考 Trinity-RFT 的 `配置指南 `_ 了解更多配置选项。 +# +# 你可以将上述代码保存为 ``main.py``,并用如下命令运行: +# +# .. code-block:: bash +# +# ray start --head +# python main.py +# +# 检查点和日志会自动保存到当前工作目录下的 ``checkpoints/AgentScope`` 目录,每次运行会以时间戳为后缀保存到子目录。 +# tensorboard 日志可在检查点目录下的 ``monitor/tensorboard`` 中找到。 +# +# .. code-block:: text +# +# your_workspace/ +# └── checkpoints/ +# └──AgentScope/ +# └── Experiment-20260104185355/ # 每次运行以时间戳保存 +# ├── monitor/ +# │ └── tensorboard/ # tensorboard 日志 +# └── global_step_x/ # 第 x 步保存的模型检查点 +# +# .. tip:: +# 更多调优样例请参考 AgentScope-Samples 库中的 `tuner 目录 `_ diff --git a/examples/training/react_agent/README.md b/examples/training/react_agent/README.md deleted file mode 100644 index 57db3257b6..0000000000 --- a/examples/training/react_agent/README.md +++ /dev/null @@ -1,240 +0,0 @@ -# Training agent workflows with RL using Trinity-RFT - -AgentScope exposes a `tune` interface to train agent workflows using reinforcement learning (RL). -The `tune` interface leverages [Trinity-RFT](https://github.com/modelscope/Trinity-RFT), which supports training agents with minimal code changes. - ---- - -## How to implement - -Here we use a math problem solving scenario as an example to illustrate how to convert an existing agent workflow into a trainable workflow function. - -Suppose you have an agent workflow that solves math problems using the `ReActAgent` - -```python -from agentscope.agent import ReActAgent - -async def run_react_agent(): - # model = ... # Initialize your ChatModel here - - query = "What is the sum of the first 10 prime numbers?" - agent = ReActAgent( - name="react_agent", - sys_prompt="You are a helpful math problem solving agent.", - model=model, - enable_meta_tool=True, - formatter=OpenAIChatFormatter(), - ) - - response = await agent.reply( - msg=Msg("user", query, role="user"), - ) - - print(response) -``` - -### Step 1: Define a workflow function - -To train an agent workflow using RL, you need to implement a workflow function with the following signature. - -```python -def workflow_function( - task: Dict, - model: TrinityChatModel, -) -> float: - """Run the agent workflow on a single task and return a scalar reward.""" -``` - -Inputs: - -- `task`: A dictionary representing a single training task, converted from a sample in the training dataset. For example, in a math problem solving task, `task` may contain `question` and `answer` fields. - -- `model`: A `TrinityChatModel` instance, which has the same interface as `OpenAIChatModel`, but it supports automatically converting invoke history into trainable data that can be used by Trinity-RFT. - -Outputs: - -- A scalar reward (float) indicating the quality of the agent's response on the given task. 
- -### Step 2: Initialize and run the agent using the provided task and model - -Since the `model` has the same interface as `OpenAIChatModel`, you can directly use it to initialize the agent. - -However, the `task` dictionary is a sample from the training dataset and can vary. You need to extract the relevant fields from `task` to run the agent. - -Suppose your training dataset is a `.jsonl` file with samples like: - -```json -{"question": "What is 2 + 2?", "answer": "4"} -{"question": "What is 4 + 4?", "answer": "8"} -``` - -In this case, you can extract the `question` field from `task` to run the agent: - -```python -def workflow_function( - task: Dict, - model: TrinityChatModel, -) -> float: - agent = ReActAgent( - name="react_agent", - sys_prompt="You are a helpful math problem solving agent.", - model=model, - enable_meta_tool=True, - formatter=OpenAIChatFormatter(), - ) - - response = await agent.reply( - msg=Msg("user", task["question"], role="user"), - ) - - # further steps to calculate reward... (See Step 3) -``` - -### Step 3: Implement a reward calculation mechanism - -To train the agent using RL, you need to define a reward calculation mechanism that computes a reward based on the agent's response. - -Continuing from the previous code snippet, suppose you want to give a reward of `1.0` if the agent's answer matches the ground truth answer in `task["answer"]`, and `0.0` otherwise. - -```python -def calculate_reward(answer: str, truth: str) -> float: - """Simple reward: 1.0 for exact match, else 0.0.""" - return 1.0 if answer.strip() == truth.strip() else 0.0 -``` - -To facilitate reward calculation, you can define a structured response model that allows easy parsing of the agent's output. - -```python -from pydantic import BaseModel, Field - -class ResponseStructure(BaseModel): - """Response structure for math tasks (simplified). - This structure let the agent output be easily parsed, - allowing for easy reward calculation. - """ - - result: str = Field(description="Final answer to the math problem.") - -# ... inside workflow_function ... -# response = await agent.reply( -# msg=Msg("user", task["question"], role="user"), -# structured_model=ResponseStructure, # <-- specify structured model here -# ) -# return calculate_reward(response.metadata["result"], task["answer"]) -``` - -### Step 4: Use `tune` to train the workflow function - -Finally, you can use the `tune` interface to train the defined workflow function with a configuration file. - -```python -from agentscope.tune import tune - -# your workflow function here... - -if __name__ == "__main__": - tune( - workflow_func=workflow_function, - config_path="/path/to/your/config.yaml", - ) -``` - -The trained model, training dataset, RL algorithm, training cluster and other configurations are all located in the configuration file, which should follow the Trinity-RFT configuration format. - -See [config.yaml](./config.yaml) for an example configuration. For full configuration details, see [Trinity-RFT Configuration Guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html). - ---- - -### Complete example - -```python -from typing import Dict - -from pydantic import BaseModel, Field - -from agentscope.tune import tune -from agentscope.model import TrinityChatModel -from agentscope.agent import ReActAgent -from agentscope.formatter import OpenAIChatFormatter -from agentscope.message import Msg - - -def calculate_reward(answer: str, truth: str) -> float: - """Simple reward: 1.0 for exact match, else 0.0. 
- - This is a toy reward function; replace it with a more robust metric if needed. - """ - - return 1.0 if answer.strip() == truth.strip() else 0.0 - - -class ResponseStructure(BaseModel): - """Response structure for math tasks (simplified). - This structure makes the agent's output easy to parse, - allowing for easy reward calculation. - """ - - result: str = Field(description="Final answer to the math problem.") - - -async def react_workflow_function(task: Dict, model: TrinityChatModel) -> float: - """Workflow function for ReAct agent training.""" - - agent = ReActAgent( - name="react_agent", - sys_prompt="You are a helpful math problem solving agent.", - model=model, - enable_meta_tool=True, - formatter=OpenAIChatFormatter(), - ) - - response = await agent.reply( - msg=Msg("user", task["question"], role="user"), - structured_model=ResponseStructure, - ) - - reward = calculate_reward(response.metadata["result"], task["answer"]) - return reward - - -if __name__ == "__main__": - tune( - workflow_func=react_workflow_function, - config_path="/path/to/your/config.yaml", - ) -``` - -> Above code is a simplified example for illustration purposes only. -> For a complete implementation, please refer to [main.py](./main.py). - ---- - -## How to run - -After implementing the workflow function, follow these steps to run the training: - -1. Prerequisites - - - At least 2 NVIDIA GPUs with CUDA 12.8 or newer. - - Adjust the configuration file ([config.yaml](./config.yaml)) based on your hardware. - - Follow the Trinity-RFT [installation guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation.html) to install the latest version from source code. - - Download the GSM8K dataset and Qwen/Qwen3-8B model checkpoints (example): - - ```bash - huggingface-cli download openai/gsm8k --repo-type dataset - huggingface-cli download Qwen/Qwen3-8B - ``` - -2. Set up a [Ray](https://github.com/ray-project/ray) cluster - - ```bash - ray start --head - # for multi-node setup, run the following command on worker nodes - # ray start --address= - ``` - -3. Run the training script - - ```bash - python main.py - ``` diff --git a/examples/training/react_agent/main.py b/examples/training/react_agent/main.py deleted file mode 100644 index 4fc7308b64..0000000000 --- a/examples/training/react_agent/main.py +++ /dev/null @@ -1,98 +0,0 @@ -# -*- coding: utf-8 -*- -"""Example of training a ReAct agent using RL with Trinity-RFT.""" -import os -from typing import Dict - - -from pydantic import BaseModel, Field -from trinity.common.rewards import MathBoxedRewardFn - -from agentscope.tune import tune -from agentscope.model import TrinityChatModel -from agentscope.agent import ReActAgent -from agentscope.formatter import OpenAIChatFormatter -from agentscope.message import Msg - - -class GSM8KResponseStructure(BaseModel): - """Response structure for GSM8K tasks.""" - - result: str = Field( - description=( - "Your solution of the given math problem. 
" - "Put your final answer in boxed format, e.g., \\boxed{42}" - ), - ) - - -class GSM8KRewardFn(MathBoxedRewardFn): - """Reward function for GSM8K tasks.""" - - def __call__( - self, - response: Dict, - truth: str, - format_score_coef: float = 0.1, - **kwargs: Dict, - ) -> dict[str, float]: - """Calculate the reward based on the response and truth.""" - # parse GSM8K truth - if isinstance(truth, str) and "####" in truth: - truth = truth.split("####")[1].strip() - else: - truth = str(truth) - return super().__call__( - response=response["result"], - truth=truth, - with_think=False, - format_score_coef=format_score_coef, - **kwargs, - ) - - -async def run_react_agent(task: Dict, model: TrinityChatModel) -> float: - """A simple workflow function using the ReAct agent to solve tasks. - - Args: - task (Dict): The task to be solved. - model (TrinityChatModel): The language model to use. - - Returns: - float: The reward obtained by solving the task. - """ - sys_prompt = ( - "You are an agent specialized in solving math problems with tools. " - "Please solve the math problem given to you. You can write and " - "execute Python code to perform calculation or verify your answer. " - "You should return your final answer within \\boxed{{}}." - ) - - response_structure = GSM8KResponseStructure - reward_fn = GSM8KRewardFn() - agent = ReActAgent( - name="react_agent", - sys_prompt=sys_prompt, - model=model, - enable_meta_tool=True, - formatter=OpenAIChatFormatter(), - ) - response = await agent.reply( - msg=Msg("user", task["question"], role="user"), - structured_model=response_structure, - ) - reward = reward_fn( - response=response.metadata, - truth=task["answer"], - ) - return sum(reward.values()) - - -if __name__ == "__main__": - config_path = os.path.join( - os.path.dirname(__file__), - "config.yaml", - ) - tune( - workflow_func=run_react_agent, - config_path=config_path, - ) diff --git a/examples/tuner/react_agent/README.md b/examples/tuner/react_agent/README.md new file mode 100644 index 0000000000..e4d85878a9 --- /dev/null +++ b/examples/tuner/react_agent/README.md @@ -0,0 +1,365 @@ +# AgentScope Tuner Quick Start Guide + +AgentScope provides a `tuner` sub-module to train agent workflows using reinforcement learning (RL). +This guide walks you through the steps to implement and train an agent workflow using RL with AgentScope Tuner. + +## Overview + +To train your agent workflow using RL, you need to understand three components: + +1. **Workflow function**: Refactor your agent workflow into a workflow function that follows the specified input/output signature. +2. **Judge function**: Implement a judge function that computes rewards based on the agent's responses. +3. **Task dataset**: Prepare a dataset containing training samples for the agent to learn. + +The following diagram illustrates the relationship between these components: + +```mermaid +flowchart TD + Model[Model] --> WorkflowFunction[Workflow Function] + WorkflowFunction --> JudgeFunction[Judge Function] + Task[Task] --> WorkflowFunction + Task[Task] --> JudgeFunction + JudgeFunction --> Reward[Reward] + + classDef wfcolor fill:#e67e22,stroke:#333,color:#111; + classDef judgecolor fill:#1abc9c,stroke:#333,color:#111,stroke-dasharray: 5 5; + classDef taskcolor fill:#3498db,stroke:#333,color:#111; + class WorkflowFunction wfcolor; + class JudgeFunction judgecolor; + class Task taskcolor; +``` + +## How to implement + +Here we use a math problem solving scenario as an example to illustrate how to implement the above three components. 
+ +Suppose you have an agent workflow that solves math problems using the `ReActAgent`. + +```python +from agentscope.agent import ReActAgent + +async def run_react_agent(query: str): + # model = ... # Initialize your ChatModel here + + agent = ReActAgent( + name="react_agent", + sys_prompt="You are a helpful math problem solving agent.", + model=model, + enable_meta_tool=True, + formatter=OpenAIChatFormatter(), + ) + + response = await agent.reply( + msg=Msg("user", query, role="user"), + ) + + print(response) +``` + +### Step 1: Prepare task dataset + +To train the agent solving math problems, you need a training dataset that contains samples of math problems and their corresponding ground truth answers. + +The dataset should be organized in huggingface [datasets](https://huggingface.co/docs/datasets/quickstart) format and can be loaded using the `datasets.load_dataset` function. For example: + +``` +my_dataset/ + ├── train.jsonl # samples for training + └── test.jsonl # samples for evaluation +``` + +Suppose your `train.jsonl` contains samples like: + +```json +{"question": "What is 2 + 2?", "answer": "4"} +{"question": "What is 4 + 4?", "answer": "8"} +``` + +Note that the task sample format can vary based on your specific scenario. The key point is that each sample should contain the necessary information for the agent to complete the task and for judging the quality of the response. + +You can preview your dataset using the following code: + +```python +from agentscope.tuner import DatasetConfig + +DatasetConfig(path="my_dataset", split="train").preview() + +# Output: +# [ +# { +# "question": "What is 2 + 2?", +# "answer": "4" +# }, +# { +# "question": "What is 4 + 4?", +# "answer": "8" +# } +# ] +``` + +### Step 2: Define a workflow function + +To train an agent workflow using RL, you need to refactor your agent with the following signature. + +```python +async def workflow_function( + task: Dict, + model: ChatModelBase, + auxiliary_models: Optional[Dict[str, ChatModelBase]]=None, +) -> WorkflowOutput: + """Run the agent workflow on a single task and return a scalar reward.""" +``` + +- Inputs: + - `task`: A dictionary representing a single training task, converted from a sample in the training dataset. For example, if using the dataset prepared in Step 1, the `task` is a dictionary containing `question` and `answer` fields. + - `model`: A `ChatModelBase` instance, which has the same interface as `OpenAIChatModel`, but it supports automatically converting invoke history into trainable data. + - `auxiliary_models`: A dictionary of auxiliary models that can be used in the workflow. The keys are model names, and the values are `ChatModelBase` instances. These models are different from the main `model` in that they are not directly trained, but can be used to assist the main model in completing the task (e.g., acting as Judge). Empty dict if no auxiliary models are needed. + +- Outputs: + - `WorkflowOutput`: An object containing the output of the workflow function, which contains: + - `reward`: A scalar float representing the reward obtained from the workflow function. Fill this field if you want to directly output the reward from the workflow function. Otherwise, you can leave it as `None` and implement the reward calculation in the judge function. + - `response`: The output from the workflow function, which can be the agent's response or other types of outputs depending on your workflow function implementation. Used for reward calculation in the judge function. 
If you don't need to calculate reward in the judge function, you can leave it as `None`. + - `metrics`: A dictionary of additional metrics that can be logged during training. Leave it as `None` if no additional metrics are needed. + + +Below is a refactored version of the original `run_react_agent` function to fit the workflow function signature. + +**There are only 3 minor changes from the original function**: + +1. use the input `model` to initialize the agent. +2. use the `question` field from the `task` dictionary as the user query. +3. return a `WorkflowOutput` object containing the agent's response. + +```python +from agentscope.agent import ReActAgent +from agentscope.formatter import OpenAIChatFormatter +from agentscope.tuner import WorkflowOutput +from agentscope.message import Msg + +async def run_react_agent( + task: Dict, + model: ChatModelBase, + auxiliary_models: Optional[Dict[str, ChatModelBase]]=None, +) -> WorkflowOutput: + agent = ReActAgent( + name="react_agent", + sys_prompt="You are a helpful math problem solving agent.", + model=model, # directly use the trainable model here + formatter=OpenAIChatFormatter(), + ) + + response = await agent.reply( + msg=Msg("user", task["question"], role="user"), # extract question from task + ) + + return WorkflowOutput( # put the response into WorkflowOutput + response=response, + ) +``` + +### Step 3: Implement the judge function + +To train the agent using RL, you need to define a judge function that computes a reward following the signature below. + +```python +async def judge_function( + task: Dict, + response: Any, + auxiliary_models: Dict[str, ChatModelBase], +) -> JudgeOutput: + """Calculate reward based on the input task and agent's response.""" +``` + +- Inputs: + - `task`: A dictionary representing a single training task, same as the input to the workflow function. + - `response`: The output from the workflow function, which can be the agent's response or other types of outputs depending on your workflow function implementation. + - `auxiliary_models`: A dictionary of auxiliary models that can be used in the reward calculation. The keys are model names, and the values are `ChatModelBase` instances. These models are different from the main model in that they are not directly trained, but can be used to assist in calculating the reward (e.g., acting as Judge). Empty dict if no auxiliary models are needed. + +- Outputs: + - `JudgeOutput`: An object containing the output of the judge function. It contains: + - `reward`: A scalar float representing the reward calculated based on the input task and agent's response. This field must be filled. + - `metrics`: A dictionary of additional metrics that can be logged during training. Leave it as `None` if no additional metrics are needed. + +Here is an example implementation of a simple reward calculation mechanism that gives a reward of `1.0` for an exact match between the agent's answer and the ground truth answer, and `0.0` otherwise. + +> Note: This is a toy reward function; in practice, you need to parse the agent's response to extract the final answer before comparing it with the ground truth. You may also want to use a more robust metric for reward calculation. 
+
+```python
+from typing import Dict
+
+from agentscope.message import Msg
+from agentscope.model import ChatModelBase
+from agentscope.tuner import JudgeOutput
+
+async def judge_function(
+    task: Dict, response: Msg, auxiliary_models: Dict[str, ChatModelBase]
+) -> JudgeOutput:
+    """Simple reward: 1.0 for exact match, else 0.0."""
+    ground_truth = task["answer"]
+    reward = 1.0 if ground_truth in response.get_text_content() else 0.0
+    return JudgeOutput(reward=reward)
+```
+
+> Tip: You can leverage existing [`MetricBase`](https://github.com/agentscope-ai/agentscope/blob/main/src/agentscope/evaluate/_metric_base.py) implementations in your judge function to compute more sophisticated metrics and combine them into a composite reward.
+
+### Step 4: Start tuning
+
+Finally, you can use the `tune` interface to train the workflow function you defined.
+
+```python
+from agentscope.tuner import tune, AlgorithmConfig, DatasetConfig, TunerModelConfig
+
+# your workflow / judge function here...
+
+if __name__ == "__main__":
+    dataset = DatasetConfig(path="my_dataset", split="train")
+    model = TunerModelConfig(model_path="Qwen/Qwen3-0.6B", max_model_len=16384)
+    algorithm = AlgorithmConfig(
+        algorithm_type="multi_step_grpo",
+        group_size=8,
+        batch_size=32,
+        learning_rate=1e-6,
+    )
+    tune(
+        workflow_func=run_react_agent,
+        judge_func=judge_function,
+        model=model,
+        train_dataset=dataset,
+        algorithm=algorithm,
+    )
+    # Advanced users can load the configuration from a YAML file via config_path
+    # and omit the other arguments:
+    # tune(
+    #     workflow_func=run_react_agent,
+    #     judge_func=judge_function,
+    #     config_path="config.yaml",
+    # )
+```
+
+Here, we use `DatasetConfig` to load the training dataset, `TunerModelConfig` to initialize the trainable model, and `AlgorithmConfig` to specify the RL algorithm and its hyperparameters.
+
+> Note:
+> The `tune` function is based on [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) and internally converts the input parameters into a YAML configuration.
+> Advanced users can omit the `model`, `train_dataset`, and `algorithm` arguments and instead provide the path to a YAML configuration file via the `config_path` argument (see [config.yaml](./config.yaml) for an example).
+> We recommend the configuration file approach for fine-grained control over the training process and for leveraging the advanced features provided by Trinity-RFT.
+> Refer to the Trinity-RFT [Configuration Guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html) for more details on configuration options.
+
+Checkpoints and logs are automatically saved to the `checkpoints/AgentScope` directory under the current working directory, and each run is saved in a sub-directory suffixed with the current timestamp.
+You can find the TensorBoard logs in `monitor/tensorboard` inside the checkpoint directory.
+ +``` +react_agent/ + └── checkpoints/ + └──AgentScope/ + └── Experiment-20260104185355/ # each run saved in a sub-directory with timestamp + ├── monitor/ + │ └── tensorboard/ # tensorboard logs + └── global_step_x/ # saved model checkpoints at step x +``` + +--- + +### Complete example + +```python +from typing import Dict + +from agentscope.tuner import tune, WorkflowOutput, JudgeOutput, DatasetConfig, TunerModelConfig, AlgorithmConfig +from agentscope.agent import ReActAgent +from agentscope.formatter import OpenAIChatFormatter +from agentscope.message import Msg + + +async def run_react_agent( + task: Dict, + model: TunerModelConfig, + auxiliary_models: Dict[str, TunerModelConfig], +) -> WorkflowOutput: + agent = ReActAgent( + name="react_agent", + sys_prompt="You are a helpful math problem solving agent.", + model=model, # directly use the trainable model here + formatter=OpenAIChatFormatter(), + ) + + response = await agent.reply( + msg=Msg("user", task["question"], role="user"), # extract question from task + ) + + return WorkflowOutput( + response=response, + ) + + +async def judge_function( + task: Dict, response: Msg, auxiliary_models: Dict[str, TunerModelConfig] +) -> JudgeOutput: + """Simple reward: 1.0 for exact match, else 0.0.""" + ground_truth = task["answer"] + reward = 1.0 if ground_truth in response.get_text_content() else 0.0 + return JudgeOutput(reward=reward) + + +if __name__ == "__main__": + dataset = DatasetConfig(path="my_dataset", split="train") + model = TunerModelConfig(model_path="Qwen/Qwen3-0.6B", max_model_len=16384) + algorithm = AlgorithmConfig( + algorithm_type="multi_step_grpo", + group_size=8, + batch_size=32, + learning_rate=1e-6, + ) + tune( + workflow_func=run_react_agent, + judge_func=judge_function, + model=model, + train_dataset=dataset, + algorithm=algorithm, + ) +``` + +> Note: +> Above code is a simplified example for illustration purposes only. +> For a complete implementation, please refer to [main.py](./main.py), which trains a ReAct agent to solve math problems on the GSM8K dataset. + +--- + +## How to run + +After implementing the workflow function, follow these steps to run the training: + +1. Prerequisites + + - At least 2 NVIDIA GPUs with CUDA 12.8 or newer. + - Adjust the configuration file ([config.yaml](./config.yaml)) based on your hardware. + - Follow the Trinity-RFT [installation guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_installation.html) to install the latest version from source code. + - Download the GSM8K dataset and Qwen/Qwen3-0.6B model checkpoints (example): + + ```bash + huggingface-cli download openai/gsm8k --repo-type dataset + huggingface-cli download Qwen/Qwen3-0.6B + ``` + +2. Set up a [Ray](https://github.com/ray-project/ray) cluster + + ```bash + ray start --head + # for multi-node setup, run the following command on worker nodes + # ray start --address= + ``` + +3. Run the training script + + ```bash + python main.py + ``` + +4. The reward curve and other training metrics can be monitored using TensorBoard: + + ```bash + tensorboard --logdir ./checkpoints/AgentScope/Experiment-xxxxxx/monitor/tensorboard + ``` + + An example reward curve is shown below: + + ![reward_curve](./reward_curve.png) + +> [!TIP] +> For more tuning examples, refer to [tuner] directory of the AgentScope-Samples repository. 
diff --git a/examples/training/react_agent/config.yaml b/examples/tuner/react_agent/config.yaml similarity index 71% rename from examples/training/react_agent/config.yaml rename to examples/tuner/react_agent/config.yaml index 9248415f48..9d98203ab0 100644 --- a/examples/training/react_agent/config.yaml +++ b/examples/tuner/react_agent/config.yaml @@ -1,20 +1,20 @@ -project: AgentScope-ReAct -name: GSM8K-Qwen3-8B +# Please refer to https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html for detailed explanation of each field. +project: AgentScope +name: GSM8K-Qwen3-0.6B # directory to save checkpoints, default to ./checkpoints if TRINITY_CHECKPOINT_ROOT_DIR not set checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints} algorithm: algorithm_type: multi_step_grpo # a GRPO-based algorithm for multi-step reasoning repeat_times: 8 # repeat each training sample 8 times model: - # path to the pre-trained model, default to Qwen/Qwen3-8B if TRINITY_MODEL_PATH not set - # Note: The model should have ReAct capabilities, e.g., Qwen3 8B or above - # smaller models may not perform well on complex reasoning tasks - model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-8B} + # path to the pre-trained model, default to Qwen/Qwen3-0.6B if TRINITY_MODEL_PATH not set + model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-0.6B} # maximum tokens generated in response max_response_tokens: 16384 # maximum token length for both input and output # if you face OOM, try to reduce max_model_len and max_response_tokens max_model_len: 24576 + temperature: 1.0 cluster: node_num: 1 # cluster with 1 node gpu_per_node: 8 # each node has 8 GPUs @@ -25,41 +25,29 @@ buffer: explorer_input: taskset: # define the taskset for rollout name: gsm8k - storage_type: file path: 'openai/gsm8k' subset_name: 'main' split: 'train' - format: - prompt_key: 'question' - response_key: 'answer' - rollout_args: - temperature: 1.0 explorer: runner_per_model: 16 # each model has 16 runners for parallel rollout max_timeout: 600 # max timeout for each rollout is 600 seconds rollout_model: engine_num: 4 # setup 4 vllm inference model instances tensor_parallel_size: 1 # each model instance uses tensor parallel size of 1 - enable_prefix_caching: false - enforce_eager: true enable_openai_api: true # some parameters to provide openai-style API, don't change them enable_history: true enable_auto_tool_choice: true # Qwen3 series tool_call_parser and reasoning_parser, if you use other models, please adjust accordingly tool_call_parser: hermes reasoning_parser: deepseek_r1 - enable_thinking: true - dtype: bfloat16 - seed: 42 synchronizer: sync_style: dynamic_by_explorer sync_method: 'nccl' - sync_interval: 2 + sync_interval: 1 sync_timeout: 1800 # wait for 30 minutes trainer: save_interval: 100 # save checkpoint every 100 steps use_dynamic_bsz: true - max_token_len_per_gpu: 24576 # if you face OOM, try to reduce this value - ulysses_sequence_parallel_size: 2 # use sequence parallelism to reduce memory usage + ulysses_sequence_parallel_size: 1 # use sequence parallelism to reduce memory usage monitor: - monitor_type: tensorboard # here we use tensorboard, you can also use wandb or mlflow \ No newline at end of file + monitor_type: tensorboard # here we use tensorboard, you can also use wandb, mlflow or swanlab diff --git a/examples/tuner/react_agent/main.py b/examples/tuner/react_agent/main.py new file mode 100644 index 0000000000..5edd5a86a7 --- /dev/null +++ b/examples/tuner/react_agent/main.py @@ -0,0 +1,130 @@ +# -*- 
coding: utf-8 -*- +"""Example of training a ReAct agent on GSM8K with Trinity-RFT.""" +from typing import Dict + +from agentscope.tuner import ( + tune, + DatasetConfig, + WorkflowOutput, + JudgeOutput, + TunerModelConfig, + AlgorithmConfig, +) +from agentscope.agent import ReActAgent +from agentscope.model import OpenAIChatModel +from agentscope.formatter import OpenAIChatFormatter +from agentscope.message import Msg + + +async def run_react_agent( + task: Dict, + model: OpenAIChatModel, + auxiliary_models: Dict[str, OpenAIChatModel] | None = None, +) -> WorkflowOutput: + """A simple workflow function using the ReAct agent to solve tasks. + + Args: + task (`Dict`): The task to be solved. + model (`OpenAIChatModel`): The language model to use. + auxiliary_models (`Dict[str, OpenAIChatModel]`): + A dictionary of additional chat models available for + LLM-as-a-Judge. Not used in this workflow. + + Returns: + `WorkflowOutput`: The workflow output containing the agent's response. + """ + assert ( + auxiliary_models is None or len(auxiliary_models) == 0 + ), "No auxiliary models are used in this workflow." + + sys_prompt = ( + "You are an agent specialized in solving math problems with tools. " + "Please solve the math problem given to you. You can write and " + "execute Python code to perform calculation or verify your answer. " + "You should return your final answer within \\boxed{{}}." + ) + agent = ReActAgent( + name="react_agent", + sys_prompt=sys_prompt, + model=model, + enable_meta_tool=True, + formatter=OpenAIChatFormatter(), + ) + response = await agent.reply( + msg=Msg("user", task["question"], role="user"), + ) + return WorkflowOutput( + response=response, + ) + + +async def gsm8k_judge( + task: Dict, + response: Msg, + auxiliary_models: Dict[str, OpenAIChatModel] | None = None, +) -> JudgeOutput: + """A simple judge function to calculate reward based on agent's response. + + Args: + task (`Dict`): The task information for the corresponding workflow. + response (`Msg`): The response generated by the corresponding workflow. + auxiliary_models (`Dict[str, OpenAIChatModel]`): + A dictionary of additional chat models available for LLM-as-a-Judge + usage. The keys are model names, and the values are the + corresponding OpenAIChatModel instances. + + Returns: + `JudgeOutput`: The reward value assigned by the judge function. + """ + from trinity.common.rewards.math_reward import MathBoxedRewardFn + + assert ( + auxiliary_models is None or len(auxiliary_models) == 0 + ), "No auxiliary models are used in this workflow." 
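+    # The system prompt asks the agent to answer within \boxed{...}, and GSM8K
+    # stores the ground truth after a "####" marker; the code below extracts the
+    # truth and lets Trinity-RFT's MathBoxedRewardFn score the boxed answer
+    # against it, returning a dict of reward components.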
+ + reward_fn = MathBoxedRewardFn() + # parse truth from gsm8k raw text + truth = task["answer"] + if isinstance(truth, str) and "####" in truth: + truth = truth.split("####")[1].strip() + else: + truth = str(truth) + # parse answer from response message + result = response.get_text_content() + reward_dict = reward_fn( + response=result, + truth=truth, + ) + return JudgeOutput( + reward=sum(reward_dict.values()), + metrics=reward_dict, + ) + + +if __name__ == "__main__": + dataset = DatasetConfig( + path="openai/gsm8k", + name="main", + split="train", + ) + tuner_model = TunerModelConfig( + model_path="Qwen/Qwen3-0.6B", + max_model_len=24576, + max_tokens=16384, + temperature=1.0, + inference_engine_num=4, + tensor_parallel_size=1, + ) + algorithm = AlgorithmConfig( + algorithm_type="multi_step_grpo", + group_size=8, + learning_rate=1e-6, + batch_size=32, + ) + tune( + workflow_func=run_react_agent, + judge_func=gsm8k_judge, + train_dataset=dataset, + model=tuner_model, + algorithm=algorithm, + ) diff --git a/examples/tuner/react_agent/reward_curve.png b/examples/tuner/react_agent/reward_curve.png new file mode 100644 index 0000000000..2a52a630d9 Binary files /dev/null and b/examples/tuner/react_agent/reward_curve.png differ diff --git a/src/agentscope/model/_trinity_model.py b/src/agentscope/model/_trinity_model.py index 5f8b357d0c..332abd1838 100644 --- a/src/agentscope/model/_trinity_model.py +++ b/src/agentscope/model/_trinity_model.py @@ -4,6 +4,7 @@ Optional, TYPE_CHECKING, ) +from typing_extensions import deprecated from ._openai_model import OpenAIChatModel from ..types import JSONSerializableObject @@ -14,6 +15,9 @@ AsyncOpenAI = "openai.AsyncOpenAI" +@deprecated( + "TrinityChatModel is deprecated. Please use OpenAIChatModel directly.", +) class TrinityChatModel(OpenAIChatModel): """A model class for RL Training with Trinity-RFT.""" diff --git a/src/agentscope/tune/__init__.py b/src/agentscope/tune/__init__.py index facf19b15b..a34cfccdc4 100644 --- a/src/agentscope/tune/__init__.py +++ b/src/agentscope/tune/__init__.py @@ -1,10 +1,7 @@ # -*- coding: utf-8 -*- -"""The learning module of AgentScope, including RL and SFT.""" +"""This module has been deprecated and renamed to 'agentscope.tuner'.""" -from ._tune import tune -from ._workflow import WorkflowType - -__all__ = [ - "tune", - "WorkflowType", -] +raise ImportError( + "The 'agentscope.tune' module has been renamed to 'agentscope.tuner'. " + "Please update your imports: 'from agentscope.tuner import ...'", +) diff --git a/src/agentscope/tune/_tune.py b/src/agentscope/tune/_tune.py deleted file mode 100644 index 0a268e2090..0000000000 --- a/src/agentscope/tune/_tune.py +++ /dev/null @@ -1,72 +0,0 @@ -# -*- coding: utf-8 -*- -"""The main entry point for agent learning.""" -from dataclasses import dataclass -from ._workflow import ( - WorkflowType, - _validate_function_signature, -) - - -def tune(workflow_func: WorkflowType, config_path: str) -> None: - """Train the agent workflow with the specific configuration. - - Args: - workflow_func (WorkflowType): The learning workflow function - to execute. - config_path (str): The configuration for the learning process. - """ - try: - from trinity.cli.launcher import run_stage - from trinity.common.config import Config - from omegaconf import OmegaConf - except ImportError as e: - raise ImportError( - "Trinity-RFT is not installed. 
Please install it with " - "`pip install trinity-rft`.", - ) from e - - if not _validate_function_signature(workflow_func): - raise ValueError( - "Invalid workflow function signature, please " - "check the types of your workflow input/output.", - ) - - @dataclass - class TuneConfig(Config): - """Configuration for learning process.""" - - def to_trinity_config(self, workflow_func: WorkflowType) -> Config: - """Convert to Trinity-RFT compatible configuration.""" - workflow_name = "agentscope_workflow_adapter" - self.buffer.explorer_input.taskset.default_workflow_type = ( - workflow_name - ) - self.buffer.explorer_input.default_workflow_type = workflow_name - self.buffer.explorer_input.taskset.workflow_args[ - "workflow_func" - ] = workflow_func - return self.check_and_update() - - @classmethod - def load_config(cls, config_path: str) -> "TuneConfig": - """Load the learning configuration from a YAML file. - - Args: - config_path (str): The path to the configuration file. - - Returns: - TuneConfig: The loaded learning configuration. - """ - schema = OmegaConf.structured(cls) - yaml_config = OmegaConf.load(config_path) - try: - config = OmegaConf.merge(schema, yaml_config) - return OmegaConf.to_object(config) - except Exception as e: - raise ValueError(f"Invalid configuration: {e}") from e - - return run_stage( - config=TuneConfig.load_config(config_path).to_trinity_config( - workflow_func, - ), - ) diff --git a/src/agentscope/tune/_workflow.py b/src/agentscope/tune/_workflow.py deleted file mode 100644 index 623e53b6ba..0000000000 --- a/src/agentscope/tune/_workflow.py +++ /dev/null @@ -1,77 +0,0 @@ -# -*- coding: utf-8 -*- -"""Workflow for agent learning.""" - -from typing import ( - Dict, - Callable, - Awaitable, - get_type_hints, -) - -import inspect - -from .._logging import logger -from ..model import TrinityChatModel - - -WorkflowType = Callable[[Dict, TrinityChatModel], Awaitable[float]] - - -def _validate_function_signature(func: Callable) -> bool: - """Validate if a function matches the workflow type signature. - - Args: - func (Callable): The function to validate. 
- """ - # check if the function is asynchronous - if not inspect.iscoroutinefunction(func): - logger.warning("The function is not asynchronous.") - return False - # Define expected parameter types and return type manually - expected_params = [ - ("task", Dict), - ("model", TrinityChatModel), - ] - expected_return = float - - func_signature = inspect.signature(func) - func_hints = get_type_hints(func) - - # Check if the number of parameters matches - if len(func_signature.parameters) != len(expected_params): - logger.warning( - "Expected %d parameters, but got %d", - len(expected_params), - len(func_signature.parameters), - ) - return False - - # Validate each parameter's name and type - for (param_name, _), (expected_name, expected_type) in zip( - func_signature.parameters.items(), - expected_params, - ): - if ( - param_name != expected_name - or func_hints.get(param_name) != expected_type - ): - logger.warning( - "Expected parameter %s of type %s, but got %s of type %s", - expected_name, - expected_type, - param_name, - func_hints.get(param_name), - ) - return False - - # Validate the return type - return_annotation = func_hints.get("return", None) - if return_annotation != expected_return: - logger.warning( - "Expected return type %s, but got %s", - expected_return, - return_annotation, - ) - return False - - return True diff --git a/src/agentscope/tuner/__init__.py b/src/agentscope/tuner/__init__.py new file mode 100644 index 0000000000..0dd65cb105 --- /dev/null +++ b/src/agentscope/tuner/__init__.py @@ -0,0 +1,24 @@ +# -*- coding: utf-8 -*- +"""The learning module of AgentScope, including RL and SFT.""" + +from ._tune import tune +from ._dataset import DatasetConfig +from ._judge import JudgeType, JudgeOutput +from ._workflow import WorkflowType, WorkflowOutput +from ._algorithm import AlgorithmConfig +from ._model import TunerModelConfig +from ._config import check_judge_function, check_workflow_function + + +__all__ = [ + "tune", + "AlgorithmConfig", + "WorkflowType", + "WorkflowOutput", + "JudgeType", + "JudgeOutput", + "DatasetConfig", + "TunerModelConfig", + "check_workflow_function", + "check_judge_function", +] diff --git a/src/agentscope/tuner/_algorithm.py b/src/agentscope/tuner/_algorithm.py new file mode 100644 index 0000000000..9900115dd4 --- /dev/null +++ b/src/agentscope/tuner/_algorithm.py @@ -0,0 +1,42 @@ +# -*- coding: utf-8 -*- +"""AlgorithmConfig definition for tuner.""" + +from pydantic import BaseModel, Field + + +class AlgorithmConfig(BaseModel): + """Algorithm configuration for tuning.""" + + algorithm_type: str = Field( + description=( + "The tuning algorithm type " + "e.g., 'multi_step_grpo', 'sft'." + "Please refer to https://github.com/modelscope/Trinity-RFT" + "for all supported algorithms. We recommend 'multi_step_grpo'" + "for most agent tuning scenarios." + ), + default="multi_step_grpo", + ) + learning_rate: float = Field( + description="The learning rate for the algorithm.", + default=1e-6, + ) + group_size: int = Field( + description=( + "The group size for algorithms " + "requiring group rollout, e.g., GRPO." 
+ ), + default=8, + ) + batch_size: int = Field( + description="The batch size of each training step.", + default=32, + ) + save_interval_steps: int = Field( + description="The interval steps to save the model.", + default=100, + ) + eval_interval_steps: int = Field( + description="The interval steps to evaluate the model.", + default=100, + ) diff --git a/src/agentscope/tuner/_config.py b/src/agentscope/tuner/_config.py new file mode 100644 index 0000000000..8d2d224db2 --- /dev/null +++ b/src/agentscope/tuner/_config.py @@ -0,0 +1,261 @@ +# -*- coding: utf-8 -*- +"""Configuration conversion for tuner.""" +from typing import Any, Callable, List, Tuple +from datetime import datetime +import inspect + +from ._workflow import WorkflowType +from ._judge import JudgeType +from ._model import TunerModelConfig +from ._dataset import DatasetConfig +from ._algorithm import AlgorithmConfig + + +def _set_if_not_none(obj: Any, field: str, value: Any) -> None: + """Set the field of obj to value if value is not None.""" + if value is not None: + setattr(obj, field, value) + + +def _to_trinity_config( + *, + config_path: str | None = None, + workflow_func: WorkflowType | None = None, + judge_func: JudgeType | None = None, + model: TunerModelConfig | None = None, + auxiliary_models: dict[str, TunerModelConfig] | None = None, + train_dataset: DatasetConfig | None = None, + eval_dataset: DatasetConfig | None = None, + algorithm: AlgorithmConfig | None = None, + project_name: str | None = None, + experiment_name: str | None = None, + monitor_type: str | None = None, +) -> Any: + """Convert to Trinity-RFT compatible configuration.""" + from trinity.common.config import ( + Config, + TasksetConfig, + InferenceModelConfig, + ) + + config, auto_config = _load_config_from_path_or_default(config_path) + assert isinstance(config, Config), "Loaded config is not valid." 
+
+    _set_if_not_none(config, "project", project_name)
+    if experiment_name is None and auto_config:
+        config.name = "Experiment-" + datetime.now().strftime(
+            "%Y%m%d%H%M%S",
+        )
+    _set_if_not_none(config, "name", experiment_name)
+    _set_if_not_none(config, "monitor", monitor_type)
+
+    workflow_name = "agentscope_workflow_adapter_v1"
+    if train_dataset is not None:
+        if config.buffer.explorer_input.taskset is None:
+            config.buffer.explorer_input.taskset = TasksetConfig(
+                name="train_taskset",
+                path=train_dataset.path,
+                split=train_dataset.split,
+                subset_name=train_dataset.name,
+            )
+        else:
+            config.buffer.explorer_input.taskset.path = train_dataset.path
+            config.buffer.explorer_input.taskset.split = train_dataset.split
+            config.buffer.explorer_input.taskset.subset_name = (
+                train_dataset.name
+            )
+        config.buffer.total_epochs = train_dataset.total_epochs
+        config.buffer.total_steps = train_dataset.total_steps
+    config.buffer.explorer_input.taskset.default_workflow_type = workflow_name
+    config.buffer.explorer_input.default_workflow_type = workflow_name
+    workflow_args = {
+        "workflow_func": workflow_func,
+    }
+    if judge_func is not None:
+        workflow_args["judge_func"] = judge_func
+
+    config.buffer.explorer_input.taskset.workflow_args.update(workflow_args)
+
+    if model is not None:
+        model_config = model.get_config()
+        config.model.model_path = model_config["model_path"]
+        config.model.max_model_len = model_config["max_model_len"]
+        config.model.max_response_tokens = model.max_tokens
+        config.explorer.rollout_model = InferenceModelConfig(
+            **model.get_config(),
+        )
+        config.explorer.rollout_model.enable_history = True
+    if auxiliary_models is not None:
+        for name, aux_chat_model in auxiliary_models.items():
+            model_config = InferenceModelConfig(
+                **aux_chat_model.get_config(),
+            )
+            model_config.name = name
+            config.explorer.auxiliary_models.append(
+                model_config,
+            )
+    if eval_dataset is not None:
+        config.buffer.explorer_input.eval_tasksets.append(
+            TasksetConfig(
+                name="eval_taskset",
+                path=eval_dataset.path,
+                split=eval_dataset.split,
+                subset_name=eval_dataset.name,
+            ),
+        )
+        for eval_taskset in config.buffer.explorer_input.eval_tasksets:
+            eval_taskset.workflow_args.update(workflow_args)
+    if algorithm is not None:
+        config.algorithm.algorithm_type = algorithm.algorithm_type
+        config.algorithm.repeat_times = algorithm.group_size
+        config.algorithm.optimizer.lr = algorithm.learning_rate
+        config.buffer.batch_size = algorithm.batch_size
+        config.trainer.save_interval = algorithm.save_interval_steps
+        config.explorer.eval_interval = algorithm.eval_interval_steps
+    return config
+
+
+def _load_config_from_path_or_default(
+    config_path: str | None,
+) -> Tuple[Any, bool]:
+    """Load configuration from the given path or default template.
+
+    Args:
+        config_path (`str | None`): The path to the configuration file.
+    Returns:
+        `Tuple[Any, bool]`: The loaded configuration and a boolean
+        indicating whether the default template was used.
+ """ + from trinity.common.config import ( + Config, + load_config, + ) + import tempfile + import yaml + + template_used = False + if config_path is None: + default_config = { + "project": "AgentScope", + "name": "Experiment", + "checkpoint_root_dir": "./checkpoints", + "algorithm": { + "algorithm_type": "multi_step_grpo", + }, + "buffer": { + "total_epochs": 1, + }, + "explorer": { + "runner_per_model": 16, + "max_timeout": 3600, + "max_repeat_times_per_runner": 1, + }, + "synchronizer": { + "sync_style": "dynamic_by_explorer", + "sync_method": "nccl", + "sync_interval": 1, + "sync_timeout": 7200, + }, + "trainer": { + "save_interval": 100, + }, + "monitor": { + "monitor_type": "tensorboard", + }, + } + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml") as tmp: + yaml.dump(default_config, tmp) + tmp.flush() + config = load_config(tmp.name) + template_used = True + else: + config = load_config(config_path) + + assert isinstance(config, Config), "Loaded config is not valid." + return config, template_used + + +def check_workflow_function( + func: Callable, +) -> None: + """Check if the given function is a valid WorkflowType. + + Args: + func (Callable): The function to check. + """ + essential_params = ["task", "model"] + optional_params = ["auxiliary_models"] + _check_function_signature( + func, + essential_params, + optional_params, + ) + + +def check_judge_function( + func: Callable, +) -> None: + """Check if the given function is a valid JudgeType. + + Args: + func (Callable): The function to check. + """ + essential_params = ["task", "response"] + optional_params = ["auxiliary_models"] + _check_function_signature( + func, + essential_params, + optional_params, + ) + + +def _check_function_signature( + func: Callable, + essential_params: List[str], + optional_params: List[str] | None = None, +) -> None: + """ + Check if the given function has the required signature. + + Args: + func (`Callable`): The function to check. + essential_params (`List[str]`): List of essential parameter names + that must be present in the function. + optional_params (`List[str] | None`): List of optional parameter names + that can be present in the function. 
+ """ + if optional_params is None: + optional_params = [] + + sig = inspect.signature(func) + actual_params = [] + + for param_name, param in sig.parameters.items(): + # *args and **kwargs are not allowed + if param.kind == inspect.Parameter.VAR_POSITIONAL: + raise ValueError(f"*args parameter is not allowed: *{param_name}") + if param.kind == inspect.Parameter.VAR_KEYWORD: + raise ValueError( + f"**kwargs parameter is not allowed: **{param_name}", + ) + actual_params.append(param_name) + + # Convert to sets for easier comparison + actual_params_set = set(actual_params) + essential_params_set = set(essential_params) + optional_params_set = set(optional_params) + allowed_params_set = essential_params_set | optional_params_set + + # Check 1: All essential parameters are present + missing_essential = essential_params_set - actual_params_set + if missing_essential: + raise ValueError( + f"Missing essential parameters: {sorted(missing_essential)}", + ) + + # Check 2: Whether there are disallowed parameters + extra_params = actual_params_set - allowed_params_set + if extra_params: + raise ValueError( + f"Contains disallowed parameters: {sorted(extra_params)}", + ) diff --git a/src/agentscope/tuner/_dataset.py b/src/agentscope/tuner/_dataset.py new file mode 100644 index 0000000000..71352a7233 --- /dev/null +++ b/src/agentscope/tuner/_dataset.py @@ -0,0 +1,61 @@ +# -*- coding: utf-8 -*- +"""DatasetConfig definition for tuner.""" +from itertools import islice +from typing import List +from pydantic import BaseModel, Field + + +class DatasetConfig(BaseModel): + """Dataset configuration for tuning. + Compatible with huggingface dataset format. + Agentscope will load the dataset from the given path using + `datasets.load_dataset`. + """ + + path: str = Field( + description="Path to your dataset.", + ) + name: str | None = Field( + description="The name of the dataset configuration.", + default=None, + ) + split: str | None = Field( + description="The dataset split to use.", + default="train", + ) + total_epochs: int = Field( + description="Total number of epochs to run.", + default=1, + ) + total_steps: int | None = Field( + description=( + "Total number of steps to run. " + "If set, it will override total_epochs." + ), + default=None, + ) + + def preview(self, n: int = 5) -> List: + """Preview the dataset information. + + Args: + n (`int`): Number of samples to preview. Defaults to 5. + """ + try: + from datasets import load_dataset + except ImportError as e: + raise ImportError( + "The `datasets` library is not installed. 
" + "Please install it with `pip install datasets`.", + ) from e + import json + + ds = load_dataset( + path=self.path, + name=self.name, + split=self.split, + streaming=True, + ) + samples = list(islice(ds, n)) + print(json.dumps(samples, indent=2)) + return samples diff --git a/src/agentscope/tuner/_judge.py b/src/agentscope/tuner/_judge.py new file mode 100644 index 0000000000..487c85dc7d --- /dev/null +++ b/src/agentscope/tuner/_judge.py @@ -0,0 +1,40 @@ +# -*- coding: utf-8 -*- +"""The judge module for tuner.""" +from typing import Any, Callable, Dict, Awaitable +from pydantic import BaseModel, Field +from ..model import ChatModelBase + + +class JudgeOutput(BaseModel): + """The output of a judge function.""" + + reward: float = Field( + description="The reward value assigned by the judge function.", + ) + + metrics: Dict[str, float] | None = Field( + description="Metrics from the judge function.", + default=None, + ) + + +JudgeType = Callable[ + [Dict, Any, Dict[str, ChatModelBase]], + Awaitable[JudgeOutput], +] +# A judge function type for tuning. + +# Args: +# task (`Dict`): +# The task information for the corresponding workflow. +# response (`Any`): +# The response field of the WorkflowOutput generated by the +# corresponding workflow. +# auxiliary_models (`Dict[str, ChatModelBase] | None`): +# A dictionary of additional chat models available for LLM-as-a-Judge +# usage. The keys are model names, and the values are the corresponding +# `ChatModelBase` instances. +# Returns: +# `JudgeOutput`: +# The reward value assigned by the judge function along with optional +# metrics. diff --git a/src/agentscope/tuner/_model.py b/src/agentscope/tuner/_model.py new file mode 100644 index 0000000000..2fff987f17 --- /dev/null +++ b/src/agentscope/tuner/_model.py @@ -0,0 +1,85 @@ +# -*- coding: utf-8 -*- +"""TunerModelConfig definition.""" +from typing import Dict, Any +from pydantic import BaseModel, Field + + +class TunerModelConfig(BaseModel): + """Model configuration for tuning.""" + + model_path: str = Field( + description="The path to the model checkpoint.", + ) + + max_model_len: int = Field( + description=( + "The maximum length of the model, including context" + " and generated tokens." + ), + ) + + temperature: float = Field( + description="Sampling temperature.", + default=1.0, + ) + + top_p: float = Field( + description="Top-p sampling parameter.", + default=1.0, + ) + + max_tokens: int = Field( + description="Maximum tokens for generation.", + default=8192, + ) + + enable_thinking: bool | None = Field( + description=( + "Whether to enable thinking capability. " + "Only applicable for Qwen3 series models." + ), + default=None, + ) + + tensor_parallel_size: int = Field( + description="The tensor parallel size for model inference.", + default=1, + ) + + inference_engine_num: int = Field( + description="The number of engines for model inference.", + default=1, + ) + + tool_call_parser: str = Field( + description=( + "The tool call parser to use. The default setting " + "is for Qwen3 series models." + ), + default="hermes", + ) + + reasoning_parser: str = Field( + description=( + "The reasoning parser to use. The default " + "setting is for Qwen3 series models." + ), + default="deepseek_r1", + ) + + def get_config(self) -> Dict[str, Any]: + """Get the model configuration. + + Returns: + `Dict[str, Any]`: The model configuration dictionary. 
+        """
+        return {
+            "model_path": self.model_path,
+            "max_model_len": self.max_model_len,
+            "tensor_parallel_size": self.tensor_parallel_size,
+            "engine_num": self.inference_engine_num,
+            "tool_call_parser": self.tool_call_parser,
+            "reasoning_parser": self.reasoning_parser,
+            "enable_openai_api": True,
+            "enable_auto_tool_choice": True,
+        }
diff --git a/src/agentscope/tuner/_tune.py b/src/agentscope/tuner/_tune.py
new file mode 100644
index 0000000000..bc50bbd945
--- /dev/null
+++ b/src/agentscope/tuner/_tune.py
@@ -0,0 +1,96 @@
+# -*- coding: utf-8 -*-
+"""The main entry point for agent learning."""
+import os
+from ._workflow import WorkflowType
+from ._judge import JudgeType
+from ._model import TunerModelConfig
+from ._dataset import DatasetConfig
+from ._config import (
+    _to_trinity_config,
+    check_judge_function,
+    check_workflow_function,
+)
+from ._algorithm import AlgorithmConfig
+
+
+def tune(
+    *,
+    workflow_func: WorkflowType,
+    judge_func: JudgeType | None = None,
+    train_dataset: DatasetConfig | None = None,
+    eval_dataset: DatasetConfig | None = None,
+    model: TunerModelConfig | None = None,
+    auxiliary_models: dict[str, TunerModelConfig] | None = None,
+    algorithm: AlgorithmConfig | None = None,
+    project_name: str | None = None,
+    experiment_name: str | None = None,
+    monitor_type: str | None = None,
+    config_path: str | None = None,
+) -> None:
+    """Train the agent workflow with the specified configuration.
+
+    Args:
+        workflow_func (`WorkflowType`): The learning workflow function
+            to execute.
+        judge_func (`JudgeType`, optional): The judge function used to
+            evaluate the workflow output. Defaults to None.
+        train_dataset (`DatasetConfig`, optional): The training dataset for
+            the learning process. Defaults to None.
+        eval_dataset (`DatasetConfig`, optional): The evaluation dataset for
+            the learning process. Defaults to None.
+        model (`TunerModelConfig`, optional): The model to be tuned.
+            Defaults to None.
+        auxiliary_models (`dict[str, TunerModelConfig]`, optional): A
+            dictionary of auxiliary models for LLM-as-a-Judge
+            or for acting as other agents in multi-agent scenarios.
+            Defaults to None.
+        algorithm (`AlgorithmConfig`, optional): The tuning algorithm
+            configuration. Defaults to None.
+        project_name (`str`, optional): Name of the project.
+            Defaults to None.
+        experiment_name (`str`, optional): Name of the experiment.
+            Leave None to use a timestamp. Defaults to None.
+        monitor_type (`str`, optional): Type of the monitor to use.
+            Could be one of 'tensorboard', 'wandb', 'mlflow', 'swanlab'.
+            Leave None to use 'tensorboard'. Defaults to None.
+        config_path (`str`, optional): Path to a Trinity-RFT YAML
+            configuration file. If provided, only `workflow_func` is
+            required; other arguments override the corresponding fields
+            in the config. Defaults to None.
+    """
+    try:
+        from trinity.cli.launcher import run_stage
+        from trinity.utils.dlc_utils import setup_ray_cluster, stop_ray_cluster
+    except ImportError as e:
+        raise ImportError(
+            "Trinity-RFT is not installed. 
Please install it with " + "`pip install trinity-rft`.", + ) from e + + check_workflow_function(workflow_func) + if judge_func is not None: + check_judge_function(judge_func) + + config = _to_trinity_config( + config_path=config_path, + workflow_func=workflow_func, + judge_func=judge_func, + model=model, + auxiliary_models=auxiliary_models, + train_dataset=train_dataset, + eval_dataset=eval_dataset, + algorithm=algorithm, + project_name=project_name, + experiment_name=experiment_name, + monitor_type=monitor_type, + ) + use_dlc = os.environ.get("USE_ALIYUN_PAI_DLC", "0") == "1" + if use_dlc: + config.cluster.ray_address = setup_ray_cluster(namespace="agentscope") + try: + return run_stage( + config=config.check_and_update(), + ) + finally: + if use_dlc: + stop_ray_cluster(namespace="agentscope") diff --git a/src/agentscope/tuner/_workflow.py b/src/agentscope/tuner/_workflow.py new file mode 100644 index 0000000000..814cf502bf --- /dev/null +++ b/src/agentscope/tuner/_workflow.py @@ -0,0 +1,53 @@ +# -*- coding: utf-8 -*- +"""The workflow module for tuner.""" +from typing import Any, Callable, Dict, Awaitable +from pydantic import BaseModel, Field +from ..model import ChatModelBase + + +class WorkflowOutput(BaseModel): + """The output of a workflow function.""" + + reward: float | None = Field( + description=( + "The reward obtained from the workflow function. " + "Used for direct reward output." + ), + default=None, + ) + response: Any | None = Field( + description=( + "The response generated by the workflow function. " + "Used as judge input." + ), + default=None, + ) + + metrics: Dict[str, float] | None = Field( + description="Metrics from the workflow function.", + default=None, + ) + + +WorkflowType = Callable[ + [Dict, ChatModelBase, Dict[str, ChatModelBase]], + Awaitable[WorkflowOutput], +] +# An agent workflow function type for tuning. + +# Args: +# task (`Dict`): +# The task information for the workflow run. +# model (`ChatModelBase`): +# The primary chat model used in the workflow, this is the main model +# being tuned. +# auxiliary_models (`Dict[str, ChatModelBase] | None`): +# A dictionary of additional chat models available for LLM-as-a-Judge +# usage. The keys are model names, and the values are the corresponding +# `ChatModelBase` instances. Note that these auxiliary models are not +# tuned during the workflow. + +# Returns: +# `WorkflowOutput`: +# The workflow execution results, including optional reward, raw +# response and metrics. 
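A minimal pair of functions that satisfies the workflow and judge contracts documented in `_workflow.py` and `_judge.py` could look like the sketch below. It is illustrative only and not part of this patch: the single-turn prompt, the `question`/`answer` task fields, the `solve_task`/`exact_match_judge` names, and the exact-match reward are assumptions, and `auxiliary_models` is omitted, which the signature checks in `_config.py` allow.

from typing import Any, Dict

from agentscope.model import ChatModelBase
from agentscope.tuner import JudgeOutput, WorkflowOutput


async def solve_task(task: Dict, model: ChatModelBase) -> WorkflowOutput:
    """Ask the tuned model once and pass its text to the judge."""
    # Assumes a non-streaming model, as used by the tuner's rollout engine.
    res = await model([{"role": "user", "content": task["question"]}])
    # res.content is a list of content blocks; keep only the text blocks.
    text = "".join(
        block["text"] for block in res.content if block["type"] == "text"
    )
    return WorkflowOutput(response=text)


async def exact_match_judge(task: Dict, response: Any) -> JudgeOutput:
    """Reward 1.0 when the response matches the reference answer."""
    correct = float(str(response).strip() == str(task["answer"]).strip())
    return JudgeOutput(reward=correct, metrics={"accuracy": correct})
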
diff --git a/tests/model_trinity_test.py b/tests/model_trinity_test.py deleted file mode 100644 index ea32c9f372..0000000000 --- a/tests/model_trinity_test.py +++ /dev/null @@ -1,95 +0,0 @@ -# -*- coding: utf-8 -*- -# pylint: disable=too-many-statements -"""Unit tests for Trinity-RFT model class.""" -from unittest.async_case import IsolatedAsyncioTestCase -from unittest.mock import Mock, AsyncMock - -from agentscope.model import TrinityChatModel, ChatResponse -from agentscope.message import TextBlock - - -class TestTrinityModel(IsolatedAsyncioTestCase): - """Test cases for TrinityModel.""" - - async def test_init_with_trinity_client(self) -> None: - """Test initialization with a valid OpenAI async client.""" - MODEL_NAME = "Qwen/Qwen3-8B" - mock_client = Mock() - mock_client.model_path = MODEL_NAME - - # test init - model_1 = TrinityChatModel( - openai_async_client=mock_client, - enable_thinking=False, - generate_kwargs={ - "temperature": 1.0, - "top_k": 2, - }, - ) - model_2 = TrinityChatModel( - openai_async_client=mock_client, - enable_thinking=True, - generate_kwargs={ - "max_tokens": 500, - "top_p": 0.9, - }, - ) - self.assertEqual(model_1.model_name, MODEL_NAME) - self.assertFalse(model_1.stream) - self.assertIs(model_1.client, mock_client) - self.assertEqual(model_2.model_name, MODEL_NAME) - self.assertFalse(model_2.stream) - self.assertIs(model_2.client, mock_client) - - # create mock response - messages = [{"role": "user", "content": "Hello"}] - mock_message = Mock() - mock_message.content = "Hi there!" - mock_message.reasoning_content = None - mock_message.tool_calls = [] - mock_message.audio = None - mock_message.parsed = None - mock_choice = Mock() - mock_choice.message = mock_message - mock_response = Mock() - mock_response.choices = [mock_choice] - mock_usage = Mock() - mock_usage.prompt_tokens = 10 - mock_usage.completion_tokens = 20 - mock_response.usage = mock_usage - - mock_client.chat.completions.create = AsyncMock( - return_value=mock_response, - ) - - result = await model_1(messages) - call_args = mock_client.chat.completions.create.call_args[1] - self.assertEqual(call_args["model"], MODEL_NAME) - self.assertEqual(call_args["messages"], messages) - self.assertFalse(call_args["stream"]) - self.assertFalse(call_args["chat_template_kwargs"]["enable_thinking"]) - self.assertEqual(call_args["temperature"], 1.0) - self.assertEqual(call_args["top_k"], 2) - self.assertFalse("max_tokens" in call_args) - self.assertFalse("top_p" in call_args) - self.assertIsInstance(result, ChatResponse) - expected_content = [ - TextBlock(type="text", text="Hi there!"), - ] - self.assertEqual(result.content, expected_content) - - result = await model_2(messages) - call_args = mock_client.chat.completions.create.call_args[1] - self.assertEqual(call_args["model"], MODEL_NAME) - self.assertEqual(call_args["messages"], messages) - self.assertFalse(call_args["stream"]) - self.assertTrue(call_args["chat_template_kwargs"]["enable_thinking"]) - self.assertEqual(call_args["max_tokens"], 500) - self.assertEqual(call_args["top_p"], 0.9) - self.assertFalse("temperature" in call_args) - self.assertFalse("top_k" in call_args) - self.assertIsInstance(result, ChatResponse) - expected_content = [ - TextBlock(type="text", text="Hi there!"), - ] - self.assertEqual(result.content, expected_content) diff --git a/tests/tune_test.py b/tests/tune_test.py deleted file mode 100644 index 4e690acca8..0000000000 --- a/tests/tune_test.py +++ /dev/null @@ -1,67 +0,0 @@ -# -*- coding: utf-8 -*- -# pylint: 
disable=unused-argument -"""Learn related tests in agentscope.""" -from typing import Any, Dict, List -from unittest.async_case import IsolatedAsyncioTestCase - -from agentscope.model import TrinityChatModel, OpenAIChatModel -from agentscope.tune._workflow import _validate_function_signature - - -async def correct_interface(task: Dict, model: TrinityChatModel) -> float: - """Correct interface matching the workflow type.""" - return task["reward"] - - -async def wrong_interface_1( - task: Dict, - model: TrinityChatModel, - extra: Any, -) -> float: - """Wrong interface with extra argument.""" - return 0.0 - - -async def wrong_interface_2(task: Dict) -> float: - """Wrong interface with missing argument.""" - return 0.0 - - -async def wrong_interface_3(task: List, model: TrinityChatModel) -> float: - """Wrong interface with wrong task type.""" - return 0.0 - - -async def wrong_interface_4(task: Dict, model: OpenAIChatModel) -> float: - """Wrong interface with wrong model type.""" - return 0.0 - - -async def wrong_interface_5(task: Dict, model: TrinityChatModel) -> str: - """Wrong interface with wrong return type.""" - return "0.0" - - -class AgentLearnTest(IsolatedAsyncioTestCase): - """Test the learning functionality of agents.""" - - async def test_workflow_interface_validate(self) -> None: - """Test the interface of workflow function.""" - self.assertTrue( - _validate_function_signature(correct_interface), - ) - self.assertFalse( - _validate_function_signature(wrong_interface_1), - ) - self.assertFalse( - _validate_function_signature(wrong_interface_2), - ) - self.assertFalse( - _validate_function_signature(wrong_interface_3), - ) - self.assertFalse( - _validate_function_signature(wrong_interface_4), - ) - self.assertFalse( - _validate_function_signature(wrong_interface_5), - ) diff --git a/tests/tuner_test.py b/tests/tuner_test.py new file mode 100644 index 0000000000..6b9f7357a6 --- /dev/null +++ b/tests/tuner_test.py @@ -0,0 +1,153 @@ +# -*- coding: utf-8 -*- +# pylint: disable=unused-argument +# pylint: disable=too-many-statements +"""Unit tests for tuner related modules.""" +from unittest.async_case import IsolatedAsyncioTestCase +from typing import Dict, Any + +from agentscope.tuner import TunerModelConfig, WorkflowOutput, JudgeOutput +from agentscope.tuner._config import ( + check_judge_function, + check_workflow_function, +) + + +async def correct_workflow_func( + task: Dict, + model: TunerModelConfig, + auxiliary_models: Dict[str, TunerModelConfig], +) -> WorkflowOutput: + """Correct interface matching the workflow type.""" + return WorkflowOutput( + response="Test response", + ) + + +async def correct_workflow_func_no_aux( + task: Dict, + model: TunerModelConfig, +) -> WorkflowOutput: + """Correct interface matching the workflow type without + auxiliary models.""" + return WorkflowOutput( + response="Test response", + ) + + +async def incorrect_workflow_func_1(task: Dict) -> WorkflowOutput: + """Incorrect interface not matching the workflow type.""" + return WorkflowOutput( + response="Test response", + ) + + +async def incorrect_workflow_func_2( + task: Dict, + model: TunerModelConfig, + aux_model: int, +) -> WorkflowOutput: + """Incorrect interface not matching the workflow type.""" + return WorkflowOutput( + response="Test response", + ) + + +async def correct_judge_func( + task: Dict, + response: Any, + auxiliary_models: Dict[str, TunerModelConfig], +) -> JudgeOutput: + """Correct interface matching the judge type.""" + return JudgeOutput( + reward=1.0, + ) + + +async def 
incorrect_judge_func_1( + wrong_name: Dict, + response: Any, +) -> JudgeOutput: + """Incorrect interface not matching the judge type.""" + return JudgeOutput( + reward=1.0, + ) + + +async def incorrect_judge_func_2( + response: Any, +) -> JudgeOutput: + """Incorrect interface not matching the judge type.""" + return JudgeOutput( + reward=1.0, + ) + + +class TestTunerFunctionType(IsolatedAsyncioTestCase): + """Test cases for tuner function type validation.""" + + def test_validate_workflow_type(self) -> None: + """Test workflow type validation.""" + # Correct cases + check_workflow_function(correct_workflow_func) + check_workflow_function(correct_workflow_func_no_aux) + + # Incorrect cases + with self.assertRaises(ValueError): + check_workflow_function(incorrect_workflow_func_1) + with self.assertRaises(ValueError): + check_workflow_function(incorrect_workflow_func_2) + + # Correct cases + check_judge_function(correct_judge_func) + + # Incorrect cases + with self.assertRaises(ValueError): + check_judge_function(incorrect_judge_func_1) + with self.assertRaises(ValueError): + check_judge_function(incorrect_judge_func_2) + + +class TestDataset(IsolatedAsyncioTestCase): + """Test cases for DatasetConfig.""" + + async def test_preview(self) -> None: + """Test preview method.""" + try: + import datasets + except ImportError: + datasets = None + self.skipTest("datasets library is not installed.") + from agentscope.tuner import DatasetConfig + from pathlib import Path + import tempfile + + assert datasets is not None + + with tempfile.TemporaryDirectory() as tmpdirname: + # generate a small dataset directory + dataset_dir = Path(tmpdirname) / "my_dataset" + dataset_dir.mkdir(parents=True, exist_ok=True) + sample_file = dataset_dir / "train.jsonl" + sample_content = [ + '{"question": "What is 2 + 2?", "answer": "4"}', + '{"question": "What is 4 + 4?", "answer": "8"}', + '{"question": "What is 8 + 8?", "answer": "16"}', + ] + with open(sample_file, "w", encoding="utf-8") as f: + for line in sample_content: + f.write(line + "\n") + + dataset = DatasetConfig(path=str(dataset_dir), split="train") + samples = dataset.preview(n=2) + self.assertEqual(len(samples), 2) + samples = dataset.preview(n=5) + self.assertEqual(len(samples), 3) + with self.assertRaises(OSError): + invalid_ds = DatasetConfig(path="/invalid/path", split="train") + invalid_ds.preview() + with self.assertRaises(ValueError): + invalid_ds = DatasetConfig( + path=str(dataset_dir), + split="invalid_split", + ) + invalid_ds.preview()
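Since `agentscope.tune` now raises an ImportError that points at `agentscope.tuner`, downstream code has to change both the import path and the workflow's return type. The sketch below shows a minimal migration and is illustrative only: the `task["reward"]` field mirrors the removed `tune_test.py`, and `config.yaml` stands in for an existing Trinity-RFT configuration file.

from typing import Dict

from agentscope.model import ChatModelBase
from agentscope.tuner import WorkflowOutput, tune


# Before: `from agentscope.tune import tune` with a workflow typed as
# `(task: Dict, model: TrinityChatModel) -> float` returning a bare reward.
async def workflow(task: Dict, model: ChatModelBase) -> WorkflowOutput:
    # Wrap the old float reward in a WorkflowOutput; alternatively set
    # `response=...` here and compute the reward in a separate judge_func.
    return WorkflowOutput(reward=float(task["reward"]))


# A pre-existing Trinity-RFT YAML can still be reused via `config_path`;
# the new keyword-only arguments override fields of that config when given.
tune(workflow_func=workflow, config_path="config.yaml")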