Merged
Commits
60 commits
bebbd42
add function type check
pan-x-c Dec 26, 2025
ed5a85a
add agentscope tuner v1 interface
pan-x-c Dec 26, 2025
81d8d62
update tuner interface
pan-x-c Dec 26, 2025
f451bcc
add missing file
pan-x-c Dec 26, 2025
fea6b96
fix pre-commit
pan-x-c Dec 26, 2025
bebf343
finish agentscope tune v1 interface
pan-x-c Dec 29, 2025
59b8deb
fix readme
pan-x-c Dec 29, 2025
657f7ec
rename to tuner
pan-x-c Dec 29, 2025
a8c6198
fix example
pan-x-c Dec 29, 2025
435cac4
update readme
pan-x-c Dec 30, 2025
8a1e03b
fix readme
pan-x-c Dec 30, 2025
3326838
fix eval tasksets
pan-x-c Dec 30, 2025
1a4526b
use tuner model
pan-x-c Dec 30, 2025
ceca3b3
fix pre-commit
pan-x-c Dec 30, 2025
789de86
update readme
pan-x-c Dec 30, 2025
68417bc
fix model type
pan-x-c Dec 30, 2025
f22eef0
refactor structure
pan-x-c Jan 4, 2026
8321195
fix doc
pan-x-c Jan 4, 2026
ea0f7c5
update comments
pan-x-c Jan 5, 2026
65b404f
fix comments
pan-x-c Jan 5, 2026
a338b38
add function type check
pan-x-c Jan 5, 2026
e2396a1
add unittests
pan-x-c Jan 5, 2026
55e7e4f
fix pre-commit
pan-x-c Jan 5, 2026
f2e7322
fix missing eval workflow args
pan-x-c Jan 5, 2026
11231b9
fix workflow args
pan-x-c Jan 5, 2026
7837814
fix reponse signature
pan-x-c Jan 5, 2026
423f9dc
fix comments
pan-x-c Jan 5, 2026
d20b08d
auto setup cluster on dlc
pan-x-c Jan 6, 2026
baaf0a7
fix dlc setup
pan-x-c Jan 6, 2026
b36805c
add tuner tutorial
pan-x-c Jan 8, 2026
68bec87
add tuner in tutorial
pan-x-c Jan 8, 2026
a631d53
add chinese doc
pan-x-c Jan 8, 2026
cf426cd
fix docs
pan-x-c Jan 8, 2026
907924a
clean code
pan-x-c Jan 8, 2026
4e8c373
add reward curve
pan-x-c Jan 8, 2026
6d0963c
fix comments
pan-x-c Jan 8, 2026
f9ceb44
fix missing packages
pan-x-c Jan 9, 2026
ed40269
move doc from training to tuner
pan-x-c Jan 9, 2026
631ec36
add tips for metrics
pan-x-c Jan 9, 2026
db37537
add tips
pan-x-c Jan 9, 2026
58643e6
fix chinese doc
pan-x-c Jan 9, 2026
9ebc8ed
fix en doc
pan-x-c Jan 9, 2026
4251087
add links to samples
pan-x-c Jan 9, 2026
ec83c38
fix comments
pan-x-c Jan 9, 2026
64fb8b5
fix comments
pan-x-c Jan 9, 2026
1f0429f
fix algorithm doc
pan-x-c Jan 9, 2026
3163b94
fix comments
pan-x-c Jan 9, 2026
99fe36d
fix type doc
pan-x-c Jan 9, 2026
5970355
fix comments
pan-x-c Jan 9, 2026
b274cd8
fix eval workflow_args
pan-x-c Jan 12, 2026
f2e27ed
Merge branch 'main' into feature/tuner_enhance
pan-x-c Jan 12, 2026
03e7733
fix dependencies
pan-x-c Jan 13, 2026
ef1461a
fix comments
pan-x-c Jan 13, 2026
6ff7f85
rename modules
pan-x-c Jan 13, 2026
4b2d9e9
fix pre-commit
pan-x-c Jan 13, 2026
3652692
remove template yaml
pan-x-c Jan 13, 2026
1a5d036
fix conflict
pan-x-c Jan 13, 2026
a873ace
fix config
pan-x-c Jan 13, 2026
ad23a5d
fix pre-commit
pan-x-c Jan 13, 2026
0dc0f01
fix pre-commit
pan-x-c Jan 14, 2026
311 changes: 208 additions & 103 deletions examples/training/react_agent/README.md

Large diffs are not rendered by default.

30 changes: 9 additions & 21 deletions examples/training/react_agent/config.yaml
@@ -1,20 +1,20 @@
project: AgentScope-ReAct
name: GSM8K-Qwen3-8B
# Please refer to https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html for detailed explanation of each field.
project: AgentScope
name: GSM8K-Qwen3-0.6B
# directory to save checkpoints, default to ./checkpoints if TRINITY_CHECKPOINT_ROOT_DIR not set
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
algorithm:
algorithm_type: multi_step_grpo # a GRPO-based algorithm for multi-step reasoning
repeat_times: 8 # repeat each training sample 8 times
model:
# path to the pre-trained model, default to Qwen/Qwen3-8B if TRINITY_MODEL_PATH not set
# Note: The model should have ReAct capabilities, e.g., Qwen3 8B or above
# smaller models may not perform well on complex reasoning tasks
model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-8B}
# path to the pre-trained model, default to Qwen/Qwen3-0.6B if TRINITY_MODEL_PATH not set
model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-0.6B}
# maximum tokens generated in response
max_response_tokens: 16384
# maximum token length for both input and output
# if you face OOM, try to reduce max_model_len and max_response_tokens
max_model_len: 24576
temperature: 1.0
cluster:
node_num: 1 # cluster with 1 node
gpu_per_node: 8 # each node has 8 GPUs
@@ -25,41 +25,29 @@ buffer:
explorer_input:
taskset: # define the taskset for rollout
name: gsm8k
storage_type: file
path: 'openai/gsm8k'
subset_name: 'main'
split: 'train'
format:
prompt_key: 'question'
response_key: 'answer'
rollout_args:
temperature: 1.0
explorer:
runner_per_model: 16 # each model has 16 runners for parallel rollout
max_timeout: 600 # max timeout for each rollout is 600 seconds
rollout_model:
engine_num: 4 # setup 4 vllm inference model instances
tensor_parallel_size: 1 # each model instance uses tensor parallel size of 1
enable_prefix_caching: false
enforce_eager: true
enable_openai_api: true # some parameters to provide openai-style API, don't change them
enable_history: true
enable_auto_tool_choice: true
# Qwen3 series tool_call_parser and reasoning_parser, if you use other models, please adjust accordingly
tool_call_parser: hermes
reasoning_parser: deepseek_r1
enable_thinking: true
dtype: bfloat16
seed: 42
synchronizer:
sync_style: dynamic_by_explorer
sync_method: 'nccl'
sync_interval: 2
sync_interval: 1
sync_timeout: 1800 # wait for 30 minutes
trainer:
save_interval: 100 # save checkpoint every 100 steps
use_dynamic_bsz: true
max_token_len_per_gpu: 24576 # if you face OOM, try to reduce this value
ulysses_sequence_parallel_size: 2 # use sequence parallelism to reduce memory usage
ulysses_sequence_parallel_size: 1 # use sequence parallelism to reduce memory usage
monitor:
monitor_type: tensorboard # here we use tensorboard, you can also use wandb or mlflow
monitor_type: tensorboard # here we use tensorboard, you can also use wandb, mlflow or swanlab
147 changes: 89 additions & 58 deletions examples/training/react_agent/main.py
@@ -1,74 +1,47 @@
# -*- coding: utf-8 -*-
"""Example of training a ReAct agent using RL with Trinity-RFT."""
import os
"""Example of training a ReAct agent on GSM8K with Trinity-RFT."""
from typing import Dict


from pydantic import BaseModel, Field
from trinity.common.rewards import MathBoxedRewardFn

from agentscope.tune import tune
from agentscope.model import TrinityChatModel
from agentscope.tuner import (
tune,
Dataset,
WorkflowOutput,
JudgeOutput,
TunerChatModel,
Algorithm,
)
from agentscope.agent import ReActAgent
from agentscope.formatter import OpenAIChatFormatter
from agentscope.message import Msg


class GSM8KResponseStructure(BaseModel):
"""Response structure for GSM8K tasks."""

result: str = Field(
description=(
"Your solution of the given math problem. "
"Put your final answer in boxed format, e.g., \\boxed{42}"
),
)


class GSM8KRewardFn(MathBoxedRewardFn):
"""Reward function for GSM8K tasks."""

def __call__(
self,
response: Dict,
truth: str,
format_score_coef: float = 0.1,
**kwargs: Dict,
) -> dict[str, float]:
"""Calculate the reward based on the response and truth."""
# parse GSM8K truth
if isinstance(truth, str) and "####" in truth:
truth = truth.split("####")[1].strip()
else:
truth = str(truth)
return super().__call__(
response=response["result"],
truth=truth,
with_think=False,
format_score_coef=format_score_coef,
**kwargs,
)


async def run_react_agent(task: Dict, model: TrinityChatModel) -> float:
async def run_react_agent(
task: Dict,
model: TunerChatModel,
auxiliary_models: Dict[str, TunerChatModel],
) -> WorkflowOutput:
"""A simple workflow function using the ReAct agent to solve tasks.

Args:
task (Dict): The task to be solved.
model (TrinityChatModel): The language model to use.
model (TunerChatModel): The language model to use.
auxiliary_models (Dict[str, TunerChatModel]):
A dictionary of additional chat models available for
LLM-as-a-Judge. Not used in this workflow.

Returns:
WorkflowOutput: The workflow output wrapping the agent's response message.
"""
assert (
len(auxiliary_models) == 0
), "No auxiliary models are used in this workflow."

sys_prompt = (
"You are an agent specialized in solving math problems with tools. "
"Please solve the math problem given to you. You can write and "
"execute Python code to perform calculation or verify your answer. "
"You should return your final answer within \\boxed{{}}."
)

response_structure = GSM8KResponseStructure
reward_fn = GSM8KRewardFn()
agent = ReActAgent(
name="react_agent",
sys_prompt=sys_prompt,
@@ -78,21 +51,79 @@ async def run_react_agent(task: Dict, model: TrinityChatModel) -> float:
)
response = await agent.reply(
msg=Msg("user", task["question"], role="user"),
structured_model=response_structure,
)
reward = reward_fn(
response=response.metadata,
truth=task["answer"],
return WorkflowOutput(
response=response,
)


async def gsm8k_judge(
task: Dict,
response: Msg,
auxiliary_models: Dict[str, TunerChatModel],
) -> JudgeOutput:
"""A simple judge function to calculate reward based on agent's response.

Args:
task (Dict): The task information for the corresponding workflow.
response (Msg): The response generated by the corresponding workflow.
auxiliary_models (Dict[str, TunerChatModel]):
A dictionary of additional chat models available for LLM-as-a-Judge
usage. The keys are model names, and the values are the
corresponding TunerChatModel instances.

Returns:
JudgeOutput: The reward value assigned by the judge function.
"""
from trinity.common.rewards.math_reward import MathBoxedRewardFn

assert (
len(auxiliary_models) == 0
), "No auxiliary models are used in this workflow."

reward_fn = MathBoxedRewardFn()
# parse truth from gsm8k raw text
truth = task["answer"]
if isinstance(truth, str) and "####" in truth:
truth = truth.split("####")[1].strip()
else:
truth = str(truth)
# parse answer from response message
result = response.get_text_content()
reward_dict = reward_fn(
response=result,
truth=truth,
)
return JudgeOutput(
reward=sum(reward_dict.values()),
metrics=reward_dict,
)
return sum(reward.values())


if __name__ == "__main__":
config_path = os.path.join(
os.path.dirname(__file__),
"config.yaml",
dataset = Dataset(
path="openai/gsm8k",
name="main",
split="train",
)
tuner_model = TunerChatModel(
model_path="Qwen/Qwen3-0.6B",
max_model_len=24576,
max_tokens=16384,
temperature=1.0,
inference_engine_num=4,
tensor_parallel_size=1,
)
algorithm = Algorithm(
algorithm_type="multi_step_grpo",
group_size=8,
learning_rate=1e-6,
batch_size=32,
)
tune(
workflow_func=run_react_agent,
config_path=config_path,
judge_func=gsm8k_judge,
train_dataset=dataset,
model=tuner_model,
algorithm=algorithm,
)
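
The new judge_func signature exposes auxiliary_models for LLM-as-a-Judge, although the GSM8K judge above asserts that it is empty. Below is a minimal, hypothetical sketch of that pattern: the "grader" key, the grading prompt, and the defensive parsing of the model response are assumptions for illustration, not part of this PR.

# Hypothetical LLM-as-a-Judge variant of the judge function (illustration only).
# Assumes an auxiliary model has been registered under the made-up name "grader"
# and that TunerChatModel accepts OpenAI-style message dicts, as the
# OpenAIChatModel-based TrinityChatModel in this PR does.
from typing import Dict

from agentscope.message import Msg
from agentscope.tuner import JudgeOutput, TunerChatModel


async def llm_as_a_judge(
    task: Dict,
    response: Msg,
    auxiliary_models: Dict[str, TunerChatModel],
) -> JudgeOutput:
    """Grade the agent's answer with an auxiliary chat model."""
    grader = auxiliary_models["grader"]  # hypothetical model name
    result = await grader(
        [
            {
                "role": "user",
                "content": (
                    f"Question: {task['question']}\n"
                    f"Reference answer: {task['answer']}\n"
                    f"Candidate answer: {response.get_text_content()}\n"
                    "Reply with a single digit: 1 if the candidate answer "
                    "is correct, otherwise 0."
                ),
            },
        ],
    )
    # Collect text blocks from the response defensively, since the exact
    # response structure is not shown in this diff.
    text = "".join(
        block.get("text", "")
        for block in getattr(result, "content", None) or []
        if isinstance(block, dict)
    )
    reward = 1.0 if text.strip().startswith("1") else 0.0
    return JudgeOutput(reward=reward, metrics={"llm_judge_score": reward})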
5 changes: 5 additions & 0 deletions src/agentscope/model/_trinity_model.py
@@ -4,6 +4,7 @@
Optional,
TYPE_CHECKING,
)
from typing_extensions import deprecated
from ._openai_model import OpenAIChatModel
from ..types import JSONSerializableObject

@@ -14,6 +15,10 @@
AsyncOpenAI = "openai.AsyncOpenAI"


@deprecated(
"TrinityChatModel is deprecated, please use "
"`agentscope.tuner.TunerChatModel` instead.",
)
class TrinityChatModel(OpenAIChatModel):
"""A model class for RL Training with Trinity-RFT."""

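
Since TrinityChatModel now carries a deprecation notice pointing at agentscope.tuner.TunerChatModel, existing callers are expected to switch constructors. A minimal migration sketch; the constructor arguments mirror the ones used in examples/training/react_agent/main.py above, and any other parameters are not guaranteed by this diff.

# Before (deprecated, emits a DeprecationWarning when instantiated):
# from agentscope.model import TrinityChatModel
# model = TrinityChatModel(...)

# After: construct the tuner-native model instead, reusing the arguments
# shown in examples/training/react_agent/main.py in this PR.
from agentscope.tuner import TunerChatModel

model = TunerChatModel(
    model_path="Qwen/Qwen3-0.6B",
    max_model_len=24576,
    max_tokens=16384,
    temperature=1.0,
)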
13 changes: 5 additions & 8 deletions src/agentscope/tune/__init__.py
@@ -1,10 +1,7 @@
# -*- coding: utf-8 -*-
"""The learning module of AgentScope, including RL and SFT."""
"""This module has been deprecated and renamed to 'agentscope.tuner'."""

from ._tune import tune
from ._workflow import WorkflowType

__all__ = [
"tune",
"WorkflowType",
]
raise ImportError(
"The 'agentscope.tune' module has been renamed to 'agentscope.tuner'. "
"Please update your imports: 'from agentscope.tuner import ...'",
)
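
Because the old module now raises ImportError at import time instead of re-exporting anything, downstream code only needs its import path updated, for example:

# Old import path, which now fails with the ImportError raised above:
# from agentscope.tune import tune

# New import path:
from agentscope.tuner import tune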
72 changes: 0 additions & 72 deletions src/agentscope/tune/_tune.py

This file was deleted.
