feat(tuner): Enhance Agent Tune Interface #1079
Merged
60 commits
All commits authored by pan-x-c:

- bebbd42 add function type check
- ed5a85a add agentscope tuner v1 interface
- 81d8d62 update tuner interface
- f451bcc add missing file
- fea6b96 fix pre-commit
- bebf343 finish agentscope tune v1 interface
- 59b8deb fix readme
- 657f7ec rename to tuner
- a8c6198 fix example
- 435cac4 update readme
- 8a1e03b fix readme
- 3326838 fix eval tasksets
- 1a4526b use tuner model
- ceca3b3 fix pre-commit
- 789de86 update readme
- 68417bc fix model type
- f22eef0 refactor structure
- 8321195 fix doc
- ea0f7c5 update comments
- 65b404f fix comments
- a338b38 add function type check
- e2396a1 add unittests
- 55e7e4f fix pre-commit
- f2e7322 fix missing eval workflow args
- 11231b9 fix workflow args
- 7837814 fix reponse signature
- 423f9dc fix comments
- d20b08d auto setup cluster on dlc
- baaf0a7 fix dlc setup
- b36805c add tuner tutorial
- 68bec87 add tuner in tutorial
- a631d53 add chinese doc
- cf426cd fix docs
- 907924a clean code
- 4e8c373 add reward curve
- 6d0963c fix comments
- f9ceb44 fix missing packages
- ed40269 move doc from training to tuner
- 631ec36 add tips for metrics
- db37537 add tips
- 58643e6 fix chinese doc
- 9ebc8ed fix en doc
- 4251087 add links to samples
- ec83c38 fix comments
- 64fb8b5 fix comments
- 1f0429f fix algorithm doc
- 3163b94 fix comments
- 99fe36d fix type doc
- 5970355 fix comments
- b274cd8 fix eval workflow_args
- f2e27ed Merge branch 'main' into feature/tuner_enhance
- 03e7733 fix dependencies
- ef1461a fix comments
- 6ff7f85 rename modules
- 4b2d9e9 fix pre-commit
- 3652692 remove template yaml
- 1a5d036 fix conflict
- a873ace fix config
- ad23a5d fix pre-commit
- 0dc0f01 fix pre-commit
# -*- coding: utf-8 -*-
"""
.. _tuner:

Tuner
=================

AgentScope provides the ``tuner`` module for training agent applications using reinforcement learning (RL).
This tutorial shows how to use the ``tuner`` module to improve agent performance on specific tasks, including:

- Introducing the core components of the ``tuner`` module
- Demonstrating the key code required for the tuning workflow
- Showing how to configure and run the tuning process

Main Components
~~~~~~~~~~~~~~~~~~~
The ``tuner`` module introduces three core components essential for RL-based agent training:

- **Task Dataset**: A collection of tasks for training and evaluating the agent.
- **Workflow Function**: Encapsulates the agent's logic to be tuned.
- **Judge Function**: Evaluates the agent's performance on tasks and provides reward signals for tuning.
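Before diving into the API, it may help to see how these three components fit together. The sketch below is purely illustrative and is not the tuner's actual internals: the dummy workflow, judge, and "model" are stand-ins defined only for this snippet. On each iteration the workflow is rolled out on every task, the judge scores each rollout, and the resulting rewards would drive the RL update.

```python
import asyncio
from dataclasses import dataclass


# Illustrative stand-ins for the tuner's WorkflowOutput / JudgeOutput types.
@dataclass
class WorkflowOutput:
    response: str


@dataclass
class JudgeOutput:
    reward: float


async def dummy_workflow(task, model):
    # A real workflow would run an agent; here the "model" is a plain callable.
    return WorkflowOutput(response=model(task["question"]))


async def dummy_judge(task, response):
    # Reward 1.0 when the ground-truth answer appears in the response.
    return JudgeOutput(reward=1.0 if task["answer"] in response else 0.0)


async def rollout(tasks, workflow_func, judge_func, model):
    # One conceptual tuning iteration: roll out the workflow on each task,
    # score it with the judge, and collect the rewards for the RL update.
    rewards = []
    for task in tasks:
        out = await workflow_func(task, model)
        judged = await judge_func(task, out.response)
        rewards.append(judged.reward)
    return rewards


tasks = [
    {"question": "2+2?", "answer": "4"},
    {"question": "4+4?", "answer": "8"},
]
model = lambda q: "4"  # toy "model" that always answers 4
rewards = asyncio.run(rollout(tasks, dummy_workflow, dummy_judge, model))
# rewards -> [1.0, 0.0]
```

The actual RL update is handled by the backend (see the algorithm configuration later in this tutorial); only the workflow and judge functions are written by you.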
In addition, ``tuner`` provides several configuration classes for customizing the tuning process, including:

- **TunerModelConfig**: Model configurations for tuning purposes.
- **AlgorithmConfig**: Specifies the RL algorithm (e.g., GRPO, PPO) and its parameters.

Implementation
~~~~~~~~~~~~~~~~~~~
This section demonstrates how to use ``tuner`` to train a simple math agent.

Task Dataset
--------------------
The task dataset contains the tasks used to train and evaluate your agent.

Your dataset should follow the Hugging Face `datasets <https://huggingface.co/docs/datasets/quickstart>`_ format, so that it can be loaded with ``datasets.load_dataset``. For example:

.. code-block:: text

    my_dataset/
    ├── train.jsonl  # training samples
    └── test.jsonl   # evaluation samples

Suppose your ``train.jsonl`` contains:

.. code-block:: json

    {"question": "What is 2 + 2?", "answer": "4"}
    {"question": "What is 4 + 4?", "answer": "8"}

Before starting tuning, you can verify that your dataset loads correctly with:

.. code-block:: python

    from agentscope.tuner import DatasetConfig

    dataset = DatasetConfig(path="my_dataset", split="train")
    dataset.preview(n=2)
    # Prints the first two samples to verify correct loading
    # [
    #     {
    #         "question": "What is 2 + 2?",
    #         "answer": "4"
    #     },
    #     {
    #         "question": "What is 4 + 4?",
    #         "answer": "8"
    #     }
    # ]
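If you are building such a dataset from scratch, a ``.jsonl`` file is simply one JSON object per line. The stdlib sketch below writes the two samples above into a ``train.jsonl`` and reads them back; the temporary directory is only there to keep the snippet self-contained and leave no files behind.

```python
import json
import os
import tempfile

samples = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What is 4 + 4?", "answer": "8"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.jsonl")

    # Write one JSON object per line (the JSONL format).
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample) + "\n")

    # Read it back line by line to confirm the round trip.
    with open(path, encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f]
```

A round trip like this is a quick way to catch malformed lines before handing the files to ``DatasetConfig``.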
Workflow Function
--------------------
The workflow function defines how the agent interacts with the environment and makes decisions. All workflow functions should follow the input/output signature defined in ``agentscope.tuner.WorkflowType``.

Below is an example workflow function that uses a ReAct agent to answer math questions:
"""
from typing import Dict, Optional

from agentscope.agent import ReActAgent
from agentscope.formatter import OpenAIChatFormatter
from agentscope.message import Msg
from agentscope.model import ChatModelBase
from agentscope.tuner import WorkflowOutput


async def example_workflow_function(
    task: Dict,
    model: ChatModelBase,
    auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
) -> WorkflowOutput:
    """An example workflow function for tuning.

    Args:
        task (`Dict`): The task information.
        model (`ChatModelBase`): The chat model used by the agent.
        auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): Additional
            chat models, generally used to simulate the behavior of other
            non-training agents in multi-agent scenarios.

    Returns:
        `WorkflowOutput`: The output generated by the workflow.
    """
    agent = ReActAgent(
        name="react_agent",
        sys_prompt="You are a helpful math problem solving agent.",
        model=model,
        formatter=OpenAIChatFormatter(),
    )

    # Extract the question from the task and send it to the agent
    response = await agent.reply(
        msg=Msg(
            "user",
            task["question"],
            role="user",
        ),
    )

    # Return the agent's response for judging
    return WorkflowOutput(
        response=response,
    )
# %%
# You can run this workflow function directly with a task dictionary and a
# ``DashScopeChatModel`` / ``OpenAIChatModel`` to verify its correctness before
# formal training. For example:

import asyncio
import os

from agentscope.model import DashScopeChatModel

task = {"question": "What is 123 plus 456?", "answer": "579"}
model = DashScopeChatModel(
    model_name="qwen-max",
    api_key=os.environ["DASHSCOPE_API_KEY"],
)
workflow_output = asyncio.run(example_workflow_function(task, model))
assert isinstance(
    workflow_output.response,
    Msg,
), "In this example, the response should be a Msg instance."
print("\nWorkflow response:", workflow_output.response.get_text_content())
# %%
# Judge Function
# --------------------
# The judge function evaluates the agent's performance on a given task and
# provides a reward signal for tuning.
# All judge functions should follow the input/output signature defined in
# ``agentscope.tuner.JudgeType``.
# Below is a simple judge function that compares the agent's response with the
# ground-truth answer:
from typing import Any

from agentscope.tuner import JudgeOutput


async def example_judge_function(
    task: Dict,
    response: Any,
    auxiliary_models: Optional[Dict[str, ChatModelBase]] = None,
) -> JudgeOutput:
    """A very simple judge function, for demonstration only.

    Args:
        task (`Dict`): The task information.
        response (`Any`): The response field from the WorkflowOutput.
        auxiliary_models (`Optional[Dict[str, ChatModelBase]]`): Additional
            chat models for LLM-as-a-judge purposes.

    Returns:
        `JudgeOutput`: The reward assigned by the judge.
    """
    ground_truth = task["answer"]
    reward = 1.0 if ground_truth in response.get_text_content() else 0.0
    return JudgeOutput(reward=reward)
judge_output = asyncio.run(
    example_judge_function(
        task,
        workflow_output.response,
    ),
)
print(f"Judge reward: {judge_output.reward}")
# %%
# The judge function can also be tested locally in the same way as shown above
# before formal training, to make sure its logic is correct.
#
# .. tip::
#     You can leverage existing `MetricBase <https://github.com/agentscope-ai/agentscope/blob/main/src/agentscope/evaluate/_metric_base.py>`_ implementations in your judge function to compute more sophisticated metrics and combine them into a composite reward.
#
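As a concrete illustration of a composite reward, the helper below blends an exact-match check with a brevity bonus. Note that the 0.8/0.2 weights and the 200-character cutoff are arbitrary assumptions for demonstration, not part of the ``agentscope.tuner`` API; in a real judge function you would return ``JudgeOutput(reward=composite_reward(...))``.

```python
def composite_reward(ground_truth: str, response_text: str) -> float:
    """Blend correctness with a brevity bonus into one scalar reward.

    The weights and the length cutoff below are illustrative choices,
    not part of the agentscope.tuner API.
    """
    correctness = 1.0 if ground_truth in response_text else 0.0
    brevity = 1.0 if len(response_text) <= 200 else 0.0  # discourage rambling
    return 0.8 * correctness + 0.2 * brevity
```

Shaping the reward this way can give the RL algorithm a denser signal than a bare 0/1 correctness score, at the cost of having to tune the weights.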
# Configuration and Running
# ~~~~~~~~~~~~~~~~~~~~~~~~~
# Finally, you can configure and run the tuning process with the ``tuner`` module.
# Before starting, make sure `Trinity-RFT <https://github.com/modelscope/Trinity-RFT>`_ is installed in your environment, as it is required for tuning.
#
# Below is an example of configuring and starting the tuning process:
#
# .. note::
#     This example is for demonstration only. For a complete runnable example, see `Tune ReActAgent <https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/react_agent>`_.
#
# .. code-block:: python
#
#     from agentscope.tuner import tune, AlgorithmConfig, DatasetConfig, TunerModelConfig
#
#     # your workflow / judge functions here...
#
#     if __name__ == "__main__":
#         dataset = DatasetConfig(path="my_dataset", split="train")
#         model = TunerModelConfig(model_path="Qwen/Qwen3-0.6B", max_model_len=16384)
#         algorithm = AlgorithmConfig(
#             algorithm_type="multi_step_grpo",
#             group_size=8,
#             batch_size=32,
#             learning_rate=1e-6,
#         )
#         tune(
#             workflow_func=example_workflow_function,
#             judge_func=example_judge_function,
#             model=model,
#             train_dataset=dataset,
#             algorithm=algorithm,
#         )
#
# Here, ``DatasetConfig`` configures the training dataset, ``TunerModelConfig`` sets the parameters of the trainable model, and ``AlgorithmConfig`` specifies the reinforcement learning algorithm and its hyperparameters.
#
# .. tip::
#     The ``tune`` function is built on `Trinity-RFT <https://github.com/modelscope/Trinity-RFT>`_ and internally converts its input parameters into a YAML configuration.
#     Advanced users can skip the ``model``, ``train_dataset``, and ``algorithm`` arguments and instead provide a YAML config file path via the ``config_path`` argument.
#     Using a configuration file is recommended for fine-grained control and for leveraging advanced Trinity-RFT features. See the Trinity-RFT `Configuration Guide <https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html>`_ for more options.
#
# Save the above code as ``main.py`` and run it with:
#
# .. code-block:: bash
#
#     ray start --head
#     python main.py
#
# Checkpoints and logs are automatically saved to the ``checkpoints/AgentScope`` directory under your workspace, with each run in a timestamped sub-directory. TensorBoard logs can be found in ``monitor/tensorboard`` within the checkpoint directory.
#
# .. code-block:: text
#
#     your_workspace/
#     └── checkpoints/
#         └── AgentScope/
#             └── Experiment-20260104185355/  # each run saved in a timestamped sub-directory
#                 ├── monitor/
#                 │   └── tensorboard/        # TensorBoard logs
#                 └── global_step_x/          # model checkpoint saved at step x
#
# .. tip::
#     For more tuning examples, refer to the `tuner directory <https://github.com/agentscope-ai/agentscope-samples/tree/main/tuner>`_ of the AgentScope-Samples repository.