Explorer API collects feedback from agent applications #295

pan-x-c · 2025-09-22T12:24:02Z

Description

As the title says

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

gemini-code-assist · 2025-09-22T12:24:19Z

Summary of Changes

Hello @pan-x-c, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust mechanism for collecting feedback from agent applications interacting with the Explorer API. It establishes a session-based system where experiences generated during a session can be linked and later attributed with a reward, enabling more sophisticated reinforcement learning feedback loops. The changes span the API endpoints, the core service logic for managing experiences, and the client-side integration for agents.

Highlights

Feedback Collection API Endpoints: Introduced new API endpoints in trinity/explorer/api/api.py: /allocate for generating a unique session ID and /feedback for receiving reward feedback associated with a specific session ID. The existing chat_completions endpoint was updated to accept an optional session_id.
Session-based Experience Management: The ExplorerService in trinity/explorer/api/service.py was enhanced to manage experiences on a per-session basis. It now includes a session_level_experience_queue to temporarily store experiences linked to a session until feedback is received. A new allocate_session method generates session IDs, and record_feedback applies rewards to session-specific experiences before moving them to the general experience queue.
API Client Updates: The ExplorerClient in trinity/explorer/explorer_client.py was refactored to better handle API URLs and integrate with the new feedback mechanism. It now initializes a session upon creation and provides methods (feedback, feedback_async) to send reward feedback to the Explorer API, associating it with the client's session ID.
Concurrency and Type Hinting Improvements: Added an asyncio.Lock (queue_lock) to ExplorerService to ensure thread-safe access to experience queues. Several asynchronous methods in ExplorerService were updated with explicit -> None return type hints for clarity.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a mechanism for collecting feedback from agent applications through new API endpoints. It adds session management to associate experiences with feedback. The overall implementation is sound, but I've identified a critical bug in an API call and a high-severity issue related to the lack of input validation, which could lead to server errors. My review includes suggestions to fix these issues.

trinity/explorer/api/api.py

pan-x-c · 2025-09-22T12:33:56Z

/unittest-all

github-actions · 2025-09-22T13:22:32Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
142	141	0	1	0	0	2.8s

Skipped

Tests	Status
tests/trainer/trainer_test.py::TestTrainerMultiModal::test_trainer	skipped ⏭️

Tests

Test Name	Status	Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	✅	11ms
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer	✅	3ms
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft	✅	5ms
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo	✅	5ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	2ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter	✅	1ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	7ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	3ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	4ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage	✅	1ms
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_buffer_read_write	✅	3ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0	✅	1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1	✅	2ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2	✅	1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3	✅	2ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4	✅	1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5	✅	3ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command	✅	7ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc	✅	1ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command	✅	1ms
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run	✅	1ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	12ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	4ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	55ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	35ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	45ms
tests/common/vllm_test.py::TestModelLen_0::test_model_len	✅	21ms
tests/common/vllm_test.py::TestModelLen_1::test_model_len	✅	20ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24ms
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	✅	24ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	21ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	20ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	59ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer	✅	47ms
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	204ms
tests/explorer/explorer_test.py::ServeTest::test_serve	✅	66ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	3ms
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	22ms
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	14ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	5ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	13ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable	✅	1ms
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	✅	29ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	✅	75ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	✅	72ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	✅	116ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	✅	132ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	✅	70ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	✅	70ms
tests/service/data_juicer_test.py::TestDataJuicer::test_config	✅	1ms
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start	✅	22ms
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators	✅	21ms
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline	✅	14ms
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	✅	153ms
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	✅	317ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	58ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	✅	52ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	✅	55ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	✅	57ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	✅	61ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	102ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	39ms
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	36ms
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	✅	36ms
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	✅	81ms
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	✅	77ms
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	✅	171ms
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	✅	56ms
tests/trainer/trainer_test.py::TestTrainerMultiModal::test_trainer	⏭️	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer	✅	1ms
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv	✅	1ms
tests/utils/log_test.py::LogTest::test_actor_log	✅	2ms
tests/utils/log_test.py::LogTest::test_group_by_node	✅	2ms
tests/utils/log_test.py::LogTest::test_no_actor_log	✅	1ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local	✅	1ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote	✅	6ms
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class	✅	3ms

Github Test Reporter by CTRF 💚

pan-x-c · 2025-10-15T10:05:10Z

/unittest-all

pan-x-c · 2025-10-21T07:05:02Z

/unittest-all

add allocate_session and feedback API

31c5693

gemini-code-assist bot reviewed Sep 22, 2025

View reviewed changes

trinity/explorer/api/api.py Outdated Show resolved Hide resolved

trinity/explorer/api/api.py Show resolved Hide resolved

fix comments

ece144e

pan-x-c added 5 commits September 23, 2025 10:58

add api test

7ed876c

merge main

bc3d72e

add tests

20857a7

process group for serve model

5295688

fix tests

fa6604a

pan-x-c changed the title ~~[WIP] Explorer API collects feedback from agent applications~~ Explorer API collects feedback from agent applications Oct 15, 2025

Merge branch 'main' into feature/user_feedback

1154605

pan-x-c added 2 commits October 15, 2025 19:43

clean code

ef7278e

Merge branch 'main' into feature/user_feedback

f401491

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Explorer API collects feedback from agent applications #295

Explorer API collects feedback from agent applications #295

Uh oh!

pan-x-c commented Sep 22, 2025 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Sep 22, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

pan-x-c commented Sep 22, 2025

Uh oh!

github-actions bot commented Sep 22, 2025

Uh oh!

pan-x-c commented Oct 15, 2025

Uh oh!

pan-x-c commented Oct 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Explorer API collects feedback from agent applications #295

Are you sure you want to change the base?

Explorer API collects feedback from agent applications #295

Uh oh!

Conversation

pan-x-c commented Sep 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Uh oh!

gemini-code-assist bot commented Sep 22, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

pan-x-c commented Sep 22, 2025

Uh oh!

github-actions bot commented Sep 22, 2025

Summary

Skipped

Tests

Uh oh!

pan-x-c commented Oct 15, 2025

Uh oh!

pan-x-c commented Oct 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pan-x-c commented Sep 22, 2025 •

edited

Loading