Refactor WorkflowRunner and Scheduler to handle partially completed tasks by pan-x-c · Pull Request #530 · agentscope-ai/Trinity-RFT

pan-x-c · 2026-04-23T08:35:34Z

Description

This pull request introduces support for handling and returning partially completed tasks in the scheduler during over-rollout scenarios. It adds a new configuration option to control this behavior, updates the scheduler logic to track partial completions and emit partial results, and extends the test suite to verify correct handling of partial successes. The changes also refactor the Status class to track the number of completed and total runs, improving the granularity of task completion status.

Partial Task Handling and Over-Rollout Improvements:

Added a new configuration option return_partial_tasks in OverRolloutConfig to control whether tasks with partial successful runs are returned during over-rollout cleanup. (trinity/common/config.py)
Modified the scheduler to accumulate run results per task, emit partial results when enabled, and track completed versus total runs for each task. This includes new methods for accumulating, building, and emitting task results, as well as collecting and emitting partial tasks for a batch. (trinity/explorer/scheduler.py)
Updated the explorer to pass the return_partial_tasks flag to the scheduler when retrieving results for both exploration and evaluation steps. (trinity/explorer/explorer.py)
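
As a rough illustration of the per-task accumulation described above, the following minimal sketch shows how run results might be grouped per task and how partially completed tasks could be returned at cleanup. It is not the code in trinity/explorer/scheduler.py: `TaskAccumulator` and `collect_on_cleanup` are hypothetical names introduced here, and only the `return_partial_tasks` flag comes from this description.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TaskAccumulator:
    """Hypothetical per-task bucket of run results (not the real scheduler type)."""

    total_runs: int
    results: List[Any] = field(default_factory=list)

    @property
    def completed_runs(self) -> int:
        # One stored result per successfully completed run.
        return len(self.results)

    @property
    def is_complete(self) -> bool:
        return self.completed_runs >= self.total_runs


def collect_on_cleanup(
    pending: Dict[str, TaskAccumulator], return_partial_tasks: bool
) -> Dict[str, List[Any]]:
    """Return results of unfinished tasks that have at least one completed run.

    Assumption: when the flag is off, unfinished tasks are simply dropped.
    """
    if not return_partial_tasks:
        return {}
    return {
        task_id: list(acc.results)
        for task_id, acc in pending.items()
        if not acc.is_complete and acc.completed_runs > 0
    }
```

In this sketch a task with no completed runs is never emitted, so enabling the flag only surfaces work that actually produced results.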

Status Tracking and Refactoring:

Refactored the Status class to track completed_runs and total_runs instead of a simple ok boolean, with a property to determine overall success. Added a new RunnerExecutionResult dataclass for clarity. (trinity/explorer/workflow_runner.py)
Updated all relevant code paths to use the new Status structure, ensuring accurate reporting of partial completions and error messages. (trinity/explorer/scheduler.py)
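
To make the refactored status model concrete, here is a minimal sketch of a run-counting status object. It assumes the success property keeps the old name `ok` and that a task counts as successful only when every run completed; the `message` field and the exact field layout are also assumptions, not taken from trinity/explorer/workflow_runner.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Status:
    """Sketch of a status that counts runs instead of storing a single boolean."""

    completed_runs: int
    total_runs: int
    message: Optional[str] = None  # hypothetical field for an error message

    @property
    def ok(self) -> bool:
        # Assumption: overall success means all requested runs completed.
        return self.completed_runs == self.total_runs

    @property
    def is_partial(self) -> bool:
        # Hypothetical helper: some, but not all, runs completed.
        return 0 < self.completed_runs < self.total_runs
```

With this shape, a caller can distinguish a total failure (`completed_runs == 0`) from a partial success without parsing the error message.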

Testing Enhancements:

Added new tests to verify correct handling and reporting of partial task completions in both the scheduler and workflow runner, including custom dummy workflows to simulate partial failures. (tests/explorer/scheduler_test.py, tests/explorer/workflow_test.py)

These changes provide better observability and control over task execution in distributed or unreliable environments, ensuring that partially successful work is not lost and is properly reported.

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

pan-x-c · 2026-04-23T09:16:19Z

/unittest-diff

github-actions · 2026-04-23T09:49:07Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
116	113	1	2	0	0	29m 53s

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1	The test failed in the call phase due to an assertion error

Skipped

Tests	Status
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content	skipped ⏭️
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async	skipped ⏭️

Tests

Test Name	Status	Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	23.3s
tests/common/config_test.py::TestConfig::test_chat_template_path	✅	97ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	35ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	209ms
tests/common/config_test.py::TestConfig::test_default_workflow	✅	96ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	1.4s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly	✅	97ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation	✅	96ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster	✅	1.6s
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_build_experience_token_view_aligns_prompt_action_mask_and_logprobs	✅	1ms
tests/common/experience_test.py::TestExperience::test_deserialize_legacy_pickle_payload	✅	2ms
tests/common/experience_test.py::TestExperience::test_deserialize_single_rejects_batch_payload	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_action_mask	✅	1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_decoded_token_text	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	14ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_print_colored_tokens_writes_to_file	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_deserialize_many	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_with_shared_multimodal_tensor	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/external_model_test.py::TestExternalModel::test_external_model	✅	50.8s
tests/common/external_model_test.py::TestExternalModelLoad::test_external_model_load	✅	2.3s
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_first_message_is_assistant	✅	597ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_messages_empty	✅	274ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_no_assistant_messages	✅	588ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_normal_conversation_data	✅	275ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution	✅	2ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes	✅	1ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled	✅	1ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_row_violation	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_column_violation	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_block_violation	✅	1ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution	✅	1ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled	✅	1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation	✅	1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	1m 12s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	42.4s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	55.1s
tests/common/vllm_test.py::TestModelLen_0::test_model_len	✅	41.4s
tests/common/vllm_test.py::TestModelLen_1::test_model_len	✅	27.6s
tests/common/vllm_test.py::TestModelLen_2::test_model_len	✅	27.3s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len	✅	27.3s
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation	✅	27.7s
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status	✅	27.8s
tests/common/vllm_test.py::TestAPIServer::test_api	✅	28.3s
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content	⏭️	515ms
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api	✅	24.1s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	✅	26.0s
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async	⏭️	1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	307ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	576ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	2m 26s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	1m 58s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate	✅	1m 54s
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api	✅	46.0s
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	1m 45s
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer	✅	1m 15s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer	✅	1m 25s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	3m 6s
tests/explorer/explorer_test.py::ServeTest::test_serve	✅	1m 3s
tests/explorer/proxy_test.py::RecorderTest::test_recorder	✅	61ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	✅	5.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout	✅	12.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	30.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0	✅	4.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1	✅	4.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0	✅	5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1	❌	4.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	6.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	5.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait	✅	13.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks	✅	9.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	15.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	9.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	8.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	25.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	7.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	13.6s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection	✅	10.6s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0	✅	2ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1	✅	602ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1	✅	1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	27ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	17ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	138ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	7ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	13ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	8ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1	✅	101ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1	✅	201ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	✅	22.1s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	✅	22.4s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording	✅	4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1	✅	3.3s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner	✅	142ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state	✅	8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_0_sequential	✅	38ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_1_asynchronous	✅	39ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_2_multi_threading	✅	77ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai	✅	24.3s
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner	✅	44.9s

Github Test Reporter by CTRF 💚

Copilot

Pull request overview

Adds optional support for returning partially completed tasks during scheduler over-rollout cleanup, with more granular run-level success reporting via an updated Status model.

Changes:

Introduces OverRolloutConfig.return_partial_tasks and validates it in config validation.
Refactors runner/scheduler result handling to track completed_runs vs total_runs and optionally emit partial task results during cleanup.
Extends tests to cover partial-success behavior and updates an agentscope dependency constraint.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.


File	Description
`trinity/explorer/workflow_runner.py`	Refactors execution to aggregate per-run outcomes into `Status(completed_runs/total_runs)` and supports partial collection/fail-fast modes.
`trinity/explorer/scheduler.py`	Tracks per-task aggregated results across subtasks and can emit partially completed tasks during cleanup.
`trinity/explorer/explorer.py`	Threads `return_partial_tasks` flag through scheduler result collection for explore/eval steps.
`trinity/common/config_validator.py`	Extends over-rollout validation to cover `return_partial_tasks` and updates an error message.
`trinity/common/config.py`	Adds `return_partial_tasks` option to `OverRolloutConfig`.
`tests/trainer/trainer_test.py`	Enables `return_partial_tasks` in an over-rollout trainer integration test configuration.
`tests/explorer/workflow_test.py`	Adds runner tests for partial success + fail-fast behavior across concurrent modes; adjusts agentscope adapter test.
`tests/explorer/scheduler_test.py`	Adds scheduler test verifying partial task emission during over-rollout cleanup.
`pyproject.toml`	Bumps `agentscope[tuner]` minimum version to `>=1.0.19`.

Comments suppressed due to low confidence (1)

trinity/explorer/scheduler.py:189

run_with_retry’s docstring wasn’t updated for the new collect_partial_runs parameter, so it’s unclear how this flag affects runner execution and retries. Please document the parameter and its semantics (e.g., whether it controls fail-fast vs collecting partial successes across repeats).

        """
        Args:
            task (`TaskWrapper`): The task to run.
            repeat_times (`int`): The number of times to repeat the task.
            run_id_base (`int`): The base run id for this task runs.
            timeout (`float`): The timeout for each task run.
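
One way the requested documentation could read is sketched below, together with a toy loop that illustrates the two modes. The semantics shown (fail-fast discards outputs when the flag is off, partial collection keeps successful runs when it is on) are an assumption based on the review summary, not confirmed behavior of `run_with_retry`; `run_repeats` and `run_once` are hypothetical names.

```python
from typing import Callable, List, Tuple


def run_repeats(
    run_once: Callable[[int], str],
    repeat_times: int,
    collect_partial_runs: bool,
) -> Tuple[List[str], int, List[str]]:
    """Run a task several times, optionally keeping partial successes.

    Args:
        run_once (`Callable`): Hypothetical callable executing one run by index.
        repeat_times (`int`): The number of times to repeat the task.
        collect_partial_runs (`bool`): If True, continue after a failed run and
            return the outputs of the runs that succeeded. If False, stop at
            the first failure and discard all outputs (fail-fast).

    Returns:
        A tuple of (outputs, completed_runs, error_messages).
    """
    outputs: List[str] = []
    errors: List[str] = []
    for index in range(repeat_times):
        try:
            outputs.append(run_once(index))
        except Exception as exc:  # a real runner would likely narrow this
            errors.append(f"run {index}: {exc}")
            if not collect_partial_runs:
                # Fail-fast: report nothing as completed.
                return [], 0, errors
    return outputs, len(outputs), errors
```

Documenting the flag at this level would answer the reviewer's question about whether it affects retries or only how failed repeats are reported.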


pan-x-c · 2026-04-23T11:06:30Z

/unittest-diff

github-actions · 2026-04-23T11:40:44Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
119	116	1	2	0	0	31m 29s

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks	The test failed in the call phase due to an assertion error

Skipped

Tests	Status
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content	skipped ⏭️
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async	skipped ⏭️

Tests

Test Name	Status	Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	23.3s
tests/common/config_test.py::TestConfig::test_chat_template_path	✅	97ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	36ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	208ms
tests/common/config_test.py::TestConfig::test_default_workflow	✅	95ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	1.4s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly	✅	96ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation	✅	96ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster	✅	1.6s
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_build_experience_token_view_aligns_prompt_action_mask_and_logprobs	✅	1ms
tests/common/experience_test.py::TestExperience::test_deserialize_legacy_pickle_payload	✅	2ms
tests/common/experience_test.py::TestExperience::test_deserialize_single_rejects_batch_payload	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_action_mask	✅	1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_decoded_token_text	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	14ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_print_colored_tokens_writes_to_file	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_deserialize_many	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_with_shared_multimodal_tensor	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/external_model_test.py::TestExternalModel::test_external_model	✅	55.3s
tests/common/external_model_test.py::TestExternalModelLoad::test_external_model_load	✅	2.1s
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_first_message_is_assistant	✅	579ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_messages_empty	✅	265ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_no_assistant_messages	✅	542ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_normal_conversation_data	✅	256ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution	✅	2ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes	✅	1ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled	✅	1ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_row_violation	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_column_violation	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_block_violation	✅	1ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution	✅	1ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled	✅	1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation	✅	1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	1m 7s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	42.1s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	38.6s
tests/common/vllm_test.py::TestModelLen_0::test_model_len	✅	42.7s
tests/common/vllm_test.py::TestModelLen_1::test_model_len	✅	27.1s
tests/common/vllm_test.py::TestModelLen_2::test_model_len	✅	33.1s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len	✅	27.1s
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation	✅	26.7s
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status	✅	26.7s
tests/common/vllm_test.py::TestAPIServer::test_api	✅	25.1s
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content	⏭️	515ms
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api	✅	26.3s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	✅	25.2s
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async	⏭️	1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	289ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	577ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	2m 28s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	2m 27s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate	✅	1m 53s
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api	✅	46.1s
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	1m 47s
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer	✅	1m 14s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer	✅	2m 36s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	3m 8s
tests/explorer/explorer_test.py::ServeTest::test_serve	✅	1m 3s
tests/explorer/proxy_test.py::RecorderTest::test_recorder	✅	80ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	✅	5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	5.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout	✅	13.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	30.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0	✅	5.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1	✅	5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0	✅	5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1	✅	5.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	6.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	5.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait	✅	14.7s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks	❌	10.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	15.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	9.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	8.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	25.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	9.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	14.9s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection	✅	10.5s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1	✅	604ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0	✅	2ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1	✅	1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	29ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	20ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	142ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	5ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	12ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	8ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1	✅	102ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1	✅	203ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	✅	23.6s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	✅	22.4s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording	✅	4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1	✅	3.4s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner	✅	145ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_0_sequential	✅	71ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_1_asynchronous	✅	60ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_2_multi_threading	✅	540ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state	✅	8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_0_sequential	✅	40ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_1_asynchronous	✅	39ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_2_multi_threading	✅	41ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai	✅	23.9s
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner	✅	44.1s


pan-x-c · 2026-04-23T11:56:29Z

/unittest-pattern-test_over_rollout_return_partial_tasks

github-actions · 2026-04-23T11:59:40Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
1	1	0	0	0	0	29.7s

Tests

Test Name	Status	Flaky	Duration
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks	✅		16.1s


pan-x-c · 2026-04-23T12:27:40Z

/unittest-module-trainer

github-actions · 2026-04-23T13:22:06Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
27	24	0	3	0	0	51m 41s

Skipped

Tests	Status
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class	skipped ⏭️

Tests

Test Name	Status	Duration
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	✅	4m 23s
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	✅	4m 50s
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	1m 51s
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	✅	1m 18s
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	✅	1m 2s
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	✅	1m 6s
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	✅	1m 13s
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	⏭️	1ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	38.3s
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	34.0s
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	✅	34.5s
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	✅	1m 47s
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	✅	1m 43s
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	✅	2m 34s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	✅	2m 53s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer	✅	5m 53s
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	✅	1m 58s
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer	✅	1m 49s
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	✅	4m 38s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	✅	1m 47s
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	✅	3m 36s
tests/trainer/trainer_test.py::TestOverRollout::test_trainer	✅	1m 7s
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer	✅	48.2s
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	⏭️	1ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class	⏭️	1ms
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner	✅	1m 24s
tests/trainer/trainer_test.py::ColocateModeTest::test_trainer	✅	2m 3s


pan-x-c · 2026-04-23T13:27:48Z

/unittest-diff

github-actions · 2026-04-23T14:02:37Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
123	121	0	2	0	0	32m 5s

Skipped

Tests	Status
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content	skipped ⏭️
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async	skipped ⏭️

Tests

Test Name	Status	Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	23.2s
tests/common/config_test.py::TestConfig::test_chat_template_path	✅	96ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	35ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	208ms
tests/common/config_test.py::TestConfig::test_default_workflow	✅	96ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	1.4s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly	✅	98ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation	✅	95ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster	✅	1.6s
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_build_experience_token_view_aligns_prompt_action_mask_and_logprobs	✅	1ms
tests/common/experience_test.py::TestExperience::test_deserialize_legacy_pickle_payload	✅	3ms
tests/common/experience_test.py::TestExperience::test_deserialize_single_rejects_batch_payload	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_action_mask	✅	1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_decoded_token_text	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	13ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_print_colored_tokens_writes_to_file	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_deserialize_many	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_with_shared_multimodal_tensor	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/external_model_test.py::TestExternalModel::test_external_model	✅	50.4s
tests/common/external_model_test.py::TestExternalModelLoad::test_external_model_load	✅	2.1s
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_first_message_is_assistant	✅	573ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_messages_empty	✅	273ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_no_assistant_messages	✅	564ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_normal_conversation_data	✅	260ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution	✅	1ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes	✅	1ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled	✅	1ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_row_violation	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_column_violation	✅	1ms
tests/common/sudoku_test.py::test_judge_detects_block_violation	✅	1ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution	✅	1ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled	✅	1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation	✅	1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	1m 13s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	42.3s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	39.0s
tests/common/vllm_test.py::TestModelLen_0::test_model_len	✅	27.6s
tests/common/vllm_test.py::TestModelLen_1::test_model_len	✅	43.0s
tests/common/vllm_test.py::TestModelLen_2::test_model_len	✅	33.0s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len	✅	27.0s
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation	✅	27.4s
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status	✅	27.7s
tests/common/vllm_test.py::TestAPIServer::test_api	✅	29.0s
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content	⏭️	516ms
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api	✅	23.3s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	✅	25.7s
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async	⏭️	1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	314ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	588ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	2m 28s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	1m 59s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate	✅	2m 21s
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api	✅	46.1s
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	1m 48s
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer	✅	1m 18s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer	✅	2m 56s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	3m 6s
tests/explorer/explorer_test.py::ServeTest::test_serve	✅	1m 3s
tests/explorer/proxy_test.py::RecorderTest::test_recorder	✅	65ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	✅	5.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout	✅	13.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	29.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0	✅	5.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1	✅	5.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0	✅	5.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1	✅	4.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	5.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_async_cancelled_runner_accepts_next_batch	✅	5.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait	✅	9.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks	✅	5.7s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_sync_cancel_does_not_imply_immediate_runner_reuse	✅	7.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	14.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	9.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	8.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	25.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_timeout_cleanup_still_restarts_runner	✅	6.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_unexpected_task_exception_restarts_runner	✅	4.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	8.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	13.8s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection	✅	10.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0	✅	2ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1	✅	603ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1	✅	1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	27ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	17ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	136ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	4ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	11ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	8ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1	✅	101ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1	✅	201ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	✅	22.4s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	✅	22.2s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording	✅	4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1	✅	3.4s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner	✅	143ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_0_sequential	✅	68ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_1_asynchronous	✅	62ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_2_multi_threading	✅	540ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state	✅	8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_0_sequential	✅	39ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_1_asynchronous	✅	38ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_2_multi_threading	✅	40ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai	✅	24.2s
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner	✅	45.4s

Github Test Reporter by CTRF 💚

pan-x-c · 2026-04-24T03:09:34Z

/unittest-module-explorer

github-actions · 2026-04-24T03:29:40Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
59	59	0	0	0	0	17m 18s

Tests

Test Name	Status	Duration
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	2m 26s
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer	✅	1m 21s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer	✅	3m 2s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	3m 29s
tests/explorer/explorer_test.py::ServeTest::test_serve	✅	1m 3s
tests/explorer/proxy_test.py::RecorderTest::test_recorder	✅	59ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	✅	6.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	5.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout	✅	13.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	30.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0	✅	5.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1	✅	5.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0	✅	5.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1	✅	5.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	5.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_async_cancelled_runner_accepts_next_batch	✅	6.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait	✅	9.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks	✅	6.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_sync_cancel_does_not_imply_immediate_runner_reuse	✅	7.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	15.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	9.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	8.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	25.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_timeout_cleanup_still_restarts_runner	✅	6.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_unexpected_task_exception_restarts_runner	✅	4.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	8.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	13.6s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection	✅	10.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0	✅	2ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1	✅	602ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1	✅	1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	28ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	18ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	136ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	3ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	11ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	8ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1	✅	102ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1	✅	202ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	✅	22.7s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	✅	22.7s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording	✅	4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1	✅	3.4s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner	✅	144ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_0_sequential	✅	69ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_1_asynchronous	✅	59ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_2_multi_threading	✅	540ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state	✅	8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_0_sequential	✅	40ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_1_asynchronous	✅	39ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_2_multi_threading	✅	42ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai	✅	24.5s
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner	✅	45.5s

pan-x-c added 6 commits April 23, 2026 16:31

refactor WorkflowRunner and scheduler

3088dfb

fix pre-commit

0208d40

optimize sequential

cd2e4c3

fix agentscope tests

8262870

fix pre-commit

5085c68

validate config

bfa536b

pan-x-c added 3 commits April 23, 2026 18:10

update tests

f246a7c

fail fast when return_partial_tasks is False

3ee4418

fix pre-commit

e716d24
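
The commits above ("validate config", "fail fast when return_partial_tasks is False") touch the behavior summarized in the PR description: a `Status` that tracks `completed_runs` and `total_runs`, plus a flag that decides whether partially successful tasks are kept. The following is only a minimal sketch of that idea, not the code merged in this PR. The `Status` fields come from the description, but the `ok` rule, the `message` field, the `run_task` helper, and the choice to discard collected results on fail-fast are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Status:
    """Hypothetical stand-in for the refactored Status described in the PR."""

    completed_runs: int
    total_runs: int
    message: Optional[str] = None  # assumed field: last error message, if any

    @property
    def ok(self) -> bool:
        # Assumption: a task is fully successful only if every run completed.
        return self.total_runs > 0 and self.completed_runs == self.total_runs


def run_task(
    runs: List[Callable[[], str]], return_partial_tasks: bool
) -> Tuple[Status, List[str]]:
    """Hypothetical helper: execute each run and report how many succeeded."""
    results: List[str] = []
    error: Optional[str] = None
    for run in runs:
        try:
            results.append(run())
        except Exception as exc:
            error = str(exc)
            if not return_partial_tasks:
                # Fail fast (assumed semantics): stop at the first failure
                # and drop whatever was collected so far.
                return Status(0, len(runs), error), []
    return Status(len(results), len(runs), error), results


def good() -> str:
    return "experience"


def bad() -> str:
    raise RuntimeError("run failed")


# Partial collection keeps the two successful runs out of three.
status, kept = run_task([good, bad, good], return_partial_tasks=True)
print(status.completed_runs, status.total_runs, status.ok, len(kept))

# Fail-fast mode returns nothing once any run fails.
status, kept = run_task([good, bad, good], return_partial_tasks=False)
print(status.ok, len(kept))
```

Reporting counts rather than a single boolean lets a caller distinguish "nothing usable" from "some runs usable", which is the distinction the partial-success and fail-fast tests in the report above appear to exercise.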

pan-x-c requested a review from Copilot April 23, 2026 10:23

Copilot started reviewing on behalf of pan-x-c April 23, 2026 10:24

Copilot AI reviewed Apr 23, 2026

Comment thread trinity/explorer/workflow_runner.py

Comment thread trinity/explorer/workflow_runner.py Outdated

Comment thread tests/explorer/workflow_test.py Outdated

pan-x-c added 2 commits April 23, 2026 18:33

update doc

e47b4b6

fix comments

53aba02

fix tests

0e0fdec

optimize ray actor restart

99f9fcd

fix tests

6195bfd

chenyushuo approved these changes Apr 24, 2026

chenyushuo merged commit 10796ed into agentscope-ai:main Apr 24, 2026
2 checks passed

3 participants