Refactor WorkflowRunner and Scheduler to handle partially completed tasks #530

Merged
chenyushuo merged 14 commits into agentscope-ai:main from pan-x-c:feature/partial_return_task
Apr 24, 2026

Conversation

pan-x-c (Collaborator) commented Apr 23, 2026

Description

This pull request introduces support for handling and returning partially completed tasks in the scheduler during over-rollout scenarios. It adds a new configuration option to control this behavior, updates the scheduler logic to track partial completions and emit partial results, and extends the test suite to verify correct handling of partial successes. The changes also refactor the Status class to track the number of completed and total runs, improving the granularity of task completion status.

Partial Task Handling and Over-Rollout Improvements:

  • Added a new configuration option return_partial_tasks in OverRolloutConfig to control whether tasks with partial successful runs are returned during over-rollout cleanup. (trinity/common/config.py)
  • Modified the scheduler to accumulate run results per task, emit partial results when enabled, and track completed versus total runs for each task. This includes new methods for accumulating, building, and emitting task results, as well as collecting and emitting partial tasks for a batch. (trinity/explorer/scheduler.py)
  • Updated the explorer to pass the return_partial_tasks flag to the scheduler when retrieving results for both exploration and evaluation steps. (trinity/explorer/explorer.py)
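
The accumulate-then-emit behavior described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in trinity/explorer/scheduler.py: only the `return_partial_tasks` option name comes from this PR, while its default value, `TaskAccumulator`, and `collect_on_cleanup` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class OverRolloutConfig:
    # Only this field name comes from the PR; the default is an assumption.
    return_partial_tasks: bool = False


@dataclass
class TaskAccumulator:
    """Hypothetical per-task record of the run results gathered so far."""

    total_runs: int
    completed_runs: int = 0
    experiences: List[Any] = field(default_factory=list)

    def add(self, run_experiences: List[Any], completed: int) -> None:
        # Merge the output of one finished subtask into the task-level record.
        self.experiences.extend(run_experiences)
        self.completed_runs += completed

    @property
    def is_complete(self) -> bool:
        return self.completed_runs >= self.total_runs


def collect_on_cleanup(
    pending: Dict[str, TaskAccumulator], config: OverRolloutConfig
) -> Dict[str, TaskAccumulator]:
    """Pick the unfinished tasks that should still be emitted at cleanup."""
    if not config.return_partial_tasks:
        # Flag off: unfinished tasks are dropped at cleanup.
        return {}
    # Flag on: keep tasks that have at least one successful run but not all.
    return {
        task_id: acc
        for task_id, acc in pending.items()
        if 0 < acc.completed_runs < acc.total_runs
    }
```

In this sketch a task with zero completed runs is never emitted, since there is nothing to return for it; whether the real scheduler makes the same choice is not stated in the description.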

Status Tracking and Refactoring:

  • Refactored the Status class to track completed_runs and total_runs instead of a simple ok boolean, with a property to determine overall success. Added a new RunnerExecutionResult dataclass for clarity. (trinity/explorer/workflow_runner.py)
  • Updated all relevant code paths to use the new Status structure, ensuring accurate reporting of partial completions and error messages. (trinity/explorer/scheduler.py)
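
As a rough illustration of the refactored status model, a `Status` that counts runs instead of storing a boolean might look like the sketch below. The `completed_runs` and `total_runs` fields are named in this PR; the property name `ok`, the `message` field, and the `is_partial` helper are assumptions made for the example, and the actual class in trinity/explorer/workflow_runner.py may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Status:
    completed_runs: int
    total_runs: int
    message: Optional[str] = None  # assumed field for error text

    @property
    def ok(self) -> bool:
        # Overall success only when every requested run completed.
        return self.completed_runs == self.total_runs

    @property
    def is_partial(self) -> bool:
        # Hypothetical helper: some runs succeeded, but not all of them.
        return 0 < self.completed_runs < self.total_runs


full = Status(completed_runs=4, total_runs=4)
partial = Status(completed_runs=2, total_runs=4, message="2 runs failed")
```

Deriving success from the two counters means callers that only checked the old boolean keep working, while callers that care about partial work can inspect the counts directly.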

Testing Enhancements:

  • Added new tests to verify correct handling and reporting of partial task completions in both the scheduler and workflow runner, including custom dummy workflows to simulate partial failures. (tests/explorer/scheduler_test.py, tests/explorer/workflow_test.py)

These changes provide better observability and control over task execution in distributed or unreliable environments, ensuring that partially successful work is not lost and is properly reported.

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

pan-x-c (Collaborator, Author) commented Apr 23, 2026

/unittest-diff

@github-actions

Summary

Tests 📝: 116, Passed ✅: 113, Failed ❌: 1, Skipped ⏭️: 2, Other ❓: 0, Flaky 🍂: 0, Duration ⏱️: 29m 53s

Failed Tests

Failed Tests ❌ Fail Message
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 The test failed in the call phase due to an assertion error

Skipped

Tests Status
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content skipped ⏭️
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 23.3s
tests/common/config_test.py::TestConfig::test_chat_template_path 97ms
tests/common/config_test.py::TestConfig::test_config_flatten 35ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 209ms
tests/common/config_test.py::TestConfig::test_default_workflow 96ms
tests/common/config_test.py::TestConfig::test_load_default_config 1.4s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 97ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 96ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 1.6s
tests/common/experience_test.py::TestEID::test_eid_properties 1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 1ms
tests/common/experience_test.py::TestExperience::test_assertions 1ms
tests/common/experience_test.py::TestExperience::test_build_experience_token_view_aligns_prompt_action_mask_and_logprobs 1ms
tests/common/experience_test.py::TestExperience::test_deserialize_legacy_pickle_payload 2ms
tests/common/experience_test.py::TestExperience::test_deserialize_single_rejects_batch_payload 1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_action_mask 1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_decoded_token_text 1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 14ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_print_colored_tokens_writes_to_file 1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_deserialize_many 1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_with_shared_multimodal_tensor 1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_to_dict 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/external_model_test.py::TestExternalModel::test_external_model 50.8s
tests/common/external_model_test.py::TestExternalModelLoad::test_external_model_load 2.3s
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_first_message_is_assistant 597ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_messages_empty 274ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_no_assistant_messages 588ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_normal_conversation_data 275ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 2ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 1ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 1ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 1ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 1ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 1m 12s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 42.4s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 55.1s
tests/common/vllm_test.py::TestModelLen_0::test_model_len 41.4s
tests/common/vllm_test.py::TestModelLen_1::test_model_len 27.6s
tests/common/vllm_test.py::TestModelLen_2::test_model_len 27.3s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 27.3s
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation 27.7s
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status 27.8s
tests/common/vllm_test.py::TestAPIServer::test_api 28.3s
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content ⏭️ 515ms
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 24.1s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 26.0s
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 307ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 576ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 2m 26s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 1m 58s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 1m 54s
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 46.0s
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 1m 45s
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer 1m 15s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer 1m 25s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer 3m 6s
tests/explorer/explorer_test.py::ServeTest::test_serve 1m 3s
tests/explorer/proxy_test.py::RecorderTest::test_recorder 61ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow 5.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout 12.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 30.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 4.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 4.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 4.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution 6.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow 5.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait 13.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks 9.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 15.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 9.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 8.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid 25.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 7.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 13.6s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection 10.6s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 2ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 602ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error 1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps 1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 27ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 17ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 138ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow 7ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 13ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 8ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 101ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 201ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow 22.1s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow 22.4s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording 4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 3.3s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner 142ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state 8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_0_sequential 38ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_1_asynchronous 39ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_2_multi_threading 77ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai 24.3s
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner 44.9s

Github Test Reporter by CTRF 💚


Copilot AI left a comment

Pull request overview

Adds optional support for returning partially completed tasks during scheduler over-rollout cleanup, with more granular run-level success reporting via an updated Status model.

Changes:

  • Introduces OverRolloutConfig.return_partial_tasks and validates it in config validation.
  • Refactors runner/scheduler result handling to track completed_runs vs total_runs and optionally emit partial task results during cleanup.
  • Extends tests to cover partial-success behavior and updates an agentscope dependency constraint.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

File | Description
trinity/explorer/workflow_runner.py | Refactors execution to aggregate per-run outcomes into Status(completed_runs/total_runs) and supports partial collection/fail-fast modes.
trinity/explorer/scheduler.py | Tracks per-task aggregated results across subtasks and can emit partially completed tasks during cleanup.
trinity/explorer/explorer.py | Threads return_partial_tasks flag through scheduler result collection for explore/eval steps.
trinity/common/config_validator.py | Extends over-rollout validation to cover return_partial_tasks and updates an error message.
trinity/common/config.py | Adds return_partial_tasks option to OverRolloutConfig.
tests/trainer/trainer_test.py | Enables return_partial_tasks in an over-rollout trainer integration test configuration.
tests/explorer/workflow_test.py | Adds runner tests for partial success + fail-fast behavior across concurrent modes; adjusts agentscope adapter test.
tests/explorer/scheduler_test.py | Adds scheduler test verifying partial task emission during over-rollout cleanup.
pyproject.toml | Bumps agentscope[tuner] minimum version to >=1.0.19.
Comments suppressed due to low confidence (1)

trinity/explorer/scheduler.py:189

  • run_with_retry’s docstring wasn’t updated for the new collect_partial_runs parameter, so it’s unclear how this flag affects runner execution and retries. Please document the parameter and its semantics (e.g., whether it controls fail-fast vs collecting partial successes across repeats).
        """
        Args:
            task (`TaskWrapper`): The task to run.
            repeat_times (`int`): The number of times to repeat the task.
            run_id_base (`int`): The base run id for this task runs.
            timeout (`float`): The timeout for each task run.
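
To make the question in this comment concrete, here is one possible reading of `collect_partial_runs`, written as a standalone sketch. The semantics shown (fail-fast when the flag is off, keep successful repeats when it is on) are an assumption taken from the comment's own "e.g."; `run_repeats`, `run_once`, and `flaky` are hypothetical names, and retries and timeouts are left out entirely.

```python
from typing import Callable, List, Tuple


def run_repeats(
    run_once: Callable[[int], str],
    repeat_times: int,
    run_id_base: int = 0,
    collect_partial_runs: bool = False,
) -> Tuple[List[str], int, List[str]]:
    """Run `run_once` for each run id and summarize the outcome.

    Args:
        run_once: Hypothetical callable that executes a single run.
        repeat_times: The number of times to repeat the task.
        run_id_base: The base run id for this task's runs.
        collect_partial_runs: Assumed semantics. If True, continue after a
            failed run and return the successful ones. If False, stop at the
            first failure and discard earlier results (fail-fast).

    Returns:
        A tuple of (results, completed_runs, error_messages).
    """
    results: List[str] = []
    errors: List[str] = []
    for offset in range(repeat_times):
        try:
            results.append(run_once(run_id_base + offset))
        except Exception as exc:
            errors.append(str(exc))
            if not collect_partial_runs:
                # Fail-fast: report nothing as completed.
                return [], 0, errors
    return results, len(results), errors


def flaky(run_id: int) -> str:
    # Example run function: odd run ids fail, even ones succeed.
    if run_id % 2 == 1:
        raise RuntimeError(f"run {run_id} failed")
    return f"ok-{run_id}"
```

With `flaky` and four repeats, the partial mode keeps the two successful runs, while the default mode stops at the first failure and returns no results.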


Comment thread trinity/explorer/workflow_runner.py
Comment thread trinity/explorer/workflow_runner.py Outdated
Comment thread tests/explorer/workflow_test.py Outdated
pan-x-c (Collaborator, Author) commented Apr 23, 2026

/unittest-diff

@github-actions

Summary

Tests 📝: 119, Passed ✅: 116, Failed ❌: 1, Skipped ⏭️: 2, Other ❓: 0, Flaky 🍂: 0, Duration ⏱️: 31m 29s

Failed Tests

Failed Tests ❌ Fail Message
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks The test failed in the call phase due to an assertion error

Skipped

Tests Status
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content skipped ⏭️
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 23.3s
tests/common/config_test.py::TestConfig::test_chat_template_path 97ms
tests/common/config_test.py::TestConfig::test_config_flatten 36ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 208ms
tests/common/config_test.py::TestConfig::test_default_workflow 95ms
tests/common/config_test.py::TestConfig::test_load_default_config 1.4s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 96ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 96ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 1.6s
tests/common/experience_test.py::TestEID::test_eid_properties 1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 1ms
tests/common/experience_test.py::TestExperience::test_assertions 1ms
tests/common/experience_test.py::TestExperience::test_build_experience_token_view_aligns_prompt_action_mask_and_logprobs 1ms
tests/common/experience_test.py::TestExperience::test_deserialize_legacy_pickle_payload 2ms
tests/common/experience_test.py::TestExperience::test_deserialize_single_rejects_batch_payload 1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_action_mask 1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_decoded_token_text 1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 14ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_print_colored_tokens_writes_to_file 1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_deserialize_many 1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_with_shared_multimodal_tensor 1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_to_dict 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/external_model_test.py::TestExternalModel::test_external_model 55.3s
tests/common/external_model_test.py::TestExternalModelLoad::test_external_model_load 2.1s
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_first_message_is_assistant 579ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_messages_empty 265ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_no_assistant_messages 542ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_normal_conversation_data 256ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 2ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 1ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 1ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 1ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 1ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 1m 7s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 42.1s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 38.6s
tests/common/vllm_test.py::TestModelLen_0::test_model_len 42.7s
tests/common/vllm_test.py::TestModelLen_1::test_model_len 27.1s
tests/common/vllm_test.py::TestModelLen_2::test_model_len 33.1s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 27.1s
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation 26.7s
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status 26.7s
tests/common/vllm_test.py::TestAPIServer::test_api 25.1s
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content ⏭️ 515ms
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 26.3s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 25.2s
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 289ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 577ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 2m 28s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 2m 27s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 1m 53s
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 46.1s
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 1m 47s
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer 1m 14s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer 2m 36s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer 3m 8s
tests/explorer/explorer_test.py::ServeTest::test_serve 1m 3s
tests/explorer/proxy_test.py::RecorderTest::test_recorder 80ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow 5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 5.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout 13.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 30.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 5.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 5.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution 6.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow 5.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait 14.7s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks 10.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 15.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 9.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 8.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid 25.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 9.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 14.9s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection 10.5s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 604ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 2ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error 1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps 1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 29ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 20ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 142ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow 5ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 12ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 8ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 102ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 203ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow 23.6s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow 22.4s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording 4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 3.4s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner 145ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_0_sequential 71ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_1_asynchronous 60ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_2_multi_threading 540ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state 8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_0_sequential 40ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_1_asynchronous 39ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_2_multi_threading 41ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai 23.9s
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner 44.1s

pan-x-c (Collaborator, Author) commented Apr 23, 2026

/unittest-pattern-test_over_rollout_return_partial_tasks

@github-actions

Summary

Tests 📝: 1, Passed ✅: 1, Failed ❌: 0, Skipped ⏭️: 0, Other ❓: 0, Flaky 🍂: 0, Duration ⏱️: 29.7s

Tests

Test Name Status Flaky Duration
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks 16.1s

pan-x-c (Collaborator, Author) commented Apr 23, 2026

/unittest-module-trainer

@github-actions

Summary

Tests 📝: 27, Passed ✅: 24, Failed ❌: 0, Skipped ⏭️: 3, Other ❓: 0, Flaky 🍂: 0, Duration ⏱️: 51m 41s

Skipped

Tests Status
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer 4m 23s
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer 4m 50s
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 1m 51s
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer 1m 18s
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer 1m 2s
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer 1m 6s
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer 1m 13s
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer ⏭️ 1ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 38.3s
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer 34.0s
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools 34.5s
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode 1m 47s
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode 1m 43s
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode 2m 34s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer 2m 53s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer 5m 53s
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer 1m 58s
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer 1m 49s
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer 4m 38s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer 1m 47s
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer 3m 36s
tests/trainer/trainer_test.py::TestOverRollout::test_trainer 1m 7s
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer 48.2s
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer ⏭️ 1ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class ⏭️ 1ms
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner 1m 24s
tests/trainer/trainer_test.py::ColocateModeTest::test_trainer 2m 3s

pan-x-c (Collaborator, Author) commented Apr 23, 2026

/unittest-diff

@github-actions

Summary

Tests 📝: 123, Passed ✅: 121, Failed ❌: 0, Skipped ⏭️: 2, Other ❓: 0, Flaky 🍂: 0, Duration ⏱️: 32m 5s

Skipped

Tests Status
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content skipped ⏭️
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 23.2s
tests/common/config_test.py::TestConfig::test_chat_template_path 96ms
tests/common/config_test.py::TestConfig::test_config_flatten 35ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 208ms
tests/common/config_test.py::TestConfig::test_default_workflow 96ms
tests/common/config_test.py::TestConfig::test_load_default_config 1.4s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 98ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 95ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 1.6s
tests/common/experience_test.py::TestEID::test_eid_properties 1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 1ms
tests/common/experience_test.py::TestExperience::test_assertions 1ms
tests/common/experience_test.py::TestExperience::test_build_experience_token_view_aligns_prompt_action_mask_and_logprobs 1ms
tests/common/experience_test.py::TestExperience::test_deserialize_legacy_pickle_payload 3ms
tests/common/experience_test.py::TestExperience::test_deserialize_single_rejects_batch_payload 1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_action_mask 1ms
tests/common/experience_test.py::TestExperience::test_format_colored_tokens_uses_decoded_token_text 1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 13ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_print_colored_tokens_writes_to_file 1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_deserialize_many 1ms
tests/common/experience_test.py::TestExperience::test_serialize_many_with_shared_multimodal_tensor 1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_to_dict 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/external_model_test.py::TestExternalModel::test_external_model 50.4s
tests/common/external_model_test.py::TestExternalModelLoad::test_external_model_load 2.1s
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_first_message_is_assistant 573ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_messages_empty 273ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_no_assistant_messages 564ms
tests/common/models/utils_test.py::TestTokenizeAndMaskMessagesDefault::test_normal_conversation_data 260ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 1ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 1ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 1ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 1ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 1ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 1ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 1ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 1m 13s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 42.3s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 39.0s
tests/common/vllm_test.py::TestModelLen_0::test_model_len 27.6s
tests/common/vllm_test.py::TestModelLen_1::test_model_len 43.0s
tests/common/vllm_test.py::TestModelLen_2::test_model_len 33.0s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 27.0s
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation 27.4s
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status 27.7s
tests/common/vllm_test.py::TestAPIServer::test_api 29.0s
tests/common/vllm_test.py::TestAPIServer::test_reasoning_content ⏭️ 516ms
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 23.3s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 25.7s
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 314ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 588ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 2m 28s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 1m 59s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 2m 21s
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 46.1s
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 1m 48s
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer 1m 18s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer 2m 56s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer 3m 6s
tests/explorer/explorer_test.py::ServeTest::test_serve 1m 3s
tests/explorer/proxy_test.py::RecorderTest::test_recorder 65ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow 5.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout 13.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 29.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 5.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 5.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 5.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 4.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution 5.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow 5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_async_cancelled_runner_accepts_next_batch 5.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait 9.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks 5.7s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_sync_cancel_does_not_imply_immediate_runner_reuse 7.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 14.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 9.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 8.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid 25.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_timeout_cleanup_still_restarts_runner 6.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_unexpected_task_exception_restarts_runner 4.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 8.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 13.8s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection 10.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 2ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 603ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error 1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps 1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 27ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 17ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 136ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow 4ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 11ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 8ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 101ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 201ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow 22.4s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow 22.2s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording 4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 3.4s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner 143ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_0_sequential 68ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_1_asynchronous 62ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_2_multi_threading 540ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state 8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_0_sequential 39ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_1_asynchronous 38ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_2_multi_threading 40ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai 24.2s
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner 45.4s

Github Test Reporter by CTRF 💚

@pan-x-c
Collaborator Author

pan-x-c commented Apr 24, 2026

/unittest-module-explorer

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 59 | 59 | 0 | 0 | 0 | 0 | 17m 18s |

Tests

Test Name Status Flaky Duration
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 2m 26s
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer 1m 21s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer 3m 2s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer 3m 29s
tests/explorer/explorer_test.py::ServeTest::test_serve 1m 3s
tests/explorer/proxy_test.py::RecorderTest::test_recorder 59ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow 6.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 5.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout 13.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 30.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 5.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 5.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 5.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 5.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution 5.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow 5.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_async_cancelled_runner_accepts_next_batch 6.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait 9.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_return_partial_tasks 6.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_sync_cancel_does_not_imply_immediate_runner_reuse 7.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 15.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 9.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 8.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid 25.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_timeout_cleanup_still_restarts_runner 6.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_unexpected_task_exception_restarts_runner 4.5s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 8.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 13.6s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection 10.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 2ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 602ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error 1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps 1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 28ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 18ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 136ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow 3ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 11ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 8ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 102ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 202ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow 22.7s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow 22.7s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording 4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 3.4s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner 144ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_0_sequential 69ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_1_asynchronous 59ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_fail_fast_without_partial_collection_2_multi_threading 540ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state 8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_0_sequential 40ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_1_asynchronous 39ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_partial_success_non_repeatable_2_multi_threading 42ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai 24.5s
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner 45.5s

@chenyushuo chenyushuo merged commit 10796ed into agentscope-ai:main Apr 24, 2026
2 checks passed