14 changes: 13 additions & 1 deletion content/en/docs/actions.md
@@ -78,4 +78,16 @@ Reclaim action is a **cross-queue** resource reclamation step in the scheduling
> Note:
>
> 1. Reclaim checks multiple conditions during execution: whether the target Queue is reclaimable, whether the task can be reclaimed (Preemptable), whether the job's running requirements can still be met after resources are reclaimed, and so on. These checks ensure that resources are reclaimed only when doing so is justified.
> 2. To make jobs in a Queue reclaimable by other Queues, the reclaimable field in the Queue's spec must be set to true.
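The conditions in the note above can be illustrated with a small eligibility check. This is a minimal sketch, not Volcano's implementation: the `Queue`, `Task`, and `Job` types, the `canReclaim` function, and the `MinAvailable` rule (used here as a stand-in for "the job's running requirements can still be met") are all hypothetical simplifications.

```go
package main

import "fmt"

// Hypothetical, simplified types for illustration only; they are not the
// Volcano scheduler's real API.
type Queue struct {
	Name        string
	Reclaimable bool // mirrors the reclaimable field in the Queue spec
}

type Task struct {
	Name        string
	Queue       *Queue
	Preemptable bool
}

type Job struct {
	Name         string
	MinAvailable int // assumed minimum number of tasks that must keep running
	Running      int // tasks currently running
}

// canReclaim reports whether a running task may be evicted so that a job
// from another queue (requester) can use its resources.
func canReclaim(victim Task, owner Job, requester *Queue) bool {
	if victim.Queue == requester {
		return false // reclaim only works across queues
	}
	if !victim.Queue.Reclaimable {
		return false // the victim's queue must set reclaimable to true
	}
	if !victim.Preemptable {
		return false // the task itself must be preemptable
	}
	// After eviction the owning job must still meet its running requirement.
	return owner.Running-1 >= owner.MinAvailable
}

func main() {
	analytics := &Queue{Name: "analytics", Reclaimable: true}
	training := &Queue{Name: "training", Reclaimable: false}

	etl := Task{Name: "etl-0", Queue: analytics, Preemptable: true}
	etlJob := Job{Name: "etl", MinAvailable: 2, Running: 3}
	fmt.Println(canReclaim(etl, etlJob, training)) // true

	train := Task{Name: "train-0", Queue: training, Preemptable: true}
	fmt.Println(canReclaim(train, etlJob, analytics)) // false: training is not reclaimable
}
```

Keeping each condition as an early return mirrors the order in which the note lists them and makes it easy to see which check rejected a candidate.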

### Shuffle

#### Introduction

Shuffle action is a task redistribution mechanism in the Volcano scheduler, designed to optimize the distribution of running tasks in the cluster. By selectively evicting certain running tasks, it allows these tasks to re-enter the scheduling queue for reallocation, breaking existing resource allocation patterns and optimizing overall cluster performance.

medium

This paragraph is dense and contains a long sentence. Rewriting it with shorter, more direct sentences would improve clarity and readability.

Suggested change
Shuffle action is a task redistribution mechanism in the Volcano scheduler.
It is designed to optimize the distribution of running tasks in the cluster by selectively evicting some of them.
Evicted tasks re-enter the scheduling queue for reallocation.
This process breaks existing resource allocation patterns and can lead to better overall cluster performance.
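The evict-and-requeue cycle described in the Shuffle introduction can be sketched as below. This is a minimal illustration under stated assumptions, not Volcano's code: the `Task` and `Node` types and the `selectVictims` and `shuffle` functions are hypothetical. The victim-selection policy is also an assumption for illustration: on each node whose CPU usage exceeds a threshold, evict the smallest tasks until usage drops below it. Task names are assumed to be unique.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical, simplified model of a shuffle pass; not Volcano's real API.
type Task struct {
	Name string
	Node string
	CPU  int // millicores
}

type Node struct {
	Name     string
	Capacity int // millicores
}

// selectVictims is one possible (assumed) policy: on every node whose CPU
// usage exceeds threshold*capacity, pick the smallest tasks until usage
// falls back to or below that limit.
func selectVictims(nodes []Node, running []Task, threshold float64) []Task {
	byNode := map[string][]Task{}
	used := map[string]int{}
	for _, t := range running {
		byNode[t.Node] = append(byNode[t.Node], t)
		used[t.Node] += t.CPU
	}
	var victims []Task
	for _, n := range nodes {
		limit := int(threshold * float64(n.Capacity))
		tasks := byNode[n.Name]
		sort.Slice(tasks, func(i, j int) bool { return tasks[i].CPU < tasks[j].CPU })
		for _, t := range tasks {
			if used[n.Name] <= limit {
				break
			}
			victims = append(victims, t)
			used[n.Name] -= t.CPU
		}
	}
	return victims
}

// shuffle evicts the selected victims and appends them to the pending queue
// so that the next scheduling cycle can place them again.
func shuffle(nodes []Node, running, pending []Task, threshold float64) ([]Task, []Task) {
	evicted := map[string]bool{}
	for _, v := range selectVictims(nodes, running, threshold) {
		evicted[v.Name] = true
		v.Node = "" // the evicted task is no longer bound to a node
		pending = append(pending, v)
	}
	var stillRunning []Task
	for _, t := range running {
		if !evicted[t.Name] {
			stillRunning = append(stillRunning, t)
		}
	}
	return stillRunning, pending
}

func main() {
	nodes := []Node{{"node-a", 4000}, {"node-b", 4000}}
	running := []Task{
		{"t1", "node-a", 2000}, {"t2", "node-a", 1000}, {"t3", "node-a", 800},
		{"t4", "node-b", 500},
	}
	running, pending := shuffle(nodes, running, nil, 0.8)
	fmt.Println(len(running), len(pending)) // 3 1
	fmt.Println(pending[0].Name)            // t3
}
```

In this run, `node-a` uses 3800m against a 3200m limit. Evicting the smallest task (`t3`, 800m) brings it back under the limit. Separating victim selection from eviction keeps the policy swappable, which matters because the scenarios that follow call for different selection criteria.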


#### Scenarios

- **Resource Fragment Consolidation**: When cluster resources become fragmented, Shuffle can reschedule tasks to consolidate scattered resource fragments, making room for jobs that need large contiguous blocks of resources.
- **Load Balancing Optimization**: In situations with unbalanced node workloads, Shuffle can redistribute tasks to balance resource utilization across nodes, preventing scenarios where some nodes are overloaded while others remain idle.
- **Scheduling Optimization Breakthrough**: For long-running clusters, initial scheduling decisions may become suboptimal over time. Shuffle periodically reassesses and adjusts task distribution, breaking scheduling deadlocks and finding more efficient resource allocation solutions.
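The fragmentation scenario above can be made concrete with a simple check. Here, the cluster has enough free CPU in total for a task, but no single node can host it. This is a hypothetical sketch: the `Node` type, the `isFragmented` function, and the CPU-only view of free capacity are assumptions for illustration. A real scheduler would also consider memory, GPUs, and other constraints.

```go
package main

import "fmt"

// Hypothetical free-capacity view of a node, in CPU millicores.
type Node struct {
	Name string
	Free int
}

// isFragmented reports whether the cluster as a whole has enough free CPU
// for a task of size request while no single node can host it. This is the
// situation in which consolidating tasks (for example via Shuffle) may help.
func isFragmented(nodes []Node, request int) bool {
	total := 0
	for _, n := range nodes {
		if n.Free >= request {
			return false // some node already fits the task
		}
		total += n.Free
	}
	return total >= request
}

func main() {
	nodes := []Node{{"node-a", 1500}, {"node-b", 1500}, {"node-c", 1000}}
	fmt.Println(isFragmented(nodes, 3000)) // true: 4000m free in total, at most 1500m per node
	fmt.Println(isFragmented(nodes, 1000)) // false: node-a fits it
	fmt.Println(isFragmented(nodes, 5000)) // false: not enough free CPU at all
}
```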
10 changes: 10 additions & 0 deletions content/zh/docs/actions.md
@@ -81,4 +81,14 @@ Reclaim action is the **cross-queue** resource reclamation step in the scheduling flow. Unlike Preempt
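> 1. During execution, Reclaim checks multiple conditions: whether the target Queue is reclaimable (Reclaimable), whether the task can be reclaimed (Preemptable), whether the job's running requirements are still met after resources are reclaimed, and so on. These checks ensure that resource reclamation is justified.
> 2. To allow other Queues to reclaim resources from the jobs in a Queue, set the reclaimable field to true in that Queue's spec.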

### Shuffle

#### Introduction

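Shuffle action is the task redistribution mechanism in the Volcano scheduler. It is used to optimize the distribution of running tasks in the cluster. It selectively evicts some running tasks so that they re-enter the scheduling queue and wait to be reallocated. This breaks existing resource allocation patterns and optimizes overall cluster performance.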

#### Scenarios

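- **Resource fragment consolidation**: When cluster resources are fragmented, Shuffle can reschedule tasks to consolidate scattered resource fragments. This makes room for jobs that need large contiguous blocks of resources.
- **Load balancing optimization**: When node loads are unbalanced, Shuffle can redistribute tasks to balance resource utilization across nodes. This prevents some nodes from being overloaded while others sit idle.
- **Scheduling optimization breakthrough**: In long-running clusters, initial scheduling decisions may become suboptimal over time. Shuffle periodically reassesses and adjusts task distribution, breaking scheduling deadlocks and looking for better resource allocation schemes.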