Summary
Purpose
This document defines the platform-management design and its relationship with the existing PyPTO Serving layers.
The goal is to add a platform-management component to PyPTO Serving without changing the repository's core design principle: keep the shortest path from request scheduling to PyPTO/Simpler execution. Platform management should make the system more scalable and robust, but it should not become another model-execution abstraction layer.
Context
PyPTO Serving currently focuses on a minimal local inference path:
HTTP / CLI
-> Async serving control plane
-> Scheduler and KV cache manager
-> Worker process
-> PyPTO executor
-> Simpler runtime
-> Ascend NPU kernels
The platform proposal adds a separate layer for distributed-system management. It addresses the problems that appear when a local inference path becomes a multi-instance service:
- Workload spikes require dynamic resource scaling.
- Placement should be aware of latency, topology, and application-specific traffic patterns.
- Faults should be handled without restarting the whole service.
- The deployment should be reconfigurable so model partitions and kernels can move to better resources.
Design Split
Serving should be split into two major submodules:
| Submodule | Owner | Responsibility |
| --- | --- | --- |
| Platform | VNL platform work | Start and scale the distributed system, create communication channels, manage replicas, monitor health, react to faults, and expose topology/resource changes to the model layer. |
| Model support | PyPTO Serving model layer | Handle LLM-specific behavior: request lifecycle, batching, model partition execution, KV cache policy, prefix reuse, token scheduling, sampling, and model-specific PyPTO executor logic. |
The platform submodule owns the process during bootstrap. After the initial deployment is ready, it hands request execution to the model-support submodule and becomes a passive management service. From that point on, it reacts to explicit model-support requests and health/resource events.
Examples of platform requests from the model-support layer:
- Spawn more replicas for a partition when demand exceeds capacity.
- Remove or drain replicas when demand drops.
- Create or delete data channels between partitions.
- Reconfigure model partition placement based on topology or resource availability.
- Start periodic services such as heartbeat, monitoring, and scaling decisions.
- Repair unhealthy instances and publish updated channel/endpoint metadata to the model layer.
Target Architecture
    Client / API
      |
      v
    Service entry layer
      |
      v
    Async serving control plane
      |
      +-------------------- Platform management API --------------------+
      |                                                                 |
      |  Deployment manager                                             |
      |    - owns desired deployment state                              |
      |    - maps model partitions to instances                         |
      |    - asks runtime/cloud/Lingqu backend to spawn or remove nodes |
      |                                                                 |
      |  Channel manager                                                |
      |    - creates device payload channels between model partitions   |
      |    - creates control channels between coordinators and replicas |
      |                                                                 |
      |  Health and monitoring                                          |
      |    - heartbeat                                                  |
      |    - replica status                                             |
      |    - load and capacity signals                                  |
      |                                                                 |
      |  Scaling and placement policy                                   |
      |    - ramp up/down replicas                                      |
      |    - topology-aware placement                                   |
      |    - fault response and replacement                             |
      +-----------------------------------------------------------------+
      |
      v
    Scheduler / KV cache / model support
      |
      v
    Partition coordinators and replicas
      |
      v
    PyPTO executor -> Simpler -> Ascend NPU
The platform-management API sits beside the current async serving control plane. It should not sit between the scheduler and executor for every token step. Token-level execution should remain on the current minimal path.
Feature Overview
Platform management has several responsibilities. This section keeps them at overview level so the first implementation can focus on one narrow feature without losing the larger design direction.
Host and Device Boundary
The platform layer must keep a strict boundary between host-side orchestration and device-side execution.
Host-side tasks are control tasks. A task running on the host should not execute model kernels or move tensors through host memory as the steady-state data path. Its responsibility is to launch, configure, supervise, and terminate a Simpler runtime instance for the assigned partition or replica.
Device-side runtime instances do the actual model work:
- The host task starts a Simpler runtime instance with the partition's model subset, device placement, and channel descriptors.
- PyPTO operators execute on the Ascend device through that Simpler runtime instance.
- Tensor payload channels between partitions should be created as device-side channels whenever they move activations, token tensors, logits, KV-related tensors, or other model data.
- Host-side channels should be limited to control-plane traffic: lifecycle commands, readiness, health, monitoring, placement decisions, and error reporting.
- Data should not bounce through host memory between model partitions unless there is an explicit fallback or debug mode.
This means a platform task is not the model executor itself. It is the host-side launcher and supervisor for a device-side executor.
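The boundary rules above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Channel` record and a hypothetical `boundary_violations` helper; neither exists in the repository, and the kind/placement vocabulary is borrowed from the API proposed later in this document.

```python
from dataclasses import dataclass
from typing import Literal

Placement = Literal["host", "device"]
ChannelKind = Literal["control", "health", "tensor"]

# Placement each channel kind is expected to use on the normal serving path.
EXPECTED_PLACEMENT = {"control": "host", "health": "host", "tensor": "device"}


@dataclass(frozen=True)
class Channel:  # hypothetical record, for illustration only
    name: str
    kind: ChannelKind
    placement: Placement


def boundary_violations(channels, allow_host_tensor_fallback=False):
    """Return the names of channels that break the host/device boundary."""
    violations = []
    for ch in channels:
        if ch.placement == EXPECTED_PLACEMENT[ch.kind]:
            continue
        # Host-mediated tensor movement is tolerated only in fallback/debug mode.
        if ch.kind == "tensor" and ch.placement == "host" and allow_host_tensor_fallback:
            continue
        violations.append(ch.name)
    return violations


channels = [
    Channel("api_to_prefill", "control", "host"),
    Channel("prefill_to_decode", "tensor", "host"),  # tensor bounced through host
]
print(boundary_violations(channels))
print(boundary_violations(channels, allow_host_tensor_fallback=True))
```

Making the fallback an explicit flag keeps host-mediated tensor movement an opt-in decision rather than a silent default.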
Deployment Model
The platform layer should describe the service as a deployment graph:
- A deployment is the desired distributed application.
- A partition is a model or service stage that owns a subset of work.
- A task is the host-side launcher/supervisor for the partition's Simpler runtime instance.
- An edge is a data/control channel between partitions, with tensor payload channels placed on device when they carry model data.
- A replica is an instance that can execute a partition's work.
- A coordinator owns routing and load balancing for a partition.
This maps to the platform configuration model:
| Concept | Platform role | Meaning for PyPTO Serving |
| --- | --- | --- |
| Deployment | Top-level desired state | Whole serving application, including partitions, edges, request manager, heartbeat settings, and control-buffer settings. |
| Partition | Logical execution stage | A shard/stage of model execution, for example a pipeline stage, expert group, prefill stage, or decode stage. |
| Replica | Concrete runtime instance | A concrete instance assigned to run a partition. |
| Task | Host-side launcher | The control function that starts and supervises a partition-local Simpler runtime instance. |
| Edge | Communication link | A channel between producers and consumers. Tensor payload edges should be device-side; control edges can be host-side. |
The deployment model needs enough metadata to describe where host control tasks run, which Simpler runtime instance they launch, where model execution happens, and where channels are placed. The detailed API proposal for this is in the deep dive below.
Coordinator and Replica Pattern
The platform uses coordinator/replica roles to reduce deployment complexity when scaling out a partition.
At a high level:
- The coordinator receives jobs from upstream partitions or the serving control plane.
- The coordinator tracks available replicas.
- The coordinator sends ready jobs to replicas over host-side control channels.
- A replica's host-side task launches or references the partition-local Simpler runtime instance.
- The Simpler runtime instance executes the partition-local model function on the device.
- Tensor outputs flow over device-side data channels when another model partition consumes them.
- The coordinator marks the replica available again and forwards completion/status metadata.
The coordinator should not inspect token-level model details or tensor payloads. It only sees requests, readiness, health, placement metadata, and channel metadata.
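The dispatch loop described above can be sketched in a few lines. This is an illustrative toy, assuming a hypothetical `Coordinator` class that sees only job and replica identifiers; it stands in for the host-side control path and never touches tensor payloads.

```python
from collections import deque


class Coordinator:  # hypothetical class, for illustration only
    """Tracks idle replicas and hands each queued job to the next idle one."""

    def __init__(self, replica_ids):
        self.idle = deque(replica_ids)
        self.pending = deque()
        self.running = {}  # job_id -> replica_id

    def submit(self, job_id):
        self.pending.append(job_id)
        return self._dispatch()

    def complete(self, job_id):
        # Mark the replica available again, then try to place waiting jobs.
        self.idle.append(self.running.pop(job_id))
        return self._dispatch()

    def _dispatch(self):
        assignments = []
        while self.pending and self.idle:
            job_id, replica_id = self.pending.popleft(), self.idle.popleft()
            self.running[job_id] = replica_id
            assignments.append((job_id, replica_id))
        return assignments


coord = Coordinator(["r0", "r1"])
print(coord.submit("job-a"))
print(coord.submit("job-b"))
print(coord.submit("job-c"))    # no idle replica yet, so nothing is assigned
print(coord.complete("job-a"))  # r0 is free again and picks up job-c
```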
Channel Management
The platform layer owns channel lifecycle. Model support should request channels by intent rather than constructing distributed transport directly.
Channel placement follows the host/device boundary:
- Tensor payload channels are device-side channels. They move activations, token tensors, logits, KV-related tensors, and partition outputs directly between device-side runtime instances.
- Control channels are host-side channels. They move lifecycle commands, job metadata, readiness, health, monitoring, and errors.
- Host-mediated tensor movement is a fallback/debug path, not the normal serving path.
The channel controller should follow a desired-state reconciliation model: register desired producers and consumers, compare desired channels against actual channels, create missing channels through the selected backend's resource-exchange mechanism, and remove stale channels when they are no longer desired.
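At its core, the reconciliation step is a set difference between desired and actual channels. A minimal sketch, assuming channels are identified by a hypothetical `(producer, consumer, kind)` tuple and that the backend calls for creating and deleting channels are left to the caller:

```python
def reconcile(desired, actual):
    """Return (to_create, to_delete) for sets of (producer, consumer, kind)."""
    to_create = sorted(desired - actual)  # desired but not yet present
    to_delete = sorted(actual - desired)  # present but no longer desired
    return to_create, to_delete


desired = {("prefill", "decode", "tensor"), ("api", "prefill", "control")}
actual = {("api", "prefill", "control"), ("decode", "old_stage", "tensor")}

to_create, to_delete = reconcile(desired, actual)
print("create:", to_create)
print("delete:", to_delete)
```

Because the function is pure, it can run repeatedly on a timer or after every registration change, and an unchanged deployment yields two empty lists.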
Health and Monitoring
Reliability and zero-downtime fault response are core platform goals. Heartbeat provides liveness, while monitoring provides load and capacity signals.
Recovery routing is owned by the platform:
- If a replica becomes unhealthy, the coordinator stops assigning it new jobs.
- The platform computes the replacement placement, requests a new instance if needed, and rebuilds the required host control channels and device tensor channels.
- The platform updates model support with new channel handles, endpoint descriptors, or partition routing metadata.
- Model support may pause, fail, or replay affected in-flight work according to model semantics, but it should not choose the replacement instance or reroute topology.
Useful monitoring signals include queue depth, pending requests, decode-token throughput, prefill latency, NPU memory use, KV-cache pressure, and channel backpressure.
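Heartbeat-based liveness reduces to comparing the last-seen time of each replica against a tolerance. The sketch below assumes a hypothetical `HeartbeatMonitor` class and takes the tolerance in milliseconds; the clock is injectable so that the example is deterministic.

```python
import time


class HeartbeatMonitor:  # hypothetical class, for illustration only
    def __init__(self, tolerance_ms, clock=time.monotonic):
        self.tolerance_s = tolerance_ms / 1000.0
        self.clock = clock
        self.last_seen = {}  # replica_id -> time of last heartbeat

    def beat(self, replica_id):
        self.last_seen[replica_id] = self.clock()

    def unhealthy(self):
        """Replicas whose last heartbeat is older than the tolerance."""
        now = self.clock()
        return sorted(r for r, t in self.last_seen.items() if now - t > self.tolerance_s)


now = [0.0]  # fake clock, advanced by hand
monitor = HeartbeatMonitor(tolerance_ms=3000, clock=lambda: now[0])
monitor.beat("r0")
monitor.beat("r1")
now[0] = 2.0
monitor.beat("r1")  # r1 keeps reporting, r0 goes silent
now[0] = 4.0
print(monitor.unhealthy())
```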
Scaling and Placement
Dynamic scaling is a later capability built on the same deployment and channel metadata.
Scale-up can be triggered by queue depth, latency targets, token-throughput saturation, KV-cache pressure, or failed capacity. Scale-down should be drain based: mark a replica draining, stop assigning new jobs, wait for in-flight work or timeout, unregister from the coordinator, delete channels, and release backend resources.
Topology-aware placement should consider NPU topology, device memory, channel locality, partition type, and KV-cache locality. The model layer can provide pressure and locality hints, but the platform owns placement decisions.
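The drain-based scale-down sequence can be modelled as a small state machine. This is a sketch under stated assumptions: `ReplicaState`, `Replica`, `begin_drain`, and `try_finish_drain` are hypothetical names, and the actual unregister, channel-deletion, and resource-release steps are represented only by a comment.

```python
from enum import Enum


class ReplicaState(Enum):
    ACTIVE = "active"
    DRAINING = "draining"
    REMOVED = "removed"


class Replica:  # hypothetical class, for illustration only
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.state = ReplicaState.ACTIVE
        self.in_flight = 0
        self.drain_started = None

    def accepts_jobs(self):
        # A draining replica finishes its current work but takes no new jobs.
        return self.state is ReplicaState.ACTIVE


def begin_drain(replica, now):
    replica.state = ReplicaState.DRAINING
    replica.drain_started = now


def try_finish_drain(replica, now, timeout_s):
    """Remove the replica once in-flight work is done or the timeout expires."""
    if replica.state is not ReplicaState.DRAINING:
        return False
    timed_out = now - replica.drain_started >= timeout_s
    if replica.in_flight == 0 or timed_out:
        # A real platform would unregister from the coordinator, delete
        # channels, and release backend resources at this point.
        replica.state = ReplicaState.REMOVED
        return True
    return False


r = Replica("decode-1")
r.in_flight = 1
begin_drain(r, now=0.0)
print(r.accepts_jobs(), try_finish_drain(r, now=1.0, timeout_s=30.0))
r.in_flight = 0
print(try_finish_drain(r, now=2.0, timeout_s=30.0), r.state)
```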
Supported Parallelism
The platform should be able to describe and place several forms of parallelism:
- Pipeline parallelism: each pipeline stage is a partition connected by edges.
- Expert parallelism: expert subsets can be placed as partitions and selected through model-support routing logic.
- Batch parallelism: whole-model or stage replicas can be scaled and load-balanced by coordinators.
- Tensor parallelism: model support owns tensor-level execution details; platform provides placement and channels for tensor-parallel groups.
The boundary is important: platform describes placement, channels, and replica lifecycle; PyPTO Serving model support describes how a partition computes.
Deep Dive: Static Deployment API
The first implementation target should be static deployment description and startup. This is the easiest useful feature because it defines the host/device boundary and the API shape without requiring dynamic scaling, failure replacement, or topology-aware reconfiguration.
This deep dive is a proposed API shape, not an existing schema in the repository.
Goals
- Describe partitions, replicas, host tasks, Simpler runtime instances, and channels.
- Make host/device placement explicit.
- Start a fixed deployment from a desired-state spec.
- Launch one Simpler runtime instance per replica task.
- Create host control channels and device tensor channels.
- Publish runtime and channel descriptors to model support.
Non-Goals for the First API
- No automatic scale-up or scale-down.
- No automatic fault replacement.
- No topology optimizer.
- No model-layer rerouting policy.
- No host-side tensor data path except explicit fallback/debug modes.
Proposed Deployment Shape
A first proposed deployment shape can stay minimal:
    {
      "name": "qwen3-14b-serving",
      "partitions": [
        {
          "name": "prefill",
          "model_range": {"layers": [0, 47]},
          "task_placement": "host",
          "runtime": "simpler",
          "execution_placement": "device",
          "parallelism": "batch",
          "replicas": 1
        },
        {
          "name": "decode",
          "model_range": {"layers": [0, 47]},
          "task_placement": "host",
          "runtime": "simpler",
          "execution_placement": "device",
          "parallelism": "batch",
          "replicas": 1
        }
      ],
      "channels": [
        {"name": "api_to_prefill", "producer": "api", "consumer": "prefill", "placement": "host", "kind": "control"},
        {"name": "prefill_to_decode", "producer": "prefill", "consumer": "decode", "placement": "device", "kind": "tensor"},
        {"name": "decode_to_api", "producer": "decode", "consumer": "api", "placement": "host", "kind": "control"}
      ],
      "heartbeat": {"enabled": true, "interval_ms": 1000, "tolerance_ms": 3000}
    }
The platform API should eventually support richer partition metadata for pipeline parallelism, expert parallelism, batch parallelism, tensor parallelism, KV-cache locality, and NPU topology.
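Before startup, a deployment of this shape can be checked for internal consistency. The sketch below assumes a hypothetical `validate_deployment` helper operating on the parsed dictionary; treating `api` as an external endpoint rather than a partition is also an assumption, inferred from the channel entries in the shape above.

```python
def validate_deployment(spec, external_endpoints=("api",)):
    """Return a list of problems; an empty list means the graph is consistent."""
    problems = []
    names = [p["name"] for p in spec["partitions"]]
    if len(names) != len(set(names)):
        problems.append("duplicate partition names")
    for p in spec["partitions"]:
        if p["replicas"] < 1:
            problems.append(f"partition {p['name']} needs at least one replica")
    # Every channel endpoint must be a declared partition or an external endpoint.
    known = set(names) | set(external_endpoints)
    for ch in spec["channels"]:
        for role in ("producer", "consumer"):
            if ch[role] not in known:
                problems.append(f"channel {ch['name']}: unknown {role} {ch[role]!r}")
    return problems


spec = {
    "partitions": [
        {"name": "prefill", "replicas": 1},
        {"name": "decode", "replicas": 0},
    ],
    "channels": [
        {"name": "api_to_prefill", "producer": "api", "consumer": "prefill"},
        {"name": "prefill_to_sample", "producer": "prefill", "consumer": "sample"},
    ],
}
for problem in validate_deployment(spec):
    print(problem)
```

Returning all problems at once, rather than raising on the first, lets an operator fix a spec in one pass.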
Proposed Python API
The Python API can mirror the deployment shape with typed data classes. Exact field names can change, but the placement concepts should remain explicit.
    from dataclasses import dataclass
    from typing import Literal

    Placement = Literal["host", "device"]
    ChannelKind = Literal["control", "health", "tensor"]


    @dataclass(frozen=True)
    class ModelRange:
        layers: tuple[int, int]


    @dataclass(frozen=True)
    class PartitionSpec:
        name: str
        model_range: ModelRange
        task_placement: Literal["host"]
        runtime: Literal["simpler"]
        execution_placement: Literal["device"]
        parallelism: str
        replicas: int


    @dataclass(frozen=True)
    class ChannelSpec:
        name: str
        producer: str
        consumer: str
        placement: Placement
        kind: ChannelKind
        capacity: int | None = None
        payload_size: int | None = None


    @dataclass(frozen=True)
    class HeartbeatSpec:
        enabled: bool
        interval_ms: int
        tolerance_ms: int


    @dataclass(frozen=True)
    class DeploymentSpec:
        name: str
        partitions: tuple[PartitionSpec, ...]
        channels: tuple[ChannelSpec, ...]
        heartbeat: HeartbeatSpec
Platform Manager API
The Python-facing PlatformManager can be implemented as bindings over the existing C++ platform runtime, keeping Python as the serving/control API while C++ realizes deployment, host task launch, channel creation, and runtime supervision.
The first platform manager API should focus on startup, shutdown, and publishing resolved descriptors:

    class PlatformManager:
        def start(self, deployment: DeploymentSpec) -> "RuntimePlan": ...
        def stop(self) -> None: ...
        def get_runtime_plan(self) -> "RuntimePlan": ...
start() should validate the deployment, launch host-side tasks, start Simpler runtime instances, create channels with the requested placement, and return a resolved runtime plan.
RuntimePlan is the model layer's read-only view of what the platform created:
    @dataclass(frozen=True)
    class RuntimeEndpoint:
        partition: str
        replica_id: str
        host_task_id: str
        simpler_instance_id: str
        device_id: str


    @dataclass(frozen=True)
    class ChannelHandle:
        name: str
        producer: str
        consumer: str
        placement: Placement
        kind: ChannelKind
        handle: object


    @dataclass(frozen=True)
    class RuntimePlan:
        endpoints: tuple[RuntimeEndpoint, ...]
        channels: tuple[ChannelHandle, ...]
Model support should consume RuntimePlan but should not mutate it. If a later platform version replaces a replica or channel, the platform publishes a new plan or an incremental update.
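One way to honor the read-only contract is to build a new plan object for every change instead of editing the published one. The following is a sketch with simplified dataclasses; the `version` field, the `with_replaced_endpoint` helper, and the `npu:0` style device identifiers are assumptions for illustration and are not part of the proposed API.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class RuntimeEndpoint:  # simplified: only the fields this example needs
    partition: str
    replica_id: str
    device_id: str


@dataclass(frozen=True)
class RuntimePlan:
    version: int  # hypothetical field, not part of the proposed RuntimePlan
    endpoints: tuple[RuntimeEndpoint, ...]


def with_replaced_endpoint(plan, old_replica_id, new_endpoint):
    """Build a new plan; the plan already handed to model support is untouched."""
    endpoints = tuple(
        new_endpoint if e.replica_id == old_replica_id else e for e in plan.endpoints
    )
    return replace(plan, version=plan.version + 1, endpoints=endpoints)


plan_v1 = RuntimePlan(1, (RuntimeEndpoint("decode", "decode-0", "npu:0"),))
plan_v2 = with_replaced_endpoint(
    plan_v1, "decode-0", RuntimeEndpoint("decode", "decode-1", "npu:1")
)

try:
    plan_v1.version = 99  # frozen dataclasses reject in-place mutation
except FrozenInstanceError:
    print("plan is read-only")
print(plan_v1.endpoints[0].replica_id, "->", plan_v2.endpoints[0].replica_id)
```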
Channel Creation API
For the static version, channel creation can be internal to PlatformManager.start(). If exposed, it should preserve placement:
    create_channel(source_partition, target_partition, channel_kind, placement, capacity, size)
    delete_channel(channel_id)
    subscribe(channel_id, message_type, handler)
    publish(channel_id, message)
Channel kinds and placement should distinguish:
- Device payload channels for activations, token batches, logits, KV-related movement, and partition-to-partition tensor flow.
- Host control channels for coordinator/replica commands and responses.
- Host health channels for heartbeat and status events.
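To make the four calls concrete, here is a toy in-memory registry for host-side channels. It is a sketch only: `InMemoryChannels` is a hypothetical class, the `ch-N` identifier format is invented, and interpreting `message_type` as the Python type of the message is an assumption. Device payload channels would need a real backend and are not modelled here.

```python
import itertools
from collections import defaultdict


class InMemoryChannels:  # hypothetical class, for illustration only
    """Toy host-side registry; a real backend would allocate transport resources."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._channels = {}
        self._handlers = defaultdict(list)  # (channel_id, message_type) -> handlers

    def create_channel(self, source_partition, target_partition, channel_kind,
                       placement, capacity=None, size=None):
        channel_id = f"ch-{next(self._ids)}"
        self._channels[channel_id] = {
            "source": source_partition, "target": target_partition,
            "kind": channel_kind, "placement": placement,
            "capacity": capacity, "size": size,
        }
        return channel_id

    def delete_channel(self, channel_id):
        del self._channels[channel_id]
        for key in [k for k in self._handlers if k[0] == channel_id]:
            del self._handlers[key]

    def subscribe(self, channel_id, message_type, handler):
        if channel_id not in self._channels:
            raise KeyError(channel_id)
        self._handlers[(channel_id, message_type)].append(handler)

    def publish(self, channel_id, message):
        if channel_id not in self._channels:
            raise KeyError(channel_id)
        # Assumption: message_type is the Python type of the message object.
        for handler in self._handlers.get((channel_id, type(message)), []):
            handler(message)


channels = InMemoryChannels()
health = channels.create_channel("decode", "api", "health", "host")
received = []
channels.subscribe(health, dict, received.append)
channels.publish(health, {"replica": "decode-0", "status": "ready"})
channels.publish(health, "ignored: no handler registered for str")
print(received)
```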
Relation to PyPTO Serving Layers
Service Entry Layer
Current files:
python/cli/main.py
Expected platform relation:
- Add platform deployment configuration options.
- Choose local-only mode or distributed platform mode.
- Initialize the platform manager before starting model-serving traffic.
The default should remain local and minimal. Distributed platform mode should be explicit.
HTTP API Layer
Current files:
python/core/server.py
Expected platform relation:
- Continue handling OpenAI-compatible requests and health endpoints.
- Expose high-level service health that includes platform status when enabled.
- Avoid exposing internal platform controls through the public inference API unless explicitly needed.
Async Serving Control Plane
Current files:
python/core/async_engine.py
python/core/scheduler.py
python/core/kv_cache.py
Expected platform relation:
- The async engine should call the platform manager for coarse-grained deployment changes, not for each token step.
- The scheduler should keep deciding request batching and token budgets.
- KV-cache policy should remain model support, but it may provide placement hints such as KV locality and memory pressure.
Model Execution Layer
Current files:
python/core/serving_worker.py
python/core/executor.py
python/core/pypto_executor.py
examples/model/qwen3_14b/runner/
Expected platform relation:
- Workers become replicas when platform mode is enabled.
- A partition replica owns a model subset and a local Simpler/PyPTO executor on the assigned device.
- The host-side task for a replica starts and supervises the Simpler runtime instance.
- Model execution happens on the Ascend device, not in the host platform task.
- The platform manager starts and stops replicas, but does not own model execution logic.
Backend Runtime Layer
Current files:
pypto-lib/
- Simpler runtime integration
Expected platform relation:
- Simpler remains the runtime for NPU dispatch.
- Platform code should not introduce a competing model runtime.
- Platform backend adapters can target local processes, MPI-based launch, cloud-style emulation, or Lingqu infrastructure, but the model-execution path should still terminate in PyPTO/Simpler.
- The Python platform API should call into the C++ platform runtime through bindings rather than reimplementing platform orchestration in Python.
- Tensor data channels between model partitions should be created on the device side when they carry model data.
- Host runtime code should only orchestrate Simpler instances and control-plane channels.
Static Bootstrap Flow
Initial startup should follow this sequence:
parse serving and deployment config
-> initialize platform backend
-> validate deployment graph
-> allocate initial instances
-> start partition coordinators
-> start initial replicas
-> create host-side control channels
-> launch Simpler runtime instances from host-side tasks
-> create device-side tensor payload channels
-> initialize heartbeat and monitoring on the host control plane
-> initialize PyPTO model executors inside device-side Simpler runtime instances
-> mark deployment ready
-> start accepting model requests
This matches the issue discussion: the platform submodule takes ownership immediately upon launch and relinquishes request execution to model support once initialization is ready.
Future Platform APIs
Later platform work can extend the static API with coarse-grained management calls. These should remain outside the per-token hot path.
Potential future calls:
- spawn_replica(partition, resources) for dynamic scale-up or failure replacement.
- drain_replica(replica) for controlled scale-down.
- remove_replica(replica) after drain completes.
- get_topology() for topology-aware placement.
- get_health() for liveness snapshots.
- report_load(metrics) for scheduler and worker load signals.
The platform owns these decisions and publishes updated RuntimePlan data to model support after changes are applied.
Future Failure Handling
Failure handling should build on the static deployment API. When a replica fails, the platform should mark it unhealthy, stop assigning work to it, create replacement capacity, rebuild host control channels and device tensor channels, and publish an updated RuntimePlan to model support. Model support can then resume, retry, or fail affected work according to model semantics.
Coordinator failure requires stronger state handling and is a later milestone. A minimal first implementation can treat coordinator failure as partition-level failure and restart the partition's coordinator and replicas.
Non-Goals
- Do not turn PyPTO Serving into a full production serving framework.
- Do not put platform calls in the per-token execution hot path.
- Do not duplicate model-specific scheduling, KV-cache, or sampling policy inside the platform layer.
- Do not introduce a second NPU execution runtime alongside Simpler.
- Do not require distributed platform mode for the current single-node reference path.
Area
Executor or runtime
Motivation / Use Case
Enhance PTO serving capabilities to support distributed execution