Robots fail. Relentless workflows don't.
Relentless is a Python framework for building robust, fault-tolerant workflows that thrive in the chaos of real-world robotics. It provides a powerful and intuitive way to handle inevitable failures, so your robots keep working even when things go wrong.
- They drop things.
- Their sensors lie.
- Networks are flaky.
- The unexpected happens.
Traditional workflow systems crumble under these pressures. Relentless was built for the challenge.
- Actionable Compensation: Define how to undo actions or mitigate their effects when steps fail.
- Smart Retries: Configurable backoff strategies (linear, exponential, Fibonacci) with jitter.
- Time-Aware: Wall-clock timeouts, sensor-based triggers, and time-bound compensation.
- Stateful Execution: Leverages Zenoh's built-in persistence for transparent, versioned state management.
- Partial Rollbacks: Undo only what's needed, not the entire workflow.
- Human-in-the-Loop: Escalate to operators when automation hits its limits.
- Distributed Coordination: Built on Zenoh for seamless multi-robot orchestration.
- Physical-World Ready: Designed for irreversible actions, unreliable sensors, and real-world surprises.
pip install relentless-flow
Requirements:
- Python 3.10+
Relentless is built on a formal mathematical model that ensures predictable behavior and robust error recovery. This model is reflected in the Python API through the following concepts:
A Workflow
is a sequence of Steps
with defined success and failure paths. Each Workflow
defines a Compensation Strategy
to be used in case of failure during workflow execution.
from relentless import workflow, Workflow, CompensateReverse, CompensateAction
@workflow
class PickAndPlace(Workflow):
compensation_strategy = CompensateReverse()
def build(self):
return [
MoveTo("bin"),
Grasp(),
MoveTo("conveyor"),
Release()
]
A Step
is an individual, potentially reversible, action within a Workflow
. Each Step
defines how it is executed (.run()
) as well as how it is compensated (.compensate()
). Each Step
can define its own RetryPolicy
and TimeoutPolicy
.
from relentless import Step, RetryPolicy, TimeoutPolicy, Compensation, WorkflowContext
class MoveTo(Step):
def __init__(self, destination: str):
super().__init__(
name=f"move_to_{destination}",
retry_policy=RetryPolicy(
max_attempts=5,
backoff="exponential"
),
timeout_policy=TimeoutPolicy(
timeout=timedelta(seconds=5)
)
)
self.destination = destination
async def run(self, context: WorkflowContext):
await context.zenoh.put(
f"/robot/{context.workflow_id}/arm/target",
self.destination.encode()
)
async def compensate(self, context: WorkflowContext):
await context.zenoh.put(
f"/robot/{context.workflow_id}/arm/home",
b'' # Empty payload triggers default "home" action
)
class Grasp(Step):
def __init__(self):
super().__init__(
name="grasp",
retry_policy=RetryPolicy(
max_attempts=3
),
timeout_policy=TimeoutPolicy(
timeout=timedelta(seconds=2)
)
)
async def run(self, context: WorkflowContext):
result = await context.zenoh.get(
f"/robot/{context.workflow_id}/gripper/close"
)
if not result:
raise GripperError("Failed to close gripper")
async def compensate(self, context: WorkflowContext):
await context.zenoh.put(
f"/robot/{context.workflow_id}/gripper/open",
b''
)
Each Step
can define a compensation action using Compensation
or a CompensateAction
, which is executed if the Step
fails or if a later Step
in the Workflow
fails. Compensation
can be defined as either CompensateReverse
, which executes the .compensate()
method of each Step
in reverse order, CompensateAction
, which defines a custom compensation action, or a custom Compensation
strategy.
from relentless import Step, Compensation, CompensateAction, WorkflowContext
class LogCompensation(CompensateAction):
def __init__(self, message: str):
super().__init__(name='log_compensation')
self.message = message
async def compensate(self, context: WorkflowContext):
await context.zenoh.put(
f"/logs/{context.workflow_id}",
self.message
)
class LogStep(Step):
def __init__(self, log_message: str):
super().__init__(
name="log_step",
compensation=LogCompensation(message=log_message)
)
async def run(self, context: WorkflowContext):
await context.zenoh.put(
f"/logs/{context.workflow_id}",
"Log step executed successfully"
)
from relentless import (
workflow, Workflow, Step, CompensateReverse,
CompensateAction, WorkflowContext, WorkflowExecutor,
RetryPolicy, TimeoutPolicy, Atomic
)
from zenoh import Zenoh
import numpy as np
async def emergency_stop(zenoh: Zenoh):
await zenoh.put("/robot/emergency_stop", b'')
async def log_error(zenoh: Zenoh, message: str):
await zenoh.put(f"/errors/{context.workflow_id}", message)
class MoveTo(Step):
def __init__(self, destination: str):
super().__init__(
name=f"move_to_{destination}",
retry_policy=RetryPolicy(
max_attempts=5,
backoff="exponential"
),
timeout_policy=TimeoutPolicy(
timeout=timedelta(seconds=5)
)
)
self.destination = destination
async def run(self, context: WorkflowContext):
await context.zenoh.put(
f"/arm/{context.workflow_id}/target_pose",
self.destination.tobytes()
)
async def compensate(self, context: WorkflowContext):
await context.zenoh.put(
f"/arm/{context.workflow_id}/target_pose",
SAFE_POSE.tobytes() # compensation is to move back to safe pose
)
class Grasp(Step):
def __init__(self, bin_id: str):
super().__init__(
name="grasp",
retry_policy=RetryPolicy(
max_attempts=3
),
timeout_policy=TimeoutPolicy(
timeout=timedelta(seconds=2)
)
)
self.bin_id = bin_id
async def run(self, context: WorkflowContext):
await context.zenoh.put(f"/gripper/{self.bin_id}/cmd", "close")
async def compensate(self, context: WorkflowContext):
await context.zenoh.put(f"/gripper/{self.bin_id}/cmd", "emergency_release")
@workflow
class BinPicking(Workflow):
compensation_strategy = CompensateReverse()
def __init__(self, bin_id: str):
super().__init__(name=f"bin_picking_{bin_id}")
self.bin_id = bin_id
def build(self):
async def check_vision_confidence(context: WorkflowContext):
confidence = await context.zenoh.get(f"/vision/{self.bin_id}/confidence", timeout=2.0)
if confidence < 0.7:
raise VisionError("Part not clearly visible")
async def check_weight(context: WorkflowContext):
weight = await context.zenoh.get("/load_cell/weight")
if weight > 20.0:
await context.zenoh.put(f"/gripper/{self.bin_id}/cmd", "release")
raise HeavyObjectError(f"Object too heavy: {weight}kg")
return [
Atomic(
name="grab_part",
steps=[
Step(name="check_vision", run=check_vision_confidence, retry_policy=RetryPolicy(max_attempts=1), compensation=CompensateAction(emergency_stop)),
MoveTo(calculate_pose_from_vision(self.bin_id)),
Grasp(self.bin_id)
],
on_failure=emergency_stop(zenoh) # emergency stop if atomic fails
),
Step(name="verify_grip", run=check_weight, retry_policy=RetryPolicy(attempts=2), timeout_policy=TimeoutPolicy(seconds=3)),
MoveTo(get_container_pose()),
Release(self.bin_id)
]
async def main():
zenoh = await Zenoh.connect()
executor = WorkflowExecutor(zenoh)
# Example of running the workflow
await executor.run(BinPicking("cell1"))
from relentless import TimeoutPolicy, WorkflowContext, Step
from datetime import timedelta
class Grasp(Step):
# ...
timeout_policy = TimeoutPolicy(
timeout=timedelta(seconds=2), # or specify a function: timeout=lambda context: context.workflow_config.grasp_timeout
on_timeout=emergency_stop # Or define custom logic: on_timeout=lambda context: context.zenoh.put(...)
)
# ...
Atomic blocks use the defined on_failure
action if any Step
within the block fails. This does not prevent the normal compensation logic from executing.
from relentless import Atomic, Step, WorkflowContext
async def release_and_alert(context: WorkflowContext):
await context.zenoh.put(f"/gripper/{context.workflow_id}/cmd", "release")
await context.zenoh.put(f"/alerts/{context.workflow_id}", "Heavy object detected")
Atomic(
name="load_sensing",
steps=[
Step(name="check_weight", run=check_weight, retry_policy=RetryPolicy(attempts=2), timeout_policy=TimeoutPolicy(seconds=3))
],
on_failure=release_and_alert
)
from relentless import Step, WorkflowContext
class WaitForHuman(Step):
def __init__(self):
super().__init__(name="wait_for_human")
async def run(self, context: WorkflowContext):
while True:
response = await context.zenoh.get(f"/human/{context.workflow_id}/response")
if response and response.value.decode() == "OK":
break
await asyncio.sleep(5)
@workflow
async def critical_process():
try:
await sensitive_operation().retry(1)
except Exception as e:
await notify_operator(e).timeout(
seconds=300,
on_timeout=shutdown_system
)
# Wait for human to resolve
await WaitForHuman()
Relentless uses Zenoh's built-in persistence mechanisms. You can configure persistence using standard Zenoh router configuration files.
Example config.json5
:
{
mode: 'router',
plugins: {
rest: {
port: 8000
},
storage_manager: {
storages: {
workflow_state: {
volume: {
backend: 'rocksdb',
path: '/tmp/zenoh-storage-workflow'
}
}
}
}
},
// other config options
}
Refer to the Zenoh documentation for more details on configuring Zenoh.
async def monitor_workflows(zenoh: Zenoh):
async with zenoh.subscribe("relentless/state/**") as stream:
async for update in stream:
state = WorkflowState.parse_raw(update.value)
print(f"Workflow {state.id} ({state.name}):")
print(f" Status: {state.status}")
print(f" Step: {state.current_step}/{len(state.steps)}")
if state.errors:
print(f" Errors: {state.errors}")
Feature | Relentless | AWS Step Functions | Temporal.io | Zenoh Flow |
---|---|---|---|---|
Compensation | ✅ First-class, multi-strategy | ❌ Limited | ✅ Activities | ❌ Not a workflow system |
Real-World Focus | ✅ Designed for physical robots | ➖ Generic | ➖ Generic | ➖ Data-flow focused |
Zenoh Integration | ✅ Native state, pub/sub, geo-distribution | ❌ AWS-only | ❌ Limited | ✅ |
Timeouts | ✅ Sensor-based + wall-clock | ✅ Wall-clock only | ✅ | ✅ |
Partial Rollback | ✅ Fine-grained control | ❌ All-or-nothing | ❌ | ❌ |
Error Handling | ✅ Retries, timeouts, compensation, escalation | ✅ Retries, timeouts | ✅ | ✅ |
State Management | ✅ Versioned, on any Zenoh-KV | ➖ DynamoDB | ✅ | ✅ |
We welcome contributions! To get started:
- Fork the repository.
- Create a feature branch:
git checkout -b feat/your-feature-name
- Commit your changes:
git commit -m 'Add amazing new feature'
- Push to the branch:
git push origin feat/your-feature-name
- Open a Pull Request.
Relentless is licensed under the MIT License. See LICENSE for details.
- The Zenoh team for their amazing work on distributed robotics communication.
- Inspired by the challenges of real-world robotic deployments.