Merged
23 commits
5a779b4
fix: testing
maschnetwork Feb 3, 2026
a411a40
feat: introduce OperationKey and refactor ExecutionManager for compos…
maschnetwork Feb 16, 2026
c49d8aa
feat(operations): add parentId support for child context operations
maschnetwork Feb 16, 2026
0844880
feat(context): add replay state tracking and refactor context creation
maschnetwork Feb 16, 2026
92b681d
feat: implement ChildContextOperation, ChildContextFailedException, a…
maschnetwork Feb 16, 2026
707646e
feat: add public methods
maschnetwork Feb 16, 2026
7777543
feat: added integration tests
maschnetwork Feb 16, 2026
c0c9023
feat: added edge case tests
maschnetwork Feb 16, 2026
b035edc
feat: cloud based tests
maschnetwork Feb 16, 2026
490014b
feat: adapted readme and design to match implementation
maschnetwork Feb 16, 2026
df73635
feat: remove debug line from testing commit
maschnetwork Feb 16, 2026
b8c8bed
fix: naming and null handling
maschnetwork Feb 16, 2026
75fbe8a
fix: use run instead of complete
maschnetwork Feb 16, 2026
49cf1e9
fix: run for PENDING
maschnetwork Feb 16, 2026
75a6da4
feat: simplify operation id
maschnetwork Feb 16, 2026
8a2b3ab
fix: revert test changes for operation id
maschnetwork Feb 16, 2026
3ac3b9a
fix: simplify testing
maschnetwork Feb 16, 2026
842a106
fix: addressed review comments
maschnetwork Feb 19, 2026
06e5233
Merge remote-tracking branch 'origin/main' into child-context
zhongkechen Feb 19, 2026
1a4c8c4
fix: review comments
maschnetwork Feb 20, 2026
765fbd5
fix: debug tests
maschnetwork Feb 20, 2026
c1b30a5
fix: used runUntilComplete for slow CI worker
maschnetwork Feb 20, 2026
45ccbff
fix: ci comment
maschnetwork Feb 20, 2026
33 changes: 30 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -23,6 +23,8 @@ Your durable function extends `DurableHandler<I, O>` and implements `handleReque
- `ctx.createCallback()` – Wait for external events (approvals, webhooks)
- `ctx.invoke()` – Invoke another Lambda function and wait for the result
- `ctx.invokeAsync()` – Start a concurrent Lambda function invocation
- `ctx.runInChildContext()` – Run an isolated child context with its own checkpoint log
- `ctx.runInChildContextAsync()` – Start a concurrent child context

## Quick Start

@@ -189,6 +191,30 @@ var result = ctx.invoke("invoke-function",

```

### runInChildContext() – Isolated Execution Contexts

Child contexts run an isolated stream of work with their own operation counter and checkpoint log. They support the full range of durable operations — `step`, `wait`, `invoke`, `createCallback`, and nested child contexts.

```java
// Sync: blocks until the child context completes
var result = ctx.runInChildContext("validate-order", String.class, child -> {
    var data = child.step("fetch", String.class, () -> fetchData());
    child.wait(Duration.ofMinutes(5));
    return child.step("validate", String.class, () -> validate(data));
});

// Async: returns a DurableFuture for concurrent execution
var futureA = ctx.runInChildContextAsync("branch-a", String.class, child -> {
    return child.step("work-a", String.class, () -> doWorkA());
});
var futureB = ctx.runInChildContextAsync("branch-b", String.class, child -> {
    return child.step("work-b", String.class, () -> doWorkB());
});

// Wait for all child contexts to complete
var results = DurableFuture.allOf(futureA, futureB);
```

## Step Configuration

Configure step behavior with `StepConfig`:
@@ -396,9 +422,10 @@ DurableExecutionException - General durable exception
│ ├── InvokeFailedException - Chained invocation failed. Handle the error or propagate failure.
│ ├── InvokeTimedoutException - Chained invocation timed out. Handle the error or propagate failure.
│ └── InvokeStoppedException - Chained invocation stopped. Handle the error or propagate failure.
└── CallbackException - General callback exception
├── CallbackFailedException - External system sent an error response to the callback. Handle the error or propagate failure
└── CallbackTimeoutException - Callback exceeded its timeout duration. Handle the error or propagate the failure
├── CallbackException - General callback exception
│ ├── CallbackFailedException - External system sent an error response to the callback. Handle the error or propagate failure
│ └── CallbackTimeoutException - Callback exceeded its timeout duration. Handle the error or propagate the failure
└── ChildContextFailedException - Child context failed and the original exception could not be reconstructed
```

```java
98 changes: 98 additions & 0 deletions docs/adr/004-child-context-execution.md
@@ -0,0 +1,98 @@
# ADR-004: Child Context Execution (`runInChildContext`)

**Status:** Accepted
**Date:** 2026-02-16

## Context

The TypeScript and Python durable execution SDKs support child contexts via `OperationType.CONTEXT`, enabling isolated sub-workflows with independent operation counters and checkpoint logs. The Java SDK needs the same capability to support fan-out/fan-in, parallel processing branches, and hierarchical workflow composition.

```java
var futureA = ctx.runInChildContextAsync("branch-a", String.class, child -> {
    child.step("validate", Void.class, () -> validate(order));
    child.wait(Duration.ofMinutes(5));
    return child.step("charge", String.class, () -> charge(order));
});
var futureB = ctx.runInChildContextAsync("branch-b", String.class, child -> { ... });
var results = DurableFuture.allOf(futureA, futureB);
```

## Decision

### Child context as a CONTEXT operation

A child context is a `CONTEXT` operation in the checkpoint log with a three-phase lifecycle:

1. **START** (fire-and-forget) — marks the child context as in-progress
2. Inner operations checkpoint with `parentId` set to the child context's operation ID
3. **SUCCEED** or **FAIL** (blocking) — finalizes the child context

```
Op ID | Parent ID | Type | Action | Payload
------|-----------|---------|---------|--------
3 | null | CONTEXT | START | —
3-1 | 3 | STEP | START | —
3-1 | 3 | STEP | SUCCEED | "result"
3 | null | CONTEXT | SUCCEED | "final result"
```

### Operation ID prefixing

Inner operation IDs are prefixed with the parent context's operation ID using `-` as separator (e.g., `"3-1"`, `"3-2"`). This matches the JavaScript SDK's `stepPrefix` convention and ensures global uniqueness — the backend validates type consistency by operation ID alone.

- Root context: `"1"`, `"2"`, `"3"`
- Child context `"1"`: `"1-1"`, `"1-2"`, `"1-3"`
- Nested child context `"1-2"`: `"1-2-1"`, `"1-2-2"`
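
The prefixing scheme above can be sketched with a per-context counter. This is a minimal illustration, not the SDK's implementation: `OperationIdSketch` and its methods are hypothetical names introduced here, and the only behavior taken from the text is the `-` separator and the resulting ID shapes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the SDK: each context owns a counter
// and prefixes generated IDs with its own operation ID.
final class OperationIdSketch {
    private final String prefix; // null for the root context
    private final AtomicInteger counter = new AtomicInteger();

    private OperationIdSketch(String prefix) {
        this.prefix = prefix;
    }

    // ID generator for the root context: "1", "2", "3", ...
    static OperationIdSketch root() {
        return new OperationIdSketch(null);
    }

    // ID generator for a child context whose own operation ID is contextId.
    static OperationIdSketch childOf(String contextId) {
        return new OperationIdSketch(contextId);
    }

    // Returns the next ID in this context, e.g. "1-1", "1-2" inside context "1".
    String nextOperationId() {
        int n = counter.incrementAndGet();
        return prefix == null ? String.valueOf(n) : prefix + "-" + n;
    }
}
```

Because a nested generator is created from an ID that already carries its parent's prefix, IDs such as `"1-2-1"` fall out of the same rule without special handling.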

### Per-context replay state

A global `executionMode` doesn't work for child contexts — a child may be replaying while the parent is already executing. Each `DurableContext` tracks its own replay state via an `isReplaying` field, initialized by checking `ExecutionManager.hasOperationsForContext(contextId)`.
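
A rough sketch of per-context replay state follows. The classes are hypothetical stand-ins: only the `isReplaying` field and the `hasOperationsForContext(contextId)` lookup come from the text, while the `"root"` context ID and the `markExecuting()` transition are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the lookup that ExecutionManager performs.
final class CheckpointIndexSketch {
    // context ID -> IDs of operations already checkpointed under that context
    private final Map<String, Set<String>> operationsByContext = new HashMap<>();

    void record(String contextId, String operationId) {
        operationsByContext.computeIfAbsent(contextId, k -> new HashSet<>()).add(operationId);
    }

    boolean hasOperationsForContext(String contextId) {
        Set<String> ops = operationsByContext.get(contextId);
        return ops != null && !ops.isEmpty();
    }
}

// Hypothetical context: replay state is a field of the context, not a global mode.
final class ReplayAwareContextSketch {
    private final String contextId;
    private boolean replaying;

    ReplayAwareContextSketch(String contextId, CheckpointIndexSketch index) {
        this.contextId = contextId;
        // A context with cached operations starts in replay; a fresh one starts executing.
        this.replaying = index.hasOperationsForContext(contextId);
    }

    String contextId() {
        return contextId;
    }

    boolean isReplaying() {
        return replaying;
    }

    // Assumed transition: called once the context reaches an operation with no cached entry.
    void markExecuting() {
        replaying = false;
    }
}
```

With this shape, a parent can call `markExecuting()` while a child constructed from the same index still reports `isReplaying() == true`, which is the situation a single global mode cannot represent.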

### Thread model

Child context user code runs in a separate thread (same pattern as `StepOperation`):
- `registerActiveThread` before the executor runs (on parent thread)
- `setCurrentContext` inside the executor thread
- `deregisterActiveThread` in the finally block
- `SuspendExecutionException` caught in finally (suspension already signaled)
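
The ordering in the list above can be sketched with standard `java.util.concurrent` types. Everything SDK-specific here is a hypothetical stand-in: the counter replaces the active-thread registry, the `ThreadLocal` replaces `setCurrentContext`, and `SuspendSignal` plays the role of `SuspendExecutionException`. Returning `null` on suspension is an assumption of this sketch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class ChildThreadSketch {
    // Hypothetical stand-ins for the active-thread registry and the current-context holder.
    static final AtomicInteger activeThreads = new AtomicInteger();
    static final ThreadLocal<String> currentContext = new ThreadLocal<>();

    // Hypothetical signal used in place of SuspendExecutionException.
    static final class SuspendSignal extends RuntimeException {}

    static <T> CompletableFuture<T> runChild(Executor executor, String contextId, Supplier<T> userCode) {
        activeThreads.incrementAndGet(); // register on the parent thread, before the task is submitted
        try {
            return CompletableFuture.supplyAsync(() -> {
                currentContext.set(contextId); // set inside the executor thread
                try {
                    return userCode.get();
                } catch (SuspendSignal suspended) {
                    return null; // suspension was already signaled; nothing to propagate
                } finally {
                    currentContext.remove();
                    activeThreads.decrementAndGet(); // deregister on every exit path
                }
            }, executor);
        } catch (RuntimeException rejected) {
            activeThreads.decrementAndGet(); // the task never started, so undo the registration
            throw rejected;
        }
    }
}
```

Registering before submission matters in this sketch because the count never drops to zero between the parent handing off work and the child thread starting.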

### Large result handling

Results < 256KB are checkpointed directly. Results ≥ 256KB trigger the `ReplayChildren` flow:
- SUCCEED checkpoint with empty payload + `ContextOptions { replayChildren: true }`
- On replay, child context re-executes; inner operations replay from cache
- No new SUCCEED checkpoint during reconstruction
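
The size check can be sketched as below. `LargeResultSketch` and `SucceedCheckpoint` are hypothetical names, and measuring the threshold as 256 * 1024 bytes of UTF-8 is an assumption; the text only states 256KB.

```java
import java.nio.charset.StandardCharsets;

final class LargeResultSketch {
    // 256KB threshold from the text; the exact unit and encoding are assumptions.
    static final int MAX_INLINE_BYTES = 256 * 1024;

    // Hypothetical shape of the SUCCEED checkpoint for a child context.
    static final class SucceedCheckpoint {
        final String payload;
        final boolean replayChildren;

        SucceedCheckpoint(String payload, boolean replayChildren) {
            this.payload = payload;
            this.replayChildren = replayChildren;
        }
    }

    // Small results are stored inline; large ones are replaced by an empty
    // payload plus a flag telling replay to re-execute the child context.
    static SucceedCheckpoint forResult(String serializedResult) {
        int size = serializedResult.getBytes(StandardCharsets.UTF_8).length;
        if (size < MAX_INLINE_BYTES) {
            return new SucceedCheckpoint(serializedResult, false);
        }
        return new SucceedCheckpoint("", true);
    }
}
```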

### Replay behavior

| Cached status | Behavior |
|---------------|----------|
| SUCCEEDED | Return cached result |
| SUCCEEDED + `replayChildren=true` | Re-execute child to reconstruct large result |
| FAILED | Re-throw cached error |
| STARTED | Re-execute (interrupted mid-flight) |
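
The table reads naturally as a decision function. The enum and method names below are hypothetical and only mirror the four rows; they are not the SDK's types.

```java
final class ReplayDecisionSketch {
    // Hypothetical mirror of the cached statuses listed in the table.
    enum CachedStatus { SUCCEEDED, FAILED, STARTED }

    // Hypothetical labels for the four behaviors in the table.
    enum Action { RETURN_CACHED_RESULT, REEXECUTE_TO_RECONSTRUCT, RETHROW_CACHED_ERROR, REEXECUTE }

    static Action decide(CachedStatus status, boolean replayChildren) {
        switch (status) {
            case SUCCEEDED:
                // The flag is only meaningful on a successful checkpoint.
                return replayChildren ? Action.REEXECUTE_TO_RECONSTRUCT : Action.RETURN_CACHED_RESULT;
            case FAILED:
                return Action.RETHROW_CACHED_ERROR;
            case STARTED:
                return Action.REEXECUTE;
            default:
                throw new IllegalArgumentException("Unhandled status: " + status);
        }
    }
}
```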

## Alternatives Considered

### Flatten child operations into root checkpoint log
**Rejected:** Breaks operation ID uniqueness. A CONTEXT op with ID `"1"` and an inner STEP with ID `"1"` (different `parentId`) would trigger `InvalidParameterValueException` from the backend.
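
To illustrate the collision, here is a toy validator that, like the backend described above, keys operations by ID alone. `TypeConsistencySketch` is a hypothetical model; the real backend check is not shown in this change.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a validator that identifies operations by ID only.
final class TypeConsistencySketch {
    private final Map<String, String> typeById = new HashMap<>();

    // Returns false when the ID was already seen with a different operation type.
    boolean accept(String operationId, String type) {
        String previous = typeById.putIfAbsent(operationId, type);
        return previous == null || previous.equals(type);
    }
}
```

With flattened IDs, `accept("1", "CONTEXT")` followed by `accept("1", "STEP")` is rejected, whereas the prefixed ID `"1-1"` passes.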

### Global replay state with context tracking
**Rejected:** Adds complexity to `ExecutionManager` for something that's naturally per-context. The TypeScript SDK uses per-entity replay state for the same reason.

## Consequences

**Positive:**
- Aligns with TypeScript and Python SDK implementations
- Enables fan-out/fan-in, parallel branches, hierarchical workflows
- Clean separation: each child context is self-contained
- Nested child contexts chain naturally via ID prefixing

**Negative:**
- More threads to coordinate
- Per-context replay state adds complexity vs. global mode

**Deferred:**
- Orphan detection in `CheckpointBatcher`
- `summaryGenerator` for large-result observability
- Higher-level `map`/`parallel` combinators (different `OperationSubType` values, same `CONTEXT` operation type)
@@ -0,0 +1,74 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
package com.amazonaws.lambda.durable.examples;

import com.amazonaws.lambda.durable.DurableContext;
import com.amazonaws.lambda.durable.DurableFuture;
import com.amazonaws.lambda.durable.DurableHandler;
import java.time.Duration;

/**
 * Example demonstrating child context workflows with the Durable Execution SDK.
 *
 * <p>This handler runs three concurrent child contexts using {@code runInChildContextAsync}:
 *
 * <ol>
 *   <li><b>Order validation</b> — performs a step then suspends via {@code wait()} before completing
 *   <li><b>Inventory check</b> — performs a step then suspends via {@code wait()} before completing
 *   <li><b>Shipping estimate</b> — nests another child context inside it to demonstrate hierarchical contexts
 * </ol>
 *
 * <p>All three child contexts run concurrently. Results are collected with {@link DurableFuture#allOf} and combined
 * into a summary string.
 */
public class ChildContextExample extends DurableHandler<GreetingRequest, String> {

    @Override
    public String handleRequest(GreetingRequest input, DurableContext context) {
        var name = input.getName();
        context.getLogger().info("Starting child context workflow for {}", name);

        // Child context 1: Order validation — step + wait + step
        var orderFuture = context.runInChildContextAsync("order-validation", String.class, child -> {
            var prepared = child.step("prepare-order", String.class, () -> "Order for " + name);
            context.getLogger().info("Order prepared, waiting for validation");

            child.wait("validation-delay", Duration.ofSeconds(5));

            return child.step("validate-order", String.class, () -> prepared + " [validated]");
        });

        // Child context 2: Inventory check — step + wait + step
        var inventoryFuture = context.runInChildContextAsync("inventory-check", String.class, child -> {
            var stock = child.step("check-stock", String.class, () -> "Stock available for " + name);
            context.getLogger().info("Stock checked, waiting for confirmation");

            child.wait("confirmation-delay", Duration.ofSeconds(3));

            return child.step("confirm-inventory", String.class, () -> stock + " [confirmed]");
        });

        // Child context 3: Shipping estimate — nests a child context inside it
        var shippingFuture = context.runInChildContextAsync("shipping-estimate", String.class, child -> {
            var baseRate = child.step("calculate-base-rate", String.class, () -> "Base rate for " + name);

            // Nested child context: calculate regional adjustment
            var adjustment = child.runInChildContext(
                    "regional-adjustment",
                    String.class,
                    nested -> nested.step("lookup-region", String.class, () -> baseRate + " + regional adjustment"));

            return child.step("finalize-shipping", String.class, () -> adjustment + " [shipping ready]");
        });

        // Collect all results using allOf
        context.getLogger().info("Waiting for all child contexts to complete");
        var results = DurableFuture.allOf(orderFuture, inventoryFuture, shippingFuture);

        // Combine into summary
        var summary = String.join(" | ", results);
        context.getLogger().info("All child contexts complete: {}", summary);

        return summary;
    }
}
@@ -0,0 +1,55 @@
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
package com.amazonaws.lambda.durable.examples;

import static org.junit.jupiter.api.Assertions.*;

import com.amazonaws.lambda.durable.model.ExecutionStatus;
import com.amazonaws.lambda.durable.testing.LocalDurableTestRunner;
import org.junit.jupiter.api.Test;

class ChildContextExampleTest {

    @Test
    void testChildContextExampleRunsToCompletion() {
        var handler = new ChildContextExample();
        var runner = LocalDurableTestRunner.create(GreetingRequest.class, handler);

        var input = new GreetingRequest("Alice");
        var result = runner.runUntilComplete(input);

        assertEquals(ExecutionStatus.SUCCEEDED, result.getStatus());
        assertEquals(
                "Order for Alice [validated] | Stock available for Alice [confirmed] | Base rate for Alice + regional adjustment [shipping ready]",
                result.getResult(String.class));
    }

    @Test
    void testChildContextExampleSuspendsOnFirstRun() {
        var handler = new ChildContextExample();
        var runner = LocalDurableTestRunner.create(GreetingRequest.class, handler);

        var input = new GreetingRequest("Bob");

        // First run should suspend due to wait operations inside child contexts
        var result = runner.run(input);
        assertEquals(ExecutionStatus.PENDING, result.getStatus());
    }

    @Test
    void testChildContextExampleReplay() {
        var handler = new ChildContextExample();
        var runner = LocalDurableTestRunner.create(GreetingRequest.class, handler);

        var input = new GreetingRequest("Alice");

        // First full execution
        var result1 = runner.runUntilComplete(input);
        assertEquals(ExecutionStatus.SUCCEEDED, result1.getStatus());

        // Replay — should return cached results
        var result2 = runner.run(input);
        assertEquals(ExecutionStatus.SUCCEEDED, result2.getStatus());
        assertEquals(result1.getResult(String.class), result2.getResult(String.class));
    }
}
@@ -353,6 +353,22 @@ void testCallbackExampleWithTimeout() {
        assertEquals(OperationStatus.TIMED_OUT, approvalOp.getStatus());
    }

    @Test
    void testChildContextExample() {
        var runner = CloudDurableTestRunner.create(arn("child-context-example"), GreetingRequest.class, String.class);
        var result = runner.run(new GreetingRequest("Alice"));

        assertEquals(ExecutionStatus.SUCCEEDED, result.getStatus());
        assertEquals(
                "Order for Alice [validated] | Stock available for Alice [confirmed] | Base rate for Alice + regional adjustment [shipping ready]",
                result.getResult(String.class));

        // Verify child context operations were tracked
        assertNotNull(runner.getOperation("order-validation"));
        assertNotNull(runner.getOperation("inventory-check"));
        assertNotNull(runner.getOperation("shipping-estimate"));
    }

    @Test
    void testManyAsyncStepsExample() {
        var runner = CloudDurableTestRunner.create(
33 changes: 33 additions & 0 deletions examples/template.yaml
@@ -350,6 +350,31 @@ Resources:
      DockerContext: ../
      DockerTag: durable-examples

  ChildContextExampleFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      FunctionName: !Join
        - ''
        - - 'child-context-example'
          - !Ref FunctionNameSuffix
      ImageConfig:
        Command: ["com.amazonaws.lambda.durable.examples.ChildContextExample::handleRequest"]
      DurableConfig:
        ExecutionTimeout: 300
        RetentionPeriodInDays: 7
      Policies:
        - Statement:
            - Effect: Allow
              Action:
                - lambda:CheckpointDurableExecutions
                - lambda:GetDurableExecutionState
              Resource: !Sub "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:child-context-example${FunctionNameSuffix}"
    Metadata:
      Dockerfile: !Ref DockerFile
      DockerContext: ../
      DockerTag: durable-examples

Outputs:
  SimpleStepExampleFunction:
    Description: Simple Step Example Function ARN
@@ -454,3 +479,11 @@ Outputs:
  ManyAsyncStepsExampleFunctionName:
    Description: Many Async Steps Example Function Name
    Value: !Ref ManyAsyncStepsExampleFunction

  ChildContextExampleFunction:
    Description: Child Context Example Function ARN
    Value: !GetAtt ChildContextExampleFunction.Arn

  ChildContextExampleFunctionName:
    Description: Child Context Example Function Name
    Value: !Ref ChildContextExampleFunction