Commit f40431b

Feat/child context (#100)
1 parent a431d03 commit f40431b

25 files changed: +1631 -55 lines changed

README.md

Lines changed: 30 additions & 3 deletions
@@ -23,6 +23,8 @@ Your durable function extends `DurableHandler<I, O>` and implements `handleReque
 - `ctx.createCallback()` – Wait for external events (approvals, webhooks)
 - `ctx.invoke()` – Invoke another Lambda function and wait for the result
 - `ctx.invokeAsync()` – Start a concurrent Lambda function invocation
+- `ctx.runInChildContext()` – Run an isolated child context with its own checkpoint log
+- `ctx.runInChildContextAsync()` – Start a concurrent child context
 
 ## Quick Start
 
@@ -189,6 +191,30 @@ var result = ctx.invoke("invoke-function",
 
 ```
 
+### runInChildContext() – Isolated Execution Contexts
+
+Child contexts run an isolated stream of work with their own operation counter and checkpoint log. They support the full range of durable operations — `step`, `wait`, `invoke`, `createCallback`, and nested child contexts.
+
+```java
+// Sync: blocks until the child context completes
+var result = ctx.runInChildContext("validate-order", String.class, child -> {
+    var data = child.step("fetch", String.class, () -> fetchData());
+    child.wait(Duration.ofMinutes(5));
+    return child.step("validate", String.class, () -> validate(data));
+});
+
+// Async: returns a DurableFuture for concurrent execution
+var futureA = ctx.runInChildContextAsync("branch-a", String.class, child -> {
+    return child.step("work-a", String.class, () -> doWorkA());
+});
+var futureB = ctx.runInChildContextAsync("branch-b", String.class, child -> {
+    return child.step("work-b", String.class, () -> doWorkB());
+});
+
+// Wait for all child contexts to complete
+var results = DurableFuture.allOf(futureA, futureB);
+```
+
 ## Step Configuration
 
 Configure step behavior with `StepConfig`:
@@ -396,9 +422,10 @@ DurableExecutionException - General durable exception
 │   ├── InvokeFailedException - Chained invocation failed. Handle the error or propagate failure.
 │   ├── InvokeTimedoutException - Chained invocation timed out. Handle the error or propagate failure.
 │   └── InvokeStoppedException - Chained invocation stopped. Handle the error or propagate failure.
-└── CallbackException - General callback exception
-    ├── CallbackFailedException - External system sent an error response to the callback. Handle the error or propagate failure
-    └── CallbackTimeoutException - Callback exceeded its timeout duration. Handle the error or propagate the failure
+├── CallbackException - General callback exception
+│   ├── CallbackFailedException - External system sent an error response to the callback. Handle the error or propagate failure
+│   └── CallbackTimeoutException - Callback exceeded its timeout duration. Handle the error or propagate the failure
+└── ChildContextFailedException - Child context failed and the original exception could not be reconstructed
 ```
 
 ```java
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+# ADR-004: Child Context Execution (`runInChildContext`)
+
+**Status:** Accepted
+**Date:** 2026-02-16
+
+## Context
+
+The TypeScript and Python durable execution SDKs support child contexts via `OperationType.CONTEXT`, enabling isolated sub-workflows with independent operation counters and checkpoint logs. The Java SDK needs the same capability to support fan-out/fan-in, parallel processing branches, and hierarchical workflow composition.
+
+```java
+var futureA = ctx.runInChildContextAsync("branch-a", String.class, child -> {
+    child.step("validate", Void.class, () -> validate(order));
+    child.wait(Duration.ofMinutes(5));
+    return child.step("charge", String.class, () -> charge(order));
+});
+var futureB = ctx.runInChildContextAsync("branch-b", String.class, child -> { ... });
+var results = DurableFuture.allOf(futureA, futureB);
+```
+
+## Decision
+
+### Child context as a CONTEXT operation
+
+A child context is a `CONTEXT` operation in the checkpoint log with a three-phase lifecycle:
+
+1. **START** (fire-and-forget) — marks the child context as in-progress
+2. Inner operations checkpoint with `parentId` set to the child context's operation ID
+3. **SUCCEED** or **FAIL** (blocking) — finalizes the child context
+
+```
+Op ID | Parent ID | Type    | Action  | Payload
+------|-----------|---------|---------|--------
+3     | null      | CONTEXT | START   | —
+3-1   | 3         | STEP    | START   | —
+3-1   | 3         | STEP    | SUCCEED | "result"
+3     | null      | CONTEXT | SUCCEED | "final result"
+```
+
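Reviewer sketch: the three-phase lifecycle above can be modeled in plain Java to see which rows end up in the checkpoint log. This is only an in-memory illustration under stated assumptions. `LifecycleSketch`, `Entry`, `step`, and `runChild` are hypothetical names that do not come from the SDK, and the fire-and-forget versus blocking distinction between START and SUCCEED/FAIL is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical in-memory model of the CONTEXT operation lifecycle; not SDK code.
final class LifecycleSketch {
    // One checkpoint log row: operation ID, parent ID (null at the root), type, action, payload.
    static final class Entry {
        final String opId, parentId, type, action, payload;

        Entry(String opId, String parentId, String type, String action, String payload) {
            this.opId = opId;
            this.parentId = parentId;
            this.type = type;
            this.action = action;
            this.payload = payload;
        }
    }

    final List<Entry> log = new ArrayList<>();

    // Phase 2: an inner STEP checkpoints with parentId set to the child context's ID.
    String step(String contextId, int index, String result) {
        String opId = contextId + "-" + index;
        log.add(new Entry(opId, contextId, "STEP", "START", null));
        log.add(new Entry(opId, contextId, "STEP", "SUCCEED", result));
        return result;
    }

    // Phase 1 (START), then the user code, then phase 3 (SUCCEED or FAIL).
    String runChild(String contextId, Function<String, String> body) {
        log.add(new Entry(contextId, null, "CONTEXT", "START", null));
        try {
            String result = body.apply(contextId);
            log.add(new Entry(contextId, null, "CONTEXT", "SUCCEED", result));
            return result;
        } catch (RuntimeException e) {
            log.add(new Entry(contextId, null, "CONTEXT", "FAIL", e.getMessage()));
            throw e;
        }
    }
}
```

Calling `runChild("3", id -> "final " + sketch.step(id, 1, "result"))` on a `LifecycleSketch` instance named `sketch` records the same four rows as the table in the ADR.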
+### Operation ID prefixing
+
+Inner operation IDs are prefixed with the parent context's operation ID using `-` as separator (e.g., `"3-1"`, `"3-2"`). This matches the JavaScript SDK's `stepPrefix` convention and ensures global uniqueness — the backend validates type consistency by operation ID alone.
+
+- Root context: `"1"`, `"2"`, `"3"`
+- Child context `"1"`: `"1-1"`, `"1-2"`, `"1-3"`
+- Nested child context `"1-2"`: `"1-2-1"`, `"1-2-2"`
+
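Reviewer sketch: the prefixing scheme can be illustrated with a small per-context counter. This is a minimal sketch, not the SDK's generator. `OperationIdSketch`, `nextId`, and `childOf` are hypothetical names introduced here, and only the `-` separator and the ID shapes are taken from the ADR.

```java
// Hypothetical sketch of hierarchical operation ID generation; not SDK code.
final class OperationIdSketch {
    private final String prefix; // null for the root context
    private int counter = 0;

    OperationIdSketch(String prefix) {
        this.prefix = prefix;
    }

    // Returns the next ID in this context: "1", "2", ... at the root, "<prefix>-1", ... in a child.
    String nextId() {
        counter++;
        return prefix == null ? Integer.toString(counter) : prefix + "-" + counter;
    }

    // Creates the generator for a child context whose own operation ID is contextId.
    static OperationIdSketch childOf(String contextId) {
        return new OperationIdSketch(contextId);
    }
}
```

Because each context owns its counter and its prefix is already unique, IDs generated in different contexts cannot collide, which is the uniqueness property the ADR relies on.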
+### Per-context replay state
+
+A global `executionMode` doesn't work for child contexts — a child may be replaying while the parent is already executing. Each `DurableContext` tracks its own replay state via an `isReplaying` field, initialized by checking `ExecutionManager.hasOperationsForContext(contextId)`.
+
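Reviewer sketch: the initialization rule can be modeled as a lookup over checkpointed operations keyed by their owning context. This is an assumption-laden illustration. `ReplayStateSketch`, `Op`, and `initialIsReplaying` are hypothetical, and the method below only stands in for the `hasOperationsForContext` check that the ADR names. How a context later leaves replay mode is not modeled.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical model of per-context replay state; not SDK code.
final class ReplayStateSketch {
    // A checkpointed operation: its ID and the ID of the context that owns it (null at the root).
    static final class Op {
        final String id, parentId;

        Op(String id, String parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    private final List<Op> checkpointed;

    ReplayStateSketch(List<Op> checkpointed) {
        this.checkpointed = checkpointed;
    }

    // Stand-in for the hasOperationsForContext(contextId) check named in the ADR.
    boolean hasOperationsForContext(String contextId) {
        for (Op op : checkpointed) {
            if (Objects.equals(op.parentId, contextId)) {
                return true;
            }
        }
        return false;
    }

    // A context starts in replay mode only if the log already holds operations it owns.
    boolean initialIsReplaying(String contextId) {
        return hasOperationsForContext(contextId);
    }
}
```

With a log that contains inner operations for child `"2"` but none for child `"3"`, the first child starts in replay mode and the second starts executing, independent of what the root context is doing.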
+### Thread model
+
+Child context user code runs in a separate thread (same pattern as `StepOperation`):
+- `registerActiveThread` before the executor runs (on parent thread)
+- `setCurrentContext` inside the executor thread
+- `deregisterActiveThread` in the finally block
+- `SuspendExecutionException` caught in finally (suspension already signaled)
+
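Reviewer sketch: the ordering of the first three bullets can be shown with a plain `ExecutorService`. This is a rough model under assumptions, not the SDK's code. `ThreadModelSketch`, `activeThreads`, `currentContext`, and `runChild` are hypothetical, the counter and the `ThreadLocal` merely stand in for `registerActiveThread`/`deregisterActiveThread` and `setCurrentContext`, and the `SuspendExecutionException` handling is left out.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model of the child-context thread hand-off; not SDK code.
final class ThreadModelSketch {
    static final AtomicInteger activeThreads = new AtomicInteger();
    static final ThreadLocal<String> currentContext = new ThreadLocal<>();

    static String runChild(ExecutorService executor, String contextId, Supplier<String> userCode) {
        // Register on the parent thread, before the executor runs anything.
        activeThreads.incrementAndGet();
        Future<String> future;
        try {
            future = executor.submit(() -> {
                // Set the current context inside the executor thread.
                currentContext.set(contextId);
                try {
                    return userCode.get();
                } finally {
                    // Deregister in the finally block so failures also release the slot.
                    currentContext.remove();
                    activeThreads.decrementAndGet();
                }
            });
        } catch (RuntimeException e) {
            // The task never started (for example, the executor rejected it), so undo the registration.
            activeThreads.decrementAndGet();
            throw e;
        }
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Registering before submission means the count never reads zero between scheduling the child and the child actually starting, which is presumably why the ADR places that call on the parent thread.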
+### Large result handling
+
+Results < 256KB are checkpointed directly. Results ≥ 256KB trigger the `ReplayChildren` flow:
+- SUCCEED checkpoint with empty payload + `ContextOptions { replayChildren: true }`
+- On replay, child context re-executes; inner operations replay from cache
+- No new SUCCEED checkpoint during reconstruction
+
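Reviewer sketch: the size check can be expressed as a single decision over the serialized result. This is a minimal sketch with loud assumptions. `LargeResultSketch`, `Succeed`, and `checkpointFor` are hypothetical names, and reading "256KB" as 262,144 bytes of UTF-8 is a guess, since the ADR does not say how the size is measured.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of the large-result decision; not SDK code.
final class LargeResultSketch {
    // Assumption: "256KB" means 256 * 1024 bytes of the UTF-8 encoded result.
    static final int MAX_INLINE_BYTES = 256 * 1024;

    // What the SUCCEED checkpoint would carry: a payload plus the replayChildren flag.
    static final class Succeed {
        final String payload;
        final boolean replayChildren;

        Succeed(String payload, boolean replayChildren) {
            this.payload = payload;
            this.replayChildren = replayChildren;
        }
    }

    static Succeed checkpointFor(String serializedResult) {
        int size = serializedResult.getBytes(StandardCharsets.UTF_8).length;
        if (size < MAX_INLINE_BYTES) {
            // Small result: checkpoint it directly.
            return new Succeed(serializedResult, false);
        }
        // Large result: empty payload, and ask replay to re-run the child to rebuild it.
        return new Succeed("", true);
    }
}
```

The trade-off is storage against replay work: a large result is never stored in the CONTEXT checkpoint, so every replay pays for re-running the child body, with inner operations served from cache.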
+### Replay behavior
+
+| Cached status | Behavior |
+|---------------|----------|
+| SUCCEEDED | Return cached result |
+| SUCCEEDED + `replayChildren=true` | Re-execute child to reconstruct large result |
+| FAILED | Re-throw cached error |
+| STARTED | Re-execute (interrupted mid-flight) |
+
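Reviewer sketch: the replay table maps directly onto a small decision function. This is an illustration only. `ReplayDecisionSketch`, `CachedStatus`, `Action`, and `decide` are hypothetical names, and only the four table rows themselves come from the ADR.

```java
// Hypothetical encoding of the replay table; not SDK code.
final class ReplayDecisionSketch {
    enum CachedStatus { SUCCEEDED, FAILED, STARTED }

    enum Action { RETURN_CACHED_RESULT, RE_EXECUTE_TO_RECONSTRUCT, RETHROW_CACHED_ERROR, RE_EXECUTE }

    // Maps the cached CONTEXT status (plus the replayChildren flag) to the replay behavior.
    static Action decide(CachedStatus status, boolean replayChildren) {
        switch (status) {
            case SUCCEEDED:
                return replayChildren ? Action.RE_EXECUTE_TO_RECONSTRUCT : Action.RETURN_CACHED_RESULT;
            case FAILED:
                return Action.RETHROW_CACHED_ERROR;
            case STARTED:
                return Action.RE_EXECUTE;
            default:
                throw new IllegalArgumentException("Unknown status: " + status);
        }
    }
}
```

Both re-execute rows run the child body again, but according to the large-result section only the `STARTED` case should write a new SUCCEED checkpoint when it finishes.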
+## Alternatives Considered
+
+### Flatten child operations into root checkpoint log
+**Rejected:** Breaks operation ID uniqueness. A CONTEXT op with ID `"1"` and an inner STEP with ID `"1"` (different `parentId`) would trigger `InvalidParameterValueException` from the backend.
+
+### Global replay state with context tracking
+**Rejected:** Adds complexity to `ExecutionManager` for something that's naturally per-context. The TypeScript SDK uses per-entity replay state for the same reason.
+
+## Consequences
+
+**Positive:**
+- Aligns with TypeScript and Python SDK implementations
+- Enables fan-out/fan-in, parallel branches, hierarchical workflows
+- Clean separation: each child context is self-contained
+- Nested child contexts chain naturally via ID prefixing
+
+**Negative:**
+- More threads to coordinate
+- Per-context replay state adds complexity vs. global mode
+
+**Deferred:**
+- Orphan detection in `CheckpointBatcher`
+- `summaryGenerator` for large-result observability
+- Higher-level `map`/`parallel` combinators (different `OperationSubType` values, same `CONTEXT` operation type)
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+// SPDX-License-Identifier: Apache-2.0
+package com.amazonaws.lambda.durable.examples;
+
+import com.amazonaws.lambda.durable.DurableContext;
+import com.amazonaws.lambda.durable.DurableFuture;
+import com.amazonaws.lambda.durable.DurableHandler;
+import java.time.Duration;
+
+/**
+ * Example demonstrating child context workflows with the Durable Execution SDK.
+ *
+ * <p>This handler runs three concurrent child contexts using {@code runInChildContextAsync}:
+ *
+ * <ol>
+ *   <li><b>Order validation</b> — performs a step then suspends via {@code wait()} before completing
+ *   <li><b>Inventory check</b> — performs a step then suspends via {@code wait()} before completing
+ *   <li><b>Shipping estimate</b> — nests another child context inside it to demonstrate hierarchical contexts
+ * </ol>
+ *
+ * <p>All three child contexts run concurrently. Results are collected with {@link DurableFuture#allOf} and combined
+ * into a summary string.
+ */
+public class ChildContextExample extends DurableHandler<GreetingRequest, String> {
+
+    @Override
+    public String handleRequest(GreetingRequest input, DurableContext context) {
+        var name = input.getName();
+        context.getLogger().info("Starting child context workflow for {}", name);
+
+        // Child context 1: Order validation — step + wait + step
+        var orderFuture = context.runInChildContextAsync("order-validation", String.class, child -> {
+            var prepared = child.step("prepare-order", String.class, () -> "Order for " + name);
+            context.getLogger().info("Order prepared, waiting for validation");
+
+            child.wait("validation-delay", Duration.ofSeconds(5));
+
+            return child.step("validate-order", String.class, () -> prepared + " [validated]");
+        });
+
+        // Child context 2: Inventory check — step + wait + step
+        var inventoryFuture = context.runInChildContextAsync("inventory-check", String.class, child -> {
+            var stock = child.step("check-stock", String.class, () -> "Stock available for " + name);
+            context.getLogger().info("Stock checked, waiting for confirmation");
+
+            child.wait("confirmation-delay", Duration.ofSeconds(3));
+
+            return child.step("confirm-inventory", String.class, () -> stock + " [confirmed]");
+        });
+
+        // Child context 3: Shipping estimate — nests a child context inside it
+        var shippingFuture = context.runInChildContextAsync("shipping-estimate", String.class, child -> {
+            var baseRate = child.step("calculate-base-rate", String.class, () -> "Base rate for " + name);
+
+            // Nested child context: calculate regional adjustment
+            var adjustment = child.runInChildContext(
+                    "regional-adjustment",
+                    String.class,
+                    nested -> nested.step("lookup-region", String.class, () -> baseRate + " + regional adjustment"));
+
+            return child.step("finalize-shipping", String.class, () -> adjustment + " [shipping ready]");
+        });
+
+        // Collect all results using allOf
+        context.getLogger().info("Waiting for all child contexts to complete");
+        var results = DurableFuture.allOf(orderFuture, inventoryFuture, shippingFuture);
+
+        // Combine into summary
+        var summary = String.join(" | ", results);
+        context.getLogger().info("All child contexts complete: {}", summary);
+
+        return summary;
+    }
+}
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+// SPDX-License-Identifier: Apache-2.0
+package com.amazonaws.lambda.durable.examples;
+
+import static org.junit.jupiter.api.Assertions.*;
+
+import com.amazonaws.lambda.durable.model.ExecutionStatus;
+import com.amazonaws.lambda.durable.testing.LocalDurableTestRunner;
+import org.junit.jupiter.api.Test;
+
+class ChildContextExampleTest {
+
+    @Test
+    void testChildContextExampleRunsToCompletion() {
+        var handler = new ChildContextExample();
+        var runner = LocalDurableTestRunner.create(GreetingRequest.class, handler);
+
+        var input = new GreetingRequest("Alice");
+        var result = runner.runUntilComplete(input);
+
+        assertEquals(ExecutionStatus.SUCCEEDED, result.getStatus());
+        assertEquals(
+                "Order for Alice [validated] | Stock available for Alice [confirmed] | Base rate for Alice + regional adjustment [shipping ready]",
+                result.getResult(String.class));
+    }
+
+    @Test
+    void testChildContextExampleSuspendsOnFirstRun() {
+        var handler = new ChildContextExample();
+        var runner = LocalDurableTestRunner.create(GreetingRequest.class, handler);
+
+        var input = new GreetingRequest("Bob");
+
+        // First run should suspend due to wait operations inside child contexts
+        var result = runner.run(input);
+        assertEquals(ExecutionStatus.PENDING, result.getStatus());
+    }
+
+    @Test
+    void testChildContextExampleReplay() {
+        var handler = new ChildContextExample();
+        var runner = LocalDurableTestRunner.create(GreetingRequest.class, handler);
+
+        var input = new GreetingRequest("Alice");
+
+        // First full execution
+        var result1 = runner.runUntilComplete(input);
+        assertEquals(ExecutionStatus.SUCCEEDED, result1.getStatus());
+
+        // Replay — should return cached results
+        var result2 = runner.run(input);
+        assertEquals(ExecutionStatus.SUCCEEDED, result2.getStatus());
+        assertEquals(result1.getResult(String.class), result2.getResult(String.class));
+    }
+}

examples/src/test/java/com/amazonaws/lambda/durable/examples/CloudBasedIntegrationTest.java

Lines changed: 16 additions & 0 deletions
@@ -353,6 +353,22 @@ void testCallbackExampleWithTimeout() {
         assertEquals(OperationStatus.TIMED_OUT, approvalOp.getStatus());
     }
 
+    @Test
+    void testChildContextExample() {
+        var runner = CloudDurableTestRunner.create(arn("child-context-example"), GreetingRequest.class, String.class);
+        var result = runner.run(new GreetingRequest("Alice"));
+
+        assertEquals(ExecutionStatus.SUCCEEDED, result.getStatus());
+        assertEquals(
+                "Order for Alice [validated] | Stock available for Alice [confirmed] | Base rate for Alice + regional adjustment [shipping ready]",
+                result.getResult(String.class));
+
+        // Verify child context operations were tracked
+        assertNotNull(runner.getOperation("order-validation"));
+        assertNotNull(runner.getOperation("inventory-check"));
+        assertNotNull(runner.getOperation("shipping-estimate"));
+    }
+
     @Test
     void testManyAsyncStepsExample() {
         var runner = CloudDurableTestRunner.create(

examples/template.yaml

Lines changed: 33 additions & 0 deletions
@@ -350,6 +350,31 @@ Resources:
       DockerContext: ../
       DockerTag: durable-examples
 
+  ChildContextExampleFunction:
+    Type: AWS::Serverless::Function
+    Properties:
+      PackageType: Image
+      FunctionName: !Join
+        - ''
+        - - 'child-context-example'
+          - !Ref FunctionNameSuffix
+      ImageConfig:
+        Command: ["com.amazonaws.lambda.durable.examples.ChildContextExample::handleRequest"]
+      DurableConfig:
+        ExecutionTimeout: 300
+        RetentionPeriodInDays: 7
+      Policies:
+        - Statement:
+            - Effect: Allow
+              Action:
+                - lambda:CheckpointDurableExecutions
+                - lambda:GetDurableExecutionState
+              Resource: !Sub "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:child-context-example${FunctionNameSuffix}"
+    Metadata:
+      Dockerfile: !Ref DockerFile
+      DockerContext: ../
+      DockerTag: durable-examples
+
 Outputs:
   SimpleStepExampleFunction:
     Description: Simple Step Example Function ARN
@@ -454,3 +479,11 @@ Outputs:
   ManyAsyncStepsExampleFunctionName:
     Description: Many Async Steps Example Function Name
     Value: !Ref ManyAsyncStepsExampleFunction
+
+  ChildContextExampleFunction:
+    Description: Child Context Example Function ARN
+    Value: !GetAtt ChildContextExampleFunction.Arn
+
+  ChildContextExampleFunctionName:
+    Description: Child Context Example Function Name
+    Value: !Ref ChildContextExampleFunction
