
Commit 5c6f207

docs: revise key concepts, add section indices

- Rewrite key-concepts.md with richer content from service docs
- Add DurableContext section with multi-language code examples
- Add Checkpoints section with 5-step sequence
- Expand Replay section with checkpoint log sequence and determinism note
- Add replay walkthrough with language-neutral step-by-step trace
- Fix operations list with links to SDK reference pages and add wait-for-condition
- Move Python-specific content (two SDKs, decorators) to sdk-reference/languages/
- Add Language Guides section under SDK Reference (TypeScript, Python, Java)
- Add toc.permalink = true for heading anchor links
- Revise getting-started/index.md as section landing page
- Add verified multi-language code examples for durable-context and execution-model

closes #103

1 parent 9e4bd07

18 files changed

Lines changed: 420 additions & 93 deletions

docs/getting-started/index.md

Lines changed: 12 additions & 31 deletions

````diff
@@ -1,38 +1,19 @@
 # Getting Started
 
-## Installation
+This section will help you get started building durable functions in
+[AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html).
 
-Install the SDK using pip:
+See the [SDK Reference](../sdk-reference/) section for in-depth details on the
+concepts introduced here.
 
-```console
-pip install aws-durable-execution-sdk-python
-```
+## Prerequisites
 
-## Quick example
+- An AWS account with permissions to create and execute AWS Lambda functions
+- Familiarity with writing AWS Lambda functions
 
-Here's a simple durable function that processes an order:
+## In this section
 
-=== "TypeScript"
-
-    ``` typescript
-    --8<-- "examples/typescript/index/quick-example.ts"
-    ```
-
-=== "Python"
-
-    ``` python
-    --8<-- "examples/python/index/quick-example.py"
-    ```
-
-=== "Java"
-
-    ``` java
-    --8<-- "examples/java/index/quick-example.java"
-    ```
-
-Each `context.step()` call is checkpointed automatically. If Lambda recycles your execution environment, the function resumes from the last completed step.
-
-## Next steps
-
-- [Key Concepts](key-concepts.md) - Understand the mental model behind durable execution
-- [Quick Start](quick-start.md) - Build and test your first durable function
+- [Key Concepts](key-concepts.md): Understand durable execution, checkpoints, replay, and
+  the DurableContext before writing code
+- [Quick Start](quick-start.md): Install the SDK, write your first durable function, and
+  test it locally
````

docs/getting-started/key-concepts.md

Lines changed: 140 additions & 54 deletions

````diff
@@ -2,94 +2,180 @@
 
 ## Durable execution
 
-A durable execution represents the complete lifecycle of a Lambda durable function. The SDK uses a checkpoint and replay mechanism to track progress, suspend execution, and recover from failures. A single execution may span multiple Lambda invocations.
+A durable execution is the complete lifecycle of an AWS Lambda durable function. It uses
+a checkpoint and replay mechanism to track progress, suspend execution, and recover from
+failures. When functions resume after suspension or interruptions, previously completed
+checkpoints replay and the function continues execution.
+
+The execution lifecycle can include multiple invocations of the Lambda function to
+complete, particularly after suspensions or failure recovery. With these replays the
+execution can run for extended periods (up to one year) while maintaining reliable
+progress despite interruptions.
+
+### Timeouts
+
+The
+[execution timeout](https://docs.aws.amazon.com/lambda/latest/api/API_DurableConfig.html#lambda-Type-DurableConfig-ExecutionTimeout)
+and Lambda function
+[Timeout](https://docs.aws.amazon.com/lambda/latest/api/API_CreateFunction.html#lambda-CreateFunction-request-Timeout)
+are different settings. The Lambda function timeout controls how long each individual
+invocation can run (maximum 15 minutes). The execution timeout controls the total
+elapsed time for the entire durable execution (maximum 1 year).
 
 ## Durable functions
 
-A durable function is a Lambda function decorated with `@durable_execution` that can be checkpointed and resumed. The function receives a `DurableContext` that provides methods for durable operations.
+A durable function is a Lambda function configured with the
+[`DurableConfig`](https://docs.aws.amazon.com/lambda/latest/dg/durable-configuration.html)
+object at creation time. Lambda then applies the checkpoint and replay mechanism to
+the function's execution to make it durable at invocation time.
 
-## Operations
+## DurableContext
+
+`DurableContext` is the context object your durable function receives instead of the
+standard Lambda `Context`. It exposes all durable operations and provides methods for
+creating checkpoints, managing execution flow, and coordinating with external systems.
 
-Operations are units of work in a durable execution. Each operation type serves a specific purpose:
+Your durable function receives a `DurableContext` instead of the default Lambda context:
 
-- **Steps** - Execute code and checkpoint the result with retry support
-- **Waits** - Pause execution for a specified duration without blocking Lambda
-- **Callbacks** - Wait for external systems to respond with results
-- **Invoke** - Call other durable functions to compose complex workflows
-- **Child contexts** - Isolate nested workflows for better organization
-- **Parallel** - Execute multiple operations concurrently with completion criteria
-- **Map** - Process collections in parallel with batching and failure tolerance
+=== "TypeScript"
+
+    ```typescript
+    --8<-- "examples/typescript/getting-started/durable-context.ts"
+    ```
+
+=== "Python"
+
+    ```python
+    --8<-- "examples/python/getting-started/durable-context.py"
+    ```
+
+=== "Java"
+
+    ```java
+    --8<-- "examples/java/getting-started/durable-context.java"
+    ```
+
+## Operations
+
+Operations are units of work in a durable execution. Each operation type serves a
+specific purpose:
+
+- [Steps](../sdk-reference/operations/step.md): Execute business logic with automatic
+  checkpointing and configurable retry
+- [Waits](../sdk-reference/operations/wait.md): Suspend execution for a duration without
+  consuming compute resources
+- [Callbacks](../sdk-reference/operations/callback.md): Suspend execution and wait for an
+  external system to submit a result
+- [Invoke](../sdk-reference/operations/invoke.md): Invoke another Lambda function and
+  checkpoint the result
+- [Parallel](../sdk-reference/operations/parallel.md): Execute multiple independent
+  operations concurrently
+- [Map](../sdk-reference/operations/map.md): Execute an operation on each item in an
+  array concurrently with optional concurrency control
+- [Child context](../sdk-reference/operations/child-context.md): Group operations into an
+  isolated context for sub-workflow organization and concurrent determinism
+- [Wait for condition](../sdk-reference/operations/wait-for-condition.md): Poll for a
+  condition with automatic checkpointing between attempts
 
 ## Checkpoints
 
-Checkpoints are saved states of execution that allow resumption. When your function calls `context.step()` or other operations, the SDK creates a checkpoint and sends it to AWS. If Lambda recycles your environment or your function waits for an external event, execution can resume from the last checkpoint.
+A checkpoint is a saved record of a completed durable operation: its type, name, inputs,
+result, and timestamp. The SDK creates checkpoints automatically as your function
+executes operations. Together, the checkpoints form a log that Lambda uses to resume
+execution after a suspension or interruption.
+
+When your code calls a durable operation, the SDK follows this sequence:
+
+1. **Check for an existing checkpoint**: if this operation already completed in a
+   previous invocation, the SDK returns the stored result without re-executing it
+2. **Execute the operation**: if no checkpoint exists, the SDK runs the operation code
+3. **Serialize the result**: the SDK serializes the result for storage
+4. **Persist the checkpoint**: the SDK calls the Lambda checkpoint API to durably store
+   the result before continuing
+5. **Return the result**: execution continues to the next operation
+
+Once the SDK persists a checkpoint, that operation's result is safe. If your function is
+interrupted at any point, the SDK can replay up to the last persisted checkpoint on the
+next invocation.
````
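
The five-step checkpoint sequence above can be made concrete with a minimal in-memory sketch. `ToyContext`, its `step(name, fn)` method, and the dictionary standing in for the checkpoint log are hypothetical illustrations, not the AWS Durable Execution SDK's API; a real SDK persists checkpoints through the Lambda checkpoint API and records more than the result (type, inputs, timestamp).

```python
import json
from typing import Any, Callable


class ToyContext:
    """Hypothetical stand-in for a durable context; not the real SDK API."""

    def __init__(self, checkpoints: dict[str, str]) -> None:
        # Operation name -> serialized result. A dict stands in for the
        # checkpoint log that the real service persists durably.
        self.checkpoints = checkpoints

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # 1. Check for an existing checkpoint from a previous invocation.
        if name in self.checkpoints:
            return json.loads(self.checkpoints[name])
        # 2. No checkpoint exists, so execute the operation code.
        result = fn()
        # 3. Serialize the result for storage.
        payload = json.dumps(result)
        # 4. Persist the checkpoint before continuing (here, a dict write).
        self.checkpoints[name] = payload
        # 5. Return the result so execution moves on to the next operation.
        return result


charges: list[int] = []


def charge_card() -> dict:
    charges.append(42)  # the side effect that should happen only once
    return {"status": "charged", "amount": 42}


store: dict[str, str] = {}
first = ToyContext(store).step("charge-card", charge_card)   # first invocation
second = ToyContext(store).step("charge-card", charge_card)  # later invocation replays
print(first == second, len(charges))  # True 1
```

Serializing before persisting (steps 3 and 4) means a result that cannot be serialized fails before anything is stored, so the log never holds a checkpoint that replay could not read back.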

docs/getting-started/key-concepts.md (continued)

````diff
 
 ## Replay
 
-When your function resumes, completed operations don't re-execute. Instead, they return their checkpointed results instantly. This means your function code runs multiple times, but side effects only happen once per operation.
+Lambda keeps a running log of all durable operations as your function executes. When
+your function needs to pause or encounters an interruption, Lambda saves this checkpoint
+log and stops the execution. When it's time to resume, Lambda invokes your function
+again from the beginning and replays the checkpoint log:
+
+1. **Load checkpoint log**: the SDK retrieves the checkpoint log for the execution from
+   Lambda
+2. **Run from beginning**: your handler runs from the start, not from where it paused
+3. **Skip completed operations**: as your code calls durable operations, the SDK checks
+   each against the checkpoint log and returns stored results without re-executing the
+   operation code
+4. **Resume at interruption point**: when the SDK reaches an operation without a
+   checkpoint, it executes normally and creates new checkpoints from that point
+   forward
+
+The SDK enforces determinism by validating that operation names and types match the
+checkpoint log during replay. Your orchestration code must make the same sequence of
+durable operation calls on every invocation.
````
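
The replay sequence and the name and type validation described above can be sketched in the same hypothetical style. Here the checkpoint log is an ordered list of `(type, name, serialized result)` tuples, and `ReplayContext` and `NonDeterministicError` are illustrative names, not SDK types; only a step operation is modeled. Logged operations return their stored results, the first unlogged operation executes normally, and a mismatched name or type during replay is rejected.

```python
import json
from typing import Any, Callable


class NonDeterministicError(Exception):
    """Raised when replay diverges from the recorded checkpoint log."""


class ReplayContext:
    """Hypothetical replaying context; not the real SDK's DurableContext."""

    def __init__(self, log: list[tuple[str, str, str]]) -> None:
        self.log = log  # (operation type, name, serialized result), in call order
        self.position = 0  # index of the next log entry to match
        self.executed: list[str] = []  # operations whose code actually ran

    def _run(self, op_type: str, name: str, fn: Callable[[], Any]) -> Any:
        if self.position < len(self.log):
            # Skip completed operations, but only if name and type still match.
            recorded_type, recorded_name, payload = self.log[self.position]
            if (recorded_type, recorded_name) != (op_type, name):
                raise NonDeterministicError(
                    f"expected {recorded_type}:{recorded_name}, got {op_type}:{name}"
                )
            self.position += 1
            return json.loads(payload)
        # Resume at the interruption point: execute and checkpoint from here on.
        result = fn()
        self.log.append((op_type, name, json.dumps(result)))
        self.position += 1
        self.executed.append(name)
        return result

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        return self._run("STEP", name, fn)


def handler(ctx: ReplayContext) -> str:
    # The handler always runs from the beginning on every invocation.
    data = ctx.step("fetch", lambda: {"items": 3})
    return ctx.step("process", lambda: f"processed {data['items']} items")


# Checkpoint log left by an earlier invocation that only completed "fetch".
log = [("STEP", "fetch", json.dumps({"items": 3}))]
ctx = ReplayContext(log)
outcome = handler(ctx)
print(outcome, ctx.executed)  # processed 3 items ['process']


def renamed_handler(ctx: ReplayContext) -> dict:
    # Renaming an operation changes the call sequence seen during replay.
    return ctx.step("fetch-v2", lambda: {"items": 3})


try:
    renamed_handler(ReplayContext([("STEP", "fetch", json.dumps({"items": 3}))]))
except NonDeterministicError as err:
    message = str(err)
print(message)  # expected STEP:fetch, got STEP:fetch-v2
```

Matching by position in the log, rather than looking results up by name alone, is what lets this model detect that the orchestration code made a different sequence of calls.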

docs/getting-started/key-concepts.md (continued)

````diff
+
+## Determinism
 
-Because your code runs again on replay, it must be **deterministic** — avoid random values, timestamps, or external API calls outside of steps, as these can produce different values on replay.
+Because your code runs again on replay, it must be **deterministic**. Deterministic
+means that the code always produces the same results given the same inputs. Given the
+same inputs and checkpoint log, your function must make the same sequence of durable
+operation calls. Avoid non-deterministic operations (like generating random numbers or
+reading the current time) outside of steps, as these can produce different values during
+replay and cause non-deterministic behavior.
 
-## How replay works in practice
+### Rules for deterministic durable operations
+
+1. All durable operations in a context must start sequentially.
+2. To run durable operations concurrently, wrap each set of operations in its own child
+   context and then run the child contexts concurrently.
+3. Only use the child `DurableContext` in the child context scope. Do not use a parent
+   context inside a child context scope.
````
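
The sketch below illustrates why reading the clock outside a step breaks replay. It uses a hypothetical `MiniContext` that records step results by name and lists which steps actually ran; the two handlers, the 120-second `DEADLINE`, and the injected `now` clock are illustrative assumptions. This toy context deliberately skips the name and type validation described in the Replay section, so the divergence shows up as a second, unintended side effect rather than as an error.

```python
from typing import Any, Callable

DEADLINE = 120  # hypothetical cutoff, in seconds since the execution started


class MiniContext:
    """Hypothetical context: replays step results by name, no validation."""

    def __init__(self, log: dict[str, Any], ran: list[str]) -> None:
        self.log = log  # step name -> recorded result
        self.ran = ran  # names of steps whose code actually executed

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.log:
            return self.log[name]
        self.ran.append(name)
        self.log[name] = fn()
        return self.log[name]


def non_deterministic(ctx: MiniContext, now: Callable[[], int]) -> str:
    # BAD: the clock is read outside a step, so every replay sees a new value.
    if now() < DEADLINE:
        return ctx.step("send-reminder", lambda: "reminder sent")
    return ctx.step("escalate", lambda: "escalated")


def deterministic(ctx: MiniContext, now: Callable[[], int]) -> str:
    # GOOD: the clock is read inside a step, so replay reuses the recorded value.
    checked_at = ctx.step("read-clock", now)
    if checked_at < DEADLINE:
        return ctx.step("send-reminder", lambda: "reminder sent")
    return ctx.step("escalate", lambda: "escalated")


results: dict[str, list[str]] = {}
for handler in (non_deterministic, deterministic):
    log: dict[str, Any] = {}
    ran: list[str] = []
    handler(MiniContext(log, ran), lambda: 100)  # first invocation at t=100s
    handler(MiniContext(log, ran), lambda: 130)  # replay after a wait, at t=130s
    results[handler.__name__] = ran
    print(handler.__name__, ran)
# non_deterministic ['send-reminder', 'escalate']
# deterministic ['read-clock', 'send-reminder']
```

In the non-deterministic handler, the replay takes a different branch and runs both the reminder and the escalation; wrapping the clock read in a step pins the branch to the value recorded on the first invocation.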

docs/getting-started/key-concepts.md (continued)

````diff
+
+## Replay Walkthrough
 
 Let's trace through a simple workflow:
 
 === "TypeScript"
 
-    ``` typescript
+    ```typescript
     --8<-- "examples/typescript/getting-started/execution-model.ts"
     ```
 
 === "Python"
 
-    ``` python
+    ```python
     --8<-- "examples/python/getting-started/execution-model.py"
     ```
 
 === "Java"
 
-    ``` java
+    ```java
     --8<-- "examples/java/getting-started/execution-model.java"
     ```
 
 **First invocation (t=0s):**
 
-1. Lambda invokes your function
-2. `fetch_data` executes and calls an external API
-3. Result is checkpointed to AWS
-4. `context.wait(Duration.from_seconds(30))` is reached
-5. Function returns, Lambda can recycle the environment
+1. You start a durable execution by invoking a durable function
+2. The durable functions service invokes your durable function handler
+3. The fetch step runs and calls an external API
+4. The SDK checkpoints the result of the fetch step
+5. Execution reaches `context.wait()` and the SDK checkpoints the wait operation
+6. The SDK terminates the current Lambda invocation, but the durable execution is still
+   active
 
 **Second invocation (t=30s):**
 
-1. Lambda invokes your function again
-2. Function code runs from the beginning
-3. `fetch_data` returns the checkpointed result instantly (no API call)
-4. `context.wait()` is already complete, execution continues
-5. `process_data` executes for the first time
-
-## The two SDKs
-
-### Execution SDK (`aws-durable-execution-sdk-python`)
-
-Runs in your Lambda functions. Provides `DurableContext`, operations, decorators, and serialization. Install in your Lambda deployment package.
-
-```console
-pip install aws-durable-execution-sdk-python
-```
-
-### Testing SDK (`aws-durable-execution-sdk-python-testing`)
-
-A separate SDK for testing your durable functions locally without AWS. Provides `DurableFunctionTestRunner`, pytest integration, and result inspection. Install in your development environment only.
-
-```console
-pip install aws-durable-execution-sdk-python-testing
-```
-
-## Decorators
-
-The SDK provides decorators to mark functions as durable:
-
-- `@durable_execution` - Marks your Lambda handler as a durable function
-- `@durable_step` - Marks a function that can be used with `context.step()`
-- `@durable_with_child_context` - Marks a function that receives a child context
+1. The durable functions service invokes your function again
+2. The function runs from the beginning
+3. The fetch step returns its checkpointed result instantly and does not re-execute the
+   API call
+4. The wait has already elapsed, so execution continues
+5. The process step runs for the first time
+6. The SDK checkpoints the result of the process step
+7. The function returns naturally and the invocation ends
+8. The durable execution ends
````
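
The two-invocation trace above can be simulated end to end with a small, hypothetical Python model. `WalkthroughContext`, the `Suspend` exception used to end an invocation, the simulated `now` timestamps, and the `fetch`, `pause`, and `process` operation names are all assumptions made for illustration; the real SDK and durable functions service handle suspension, scheduling, and checkpoint storage themselves.

```python
from typing import Any, Callable


class Suspend(Exception):
    """Ends the current invocation while the durable execution stays active."""


class WalkthroughContext:
    """Hypothetical context that replays steps and waits from a shared log."""

    def __init__(self, log: dict[str, Any], now: float) -> None:
        self.log = log  # operation name -> recorded result or wake-up time
        self.now = now  # simulated wall-clock time of this invocation, in seconds

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.log:
            return self.log[name]  # replay: return the checkpointed result
        self.log[name] = fn()  # first run: execute and checkpoint
        return self.log[name]

    def wait(self, name: str, seconds: float) -> None:
        if name not in self.log:
            self.log[name] = self.now + seconds  # checkpoint the wake-up time
        if self.now < self.log[name]:
            raise Suspend(name)  # not elapsed yet: end this invocation


api_calls: list[str] = []


def fetch_from_api() -> dict[str, int]:
    api_calls.append("GET /data")  # hypothetical external call that should run once
    return {"value": 21}


def handler(ctx: WalkthroughContext) -> int:
    data = ctx.step("fetch", fetch_from_api)
    ctx.wait("pause", 30)
    return ctx.step("process", lambda: data["value"] * 2)


log: dict[str, Any] = {}
outcomes: list[str] = []
for t in (0, 30):  # the service invokes the handler at t=0s and again at t=30s
    try:
        result = handler(WalkthroughContext(log, now=t))
        outcomes.append(f"t={t}s: execution finished with {result}")
    except Suspend as pending:
        outcomes.append(f"t={t}s: invocation ended, waiting on {pending}")

print(*outcomes, sep="\n")
# t=0s: invocation ended, waiting on pause
# t=30s: execution finished with 42
print("external API calls:", len(api_calls))  # external API calls: 1
```

Recording the wait's wake-up time, rather than the remaining duration, keeps the wait replay-safe in this model: every invocation compares its own clock against the same stored value.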

docs/index.md

Lines changed: 4 additions & 0 deletions

````diff
@@ -14,6 +14,10 @@ Lambda. Your functions can pause execution, wait for external events, retry fail
 operations, and resume exactly where they left off, even if Lambda recycles your
 execution environment.
 
+If you are new, start with [Getting Started](getting-started/).
+
+For detailed programming language reference, see the [SDK Reference](sdk-reference/).
+
 ## Key features
 
 - **Automatic checkpointing**: Workflow state is saved automatically after each operation
````

docs/patterns/index.md

Lines changed: 8 additions & 0 deletions

````diff
@@ -0,0 +1,8 @@
+# Patterns
+
+Reusable patterns and guidance for common durable function use cases.
+
+## In this section
+
+- [Best Practices](best-practices.md): Function design, timeout configuration, naming
+  conventions, and common mistakes to avoid
````

docs/sdk-reference/index.md

Lines changed: 39 additions & 0 deletions

````diff
@@ -0,0 +1,39 @@
+# SDK Reference
+
+The SDK Reference covers everything you need to build, configure, and operate durable
+functions.
+
+## Operations
+
+The core building blocks for constructing durable workflows:
+
+- [Step](operations/step.md): Execute and checkpoint a unit of work
+- [Wait](operations/wait.md): Pause execution for a duration
+- [Wait for Condition](operations/wait-for-condition.md): Pause until an external
+  condition is met
+- [Callback](operations/callback.md): Resume execution via an external signal
+- [Invoke](operations/invoke.md): Invoke another durable function
+- [Parallel](operations/parallel.md): Execute multiple operations concurrently
+- [Map](operations/map.md): Apply an operation across a collection
+- [Child Context](operations/child-context.md): Scope a sub-workflow within a parent
+
+## Error Handling
+
+- [Errors](error-handling/errors.md): Exception types and error response formats
+- [Retries](error-handling/retries.md): Configuring retry behavior for steps
+
+## State
+
+- [Serialization](state/serialization.md): How state is serialized between checkpoints
+
+## Observability
+
+- [Logging](observability/logging.md): Structured logging within durable functions
+
+## Language Guides
+
+Language-specific installation and configuration:
+
+- [TypeScript](languages/typescript/index.md)
+- [Python](languages/python/index.md)
+- [Java](languages/java/index.md)
````

Lines changed: 7 additions & 0 deletions

````diff
@@ -0,0 +1,7 @@
+# Language Guides
+
+Language-specific installation and setup for the AWS Durable Execution SDK.
+
+- [TypeScript](typescript/index.md)
+- [Python](python/index.md)
+- [Java](java/index.md)
````

Lines changed: 3 additions & 0 deletions

````diff
@@ -0,0 +1,3 @@
+# Java SDK
+
+Coming soon.
````

Lines changed: 31 additions & 0 deletions

````diff
@@ -0,0 +1,31 @@
+# Python SDK
+
+## Execution SDK
+
+The execution SDK (`aws-durable-execution-sdk-python`) runs in your Lambda functions. It
+provides `DurableContext`, operations, and decorators. Install it in your Lambda
+deployment package.
+
+```console
+pip install aws-durable-execution-sdk-python
+```
+
+## Testing SDK
+
+The testing SDK (`aws-durable-execution-sdk-python-testing`) lets you test durable
+functions locally without AWS. It provides `DurableFunctionTestRunner`, pytest
+integration, and result inspection. Install it in your development environment only.
+
+```console
+pip install aws-durable-execution-sdk-python-testing
+```
+
+## Decorators
+
+The SDK provides decorators to mark functions as durable:
+
+- `@durable_execution` - Marks your Lambda handler as a durable function
+- `@durable_step` - Marks a function that can be used with `context.step()`
+- `@durable_with_child_context` - Marks a function that receives a child context
+
+The Python SDK uses synchronous methods and does not support `await`.
````
