docs: revise key concepts, add section indices #105
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,38 +1,19 @@ | ||
| # Getting Started | ||
| ## Installation | ||
| This section will help you get started building durable functions in | ||
| [AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html). | ||
| Install the SDK using pip: | ||
| Please see the [SDK Reference](../sdk-reference/) section for in-depth details on the | ||
| concepts introduced here. | ||
| ```console | ||
| pip install aws-durable-execution-sdk-python | ||
| ``` | ||
| ## Prerequisites | ||
| ## Quick example | ||
| - An AWS account with permissions to create and execute AWS Lambda functions | ||
| - Familiarity with writing AWS Lambda functions | ||
| Here's a simple durable function that processes an order: | ||
| ## In this section | ||
| === "TypeScript" | ||
| ``` typescript | ||
| --8<-- "examples/typescript/index/quick-example.ts" | ||
| ``` | ||
| === "Python" | ||
| ``` python | ||
| --8<-- "examples/python/index/quick-example.py" | ||
| ``` | ||
| === "Java" | ||
| ``` java | ||
| --8<-- "examples/java/index/quick-example.java" | ||
| ``` | ||
| Each `context.step()` call is checkpointed automatically. If Lambda recycles your execution environment, the function resumes from the last completed step. | ||
| ## Next steps | ||
| - [Key Concepts](key-concepts.md) - Understand the mental model behind durable execution | ||
| - [Quick Start](quick-start.md) - Build and test your first durable function | ||
| - [Key Concepts](key-concepts.md) Understand durable execution, checkpoints, replay, and | ||
| the DurableContext before writing code | ||
| - [Quick Start](quick-start.md) Install the SDK, write your first durable function, and | ||
| test it locally |
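The removed quick-example text says each `context.step()` call is checkpointed and that the function resumes from the last completed step after Lambda recycles the environment. A toy model makes that behavior concrete. This is a minimal sketch under stated assumptions, not the SDK's API: `ToyContext`, `SimulatedRecycle`, and the order-processing steps are hypothetical names, checkpoints live in an in-memory dict keyed by step name instead of being sent to AWS, and an exception stands in for the environment being recycled.

```python
from __future__ import annotations

from typing import Any, Callable


class SimulatedRecycle(Exception):
    """Hypothetical: stands in for Lambda recycling the execution environment."""


class ToyContext:
    """Hypothetical stand-in for DurableContext (not the real SDK class)."""

    def __init__(self, checkpoints: dict[str, Any]) -> None:
        # In-memory checkpoint store shared across "invocations".
        self.checkpoints = checkpoints
        # Names of steps whose code actually ran during this invocation.
        self.executed: list[str] = []

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.checkpoints:
            # Completed earlier: return the stored result, do not re-run fn.
            return self.checkpoints[name]
        result = fn()
        self.executed.append(name)
        # Record the result before moving on to the next step.
        self.checkpoints[name] = result
        return result


def process_order(ctx: ToyContext, fail_after_reserve: bool) -> str:
    order = ctx.step("validate", lambda: {"id": 42})
    ctx.step("reserve", lambda: f"reserved-{order['id']}")
    if fail_after_reserve:
        raise SimulatedRecycle()
    return ctx.step("charge", lambda: f"charged-{order['id']}")


store: dict[str, Any] = {}

first = ToyContext(store)
try:
    process_order(first, fail_after_reserve=True)
except SimulatedRecycle:
    pass  # the first invocation is interrupted after two steps

second = ToyContext(store)
result = process_order(second, fail_after_reserve=False)
print(result)           # charged-42
print(first.executed)   # ['validate', 'reserve']
print(second.executed)  # ['charge']
```

Keying checkpoints by step name keeps the sketch short; the key-concepts page in this PR describes the SDK validating operation names and types against an ordered checkpoint log instead.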
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -2,94 +2,180 @@ | ||
| ## Durable execution | ||
| A durable execution represents the complete lifecycle of a Lambda durable function. The SDK uses a checkpoint and replay mechanism to track progress, suspend execution, and recover from failures. A single execution may span multiple Lambda invocations. | ||
| A durable execution is the complete lifecycle of an AWS Lambda durable function. It uses | ||
| a checkpoint and replay mechanism to track progress, suspend execution, and recover from | ||
| failures. When a function resumes after a suspension or interruption, previously | ||
| completed operations replay from their checkpoints and the function continues execution. | ||
| A single execution may span multiple invocations of the Lambda function, particularly | ||
| after suspensions or failure recovery. Because of replay, the execution can run for | ||
| extended periods (up to one year) while maintaining reliable progress despite | ||
| interruptions. | ||
| ### Timeouts | ||
| The | ||
| [execution timeout](https://docs.aws.amazon.com/lambda/latest/api/API_DurableConfig.html#lambda-Type-DurableConfig-ExecutionTimeout) | ||
| and Lambda function | ||
| [Timeout](https://docs.aws.amazon.com/lambda/latest/api/API_CreateFunction.html#lambda-CreateFunction-request-Timeout) | ||
| are different settings. The Lambda function timeout controls how long each individual | ||
| invocation can run (maximum 15 minutes). The execution timeout controls the total | ||
| elapsed time for the entire durable execution (maximum 1 year). | ||
| ## Durable functions | ||
| A durable function is a Lambda function decorated with `@durable_execution` that can be checkpointed and resumed. The function receives a `DurableContext` that provides methods for durable operations. | ||
| A durable function is a Lambda function configured with the | ||
| [`DurableConfig`](https://docs.aws.amazon.com/lambda/latest/dg/durable-configuration.html) | ||
| object at creation time. Lambda then applies the checkpoint and replay mechanism each | ||
| time the function is invoked, which makes its execution durable. | ||
| ## Operations | ||
| ## DurableContext | ||
| `DurableContext` is the context object your durable function receives instead of the | ||
| standard Lambda `Context`. It exposes all durable operations and provides methods for | ||
| creating checkpoints, managing execution flow, and coordinating with external systems. | ||
| Operations are units of work in a durable execution. Each operation type serves a specific purpose: | ||
| Your durable function receives a `DurableContext` instead of the default Lambda context: | ||
| - **Steps** - Execute code and checkpoint the result with retry support | ||
| - **Waits** - Pause execution for a specified duration without blocking Lambda | ||
| - **Callbacks** - Wait for external systems to respond with results | ||
| - **Invoke** - Call other durable functions to compose complex workflows | ||
| - **Child contexts** - Isolate nested workflows for better organization | ||
| - **Parallel** - Execute multiple operations concurrently with completion criteria | ||
| - **Map** - Process collections in parallel with batching and failure tolerance | ||
| === "TypeScript" | ||
| ```typescript | ||
| --8<-- "examples/typescript/getting-started/durable-context.ts" | ||
| ``` | ||
| === "Python" | ||
| ```python | ||
| --8<-- "examples/python/getting-started/durable-context.py" | ||
| ``` | ||
| === "Java" | ||
| ```java | ||
| --8<-- "examples/java/getting-started/durable-context.java" | ||
| ``` | ||
| ## Operations | ||
| Operations are units of work in a durable execution. Each operation type serves a | ||
| specific purpose: | ||
| - [Steps](../sdk-reference/operations/step.md) Execute business logic with automatic | ||
| checkpointing and configurable retry | ||
| - [Waits](../sdk-reference/operations/wait.md) Suspend execution for a duration without | ||
| consuming compute resources | ||
| - [Callbacks](../sdk-reference/operations/callback.md) Suspend execution and wait for an | ||
| external system to submit a result | ||
| - [Invoke](../sdk-reference/operations/invoke.md) Invoke another Lambda function and | ||
| checkpoint the result | ||
| - [Parallel](../sdk-reference/operations/parallel.md) Execute multiple independent | ||
| operations concurrently | ||
| - [Map](../sdk-reference/operations/map.md) Execute an operation on each item in an | ||
| array concurrently with optional concurrency control | ||
| - [Child context](../sdk-reference/operations/child-context.md) Group operations into an | ||
| isolated context for sub-workflow organization and concurrent determinism | ||
| - [Wait for condition](../sdk-reference/operations/wait-for-condition.md) Poll for a | ||
| condition with automatic checkpointing between attempts | ||
| ## Checkpoints | ||
| Checkpoints are saved states of execution that allow resumption. When your function calls `context.step()` or other operations, the SDK creates a checkpoint and sends it to AWS. If Lambda recycles your environment or your function waits for an external event, execution can resume from the last checkpoint. | ||
| A checkpoint is a saved record of a completed durable operation: its type, name, inputs, | ||
| result, and timestamp. The SDK creates checkpoints automatically as your function | ||
| executes operations. Together, the checkpoints form a log that Lambda uses to resume | ||
| execution after a suspension or interruption. | ||
| When your code calls a durable operation, the SDK follows this sequence: | ||
| 1. **Check for an existing checkpoint**: if this operation already completed in a | ||
|    previous invocation, the SDK returns the stored result without re-executing it | ||
| 2. **Execute the operation**: if no checkpoint exists, the SDK runs the operation code | ||
| 3. **Serialize the result**: the SDK serializes the result for storage | ||
| 4. **Persist the checkpoint**: the SDK calls the Lambda checkpoint API to durably store | ||
|    the result before continuing | ||
| 5. **Return the result**: the SDK returns the result and execution continues to the | ||
|    next operation | ||

Contributor: Since this is high level intro for checkpoints, we don't need to mention that checkpoint will not save result for large size, correct?

Contributor (Author): Great call-out, thank you 😁 Yes, I think that belongs in the more detailed section for child context and step.
| Once the SDK persists a checkpoint, that operation's result is safe. If your function is | ||
| interrupted at any point, the SDK can replay up to the last persisted checkpoint on the | ||
| next invocation. | ||
| ## Replay | ||
| When your function resumes, completed operations don't re-execute. Instead, they return their checkpointed results instantly. This means your function code runs multiple times, but side effects only happen once per operation. | ||
| Lambda keeps a running log of all durable operations as your function executes. When | ||
| your function needs to pause or encounters an interruption, Lambda saves this checkpoint | ||
| log and stops the execution. When it's time to resume, Lambda invokes your function | ||
| again from the beginning and replays the checkpoint log: | ||
| 1. **Load the checkpoint log**: the SDK retrieves the checkpoint log for the execution | ||
|    from Lambda | ||
| 2. **Run from the beginning**: your handler runs from the start, not from where it | ||
|    paused | ||
| 3. **Skip completed operations**: as your code calls durable operations, the SDK checks | ||
|    each against the checkpoint log and returns stored results without re-executing the | ||
|    operation code | ||
| 4. **Resume at the interruption point**: when the SDK reaches an operation without a | ||
|    checkpoint, it executes normally and creates new checkpoints from that point | ||
|    forward | ||
| The SDK enforces determinism by validating that operation names and types match the | ||
| checkpoint log during replay. Your orchestration code must make the same sequence of | ||
| durable operation calls on every invocation. | ||
| ## Determinism | ||
| Because your code runs again on replay, it must be **deterministic** — avoid random values, timestamps, or external API calls outside of steps, as these can produce different values on replay. | ||
| Because your code runs again on replay, it must be **deterministic**. Deterministic | ||
| means that the code always produces the same results given the same inputs. Given the | ||
| same inputs and checkpoint log, your function must make the same sequence of durable | ||
| operation calls. Keep non-deterministic operations (like generating random numbers or | ||
| reading the current time) and side effects (like external API calls) inside steps; | ||
| outside of steps they can produce different values during replay and cause | ||
| non-deterministic behavior. | ||
| ## How replay works in practice | ||
| ### Rules for deterministic durable operations | ||
| 1. All durable operations in a context must start sequentially. | ||
| 2. To run durable operations concurrently, wrap each set of operations in its own child | ||
| context and then run the child contexts concurrently. | ||
| 3. Inside a child context scope, use only the child `DurableContext`. Do not use a | ||
|    parent context there. | ||
| ## Replay Walkthrough | ||
| Let's trace through a simple workflow: | ||
| === "TypeScript" | ||
| ``` typescript | ||
| ```typescript | ||
| --8<-- "examples/typescript/getting-started/execution-model.ts" | ||
| ``` | ||
| === "Python" | ||
| ``` python | ||
| ```python | ||
| --8<-- "examples/python/getting-started/execution-model.py" | ||
| ``` | ||
| === "Java" | ||
| ``` java | ||
| ```java | ||
| --8<-- "examples/java/getting-started/execution-model.java" | ||
| ``` | ||
| **First invocation (t=0s):** | ||
| 1. Lambda invokes your function | ||
| 2. `fetch_data` executes and calls an external API | ||
| 3. Result is checkpointed to AWS | ||
| 4. `context.wait(Duration.from_seconds(30))` is reached | ||
| 5. Function returns, Lambda can recycle the environment | ||
| 1. You start a durable execution by invoking a durable function | ||
| 2. The durable functions service invokes your durable function handler | ||
| 3. The fetch step runs and calls an external API | ||
| 4. The SDK checkpoints the result of the fetch step | ||
| 5. Execution reaches `context.wait()` and the SDK checkpoints the wait operation | ||
| 6. The SDK terminates the current Lambda invocation, but the durable execution is still | ||
| active | ||
| **Second invocation (t=30s):** | ||
| 1. Lambda invokes your function again | ||
| 2. Function code runs from the beginning | ||
| 3. `fetch_data` returns the checkpointed result instantly (no API call) | ||
| 4. `context.wait()` is already complete, execution continues | ||
| 5. `process_data` executes for the first time | ||
| ## The two SDKs | ||
| ### Execution SDK (`aws-durable-execution-sdk-python`) | ||
| Runs in your Lambda functions. Provides `DurableContext`, operations, decorators, and serialization. Install in your Lambda deployment package. | ||
| ```console | ||
| pip install aws-durable-execution-sdk-python | ||
| ``` | ||
| ### Testing SDK (`aws-durable-execution-sdk-python-testing`) | ||
| A separate SDK for testing your durable functions locally without AWS. Provides `DurableFunctionTestRunner`, pytest integration, and result inspection. Install in your development environment only. | ||
| ```console | ||
| pip install aws-durable-execution-sdk-python-testing | ||
| ``` | ||
| ## Decorators | ||
| The SDK provides decorators to mark functions as durable: | ||
| - `@durable_execution` - Marks your Lambda handler as a durable function | ||
| - `@durable_step` - Marks a function that can be used with `context.step()` | ||
| - `@durable_with_child_context` - Marks a function that receives a child context | ||
| 1. The durable functions service invokes your function again | ||
| 2. The function runs from the beginning | ||
| 3. The fetch step returns its checkpointed result instantly and does not re-execute the | ||
|    API call | ||
| 4. The wait has already elapsed, so execution continues | ||
| 5. The process step runs for the first time | ||
| 6. The SDK checkpoints the result of the process step | ||
| 7. The function returns naturally and the invocation ends | ||
| 8. The durable execution ends | ||
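The checkpoint sequence, replay steps, determinism check, and two-invocation walkthrough in the key-concepts diff can be traced end to end with a small simulation. It is a sketch under assumptions, not the SDK: `ToyDurableContext`, `Checkpoint`, `Suspend`, and `NonDeterminismError` are hypothetical names, the checkpoint log is a Python list rather than state held by Lambda, the clock is a number passed in per invocation, and operations are matched by position, type, and name, following the determinism paragraph.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Checkpoint:
    op_type: str  # "step" or "wait"
    name: str
    result: Any = None  # step result, or wake-up time for a wait


class Suspend(Exception):
    """Hypothetical: end this invocation while the execution stays active."""


class NonDeterminismError(Exception):
    """Hypothetical: replayed code diverged from the checkpoint log."""


@dataclass
class ToyDurableContext:
    log: list[Checkpoint]  # checkpoint log loaded at the start of an invocation
    now: float  # simulated clock, in seconds
    cursor: int = 0  # index of the next durable operation
    ran: list[str] = field(default_factory=list)  # steps executed this invocation

    def _replay(self, op_type: str, name: str) -> Checkpoint | None:
        """Return this operation's checkpoint, validating its type and name."""
        if self.cursor >= len(self.log):
            return None  # no checkpoint yet: this is new work
        cp = self.log[self.cursor]
        if (cp.op_type, cp.name) != (op_type, name):
            raise NonDeterminismError(
                f"expected {cp.op_type} {cp.name!r}, got {op_type} {name!r}"
            )
        self.cursor += 1
        return cp

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        cp = self._replay("step", name)
        if cp is not None:
            return cp.result  # skip the completed operation
        result = fn()  # execute for the first time
        self.ran.append(name)
        self.log.append(Checkpoint("step", name, result))  # persist, then continue
        self.cursor += 1
        return result

    def wait(self, name: str, seconds: float) -> None:
        cp = self._replay("wait", name)
        if cp is None:
            self.log.append(Checkpoint("wait", name, self.now + seconds))
            raise Suspend()  # checkpoint the wait, then end the invocation
        if self.now < cp.result:
            raise Suspend()  # resumed too early: keep waiting


def workflow(ctx: ToyDurableContext) -> int:
    data = ctx.step("fetch", lambda: {"rows": 3})  # stands in for an API call
    ctx.wait("cool-down", 30)
    return ctx.step("process", lambda: data["rows"] * 2)


def invoke(log: list[Checkpoint], now: float) -> tuple[str, Any, list[str]]:
    ctx = ToyDurableContext(log=log, now=now)
    try:
        return "SUCCEEDED", workflow(ctx), ctx.ran
    except Suspend:
        return "SUSPENDED", None, ctx.ran


log: list[Checkpoint] = []
first = invoke(log, now=0)
second = invoke(log, now=30)
print(first)   # ('SUSPENDED', None, ['fetch'])
print(second)  # ('SUCCEEDED', 6, ['process'])


def edited_workflow(ctx: ToyDurableContext) -> None:
    ctx.step("fetch-v2", lambda: {"rows": 3})  # renamed step breaks replay


try:
    edited_workflow(ToyDurableContext(log=log, now=60))
except NonDeterminismError as err:
    print(err)  # expected step 'fetch', got step 'fetch-v2'
```

In this toy, an invocation with `now=10` after the first one would suspend again at the wait without running `process`, because the checkpointed wake-up time (30) has not been reached.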
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| # Patterns | ||
| Reusable patterns and guidance for common durable function use cases. | ||
| ## In this section | ||
| - [Best Practices](best-practices.md) Function design, timeout configuration, naming | ||
| conventions, and common mistakes to avoid |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| # SDK Reference | ||
| The SDK Reference covers everything you need to build, configure, and operate durable | ||
| functions. | ||
| ## Operations | ||
| The core building blocks for constructing durable workflows: | ||
| - [Step](operations/step.md) Execute and checkpoint a unit of work | ||
| - [Wait](operations/wait.md) Pause execution for a duration | ||
| - [Wait for Condition](operations/wait-for-condition.md) Pause until an external | ||
| condition is met | ||
| - [Callback](operations/callback.md) Resume execution via an external signal | ||
| - [Invoke](operations/invoke.md) Invoke another durable function | ||
| - [Parallel](operations/parallel.md) Execute multiple operations concurrently | ||
| - [Map](operations/map.md) Apply an operation across a collection | ||
| - [Child Context](operations/child-context.md) Scope a sub-workflow within a parent | ||
| ## Error Handling | ||
| - [Errors](error-handling/errors.md) Exception types and error response formats | ||
| - [Retries](error-handling/retries.md) Configuring retry behavior for steps | ||
| ## State | ||
| - [Serialization](state/serialization.md) How state is serialized between checkpoints | ||
| ## Observability | ||
| - [Logging](observability/logging.md) Structured logging within durable functions | ||
| ## Language Guides | ||
| Language-specific installation and configuration: | ||
| - [TypeScript](languages/typescript/index.md) | ||
| - [Python](languages/python/index.md) | ||
| - [Java](languages/java/index.md) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| # Language Guides | ||
| Language-specific installation and setup for the AWS Durable Execution SDK. | ||
| - [TypeScript](typescript/index.md) | ||
| - [Python](python/index.md) | ||
| - [Java](java/index.md) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # Java SDK | ||
| Coming soon. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| # Python SDK | ||
| ## Execution SDK | ||
| The execution SDK (`aws-durable-execution-sdk-python`) runs in your Lambda functions. It | ||
| provides `DurableContext`, operations, and decorators. Install it in your Lambda | ||
| deployment package. | ||
| ```console | ||
| pip install aws-durable-execution-sdk-python | ||
| ``` | ||
| ## Testing SDK | ||
| The testing SDK (`aws-durable-execution-sdk-python-testing`) lets you test durable | ||
| functions locally without AWS. It provides `DurableFunctionTestRunner`, pytest | ||
| integration, and result inspection. Install it in your development environment only. | ||
| ```console | ||
| pip install aws-durable-execution-sdk-python-testing | ||
| ``` | ||
| ## Decorators | ||
| The SDK provides decorators to mark functions as durable: | ||
| - `@durable_execution` - Marks your Lambda handler as a durable function | ||
| - `@durable_step` - Marks a function that can be used with `context.step()` | ||
| - `@durable_with_child_context` - Marks a function that receives a child context | ||
| The Python SDK uses synchronous methods and does not support `await`. |
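The decorator bullets and the note on synchronous methods can be illustrated with a toy. Calling a decorated step binds its arguments without running it, and a context's `step()` runs it and returns the value directly, with no `await`. This is a sketch under assumptions: `toy_durable_step`, `DeferredStep`, and `ToySyncContext` are hypothetical and do not reproduce the real `@durable_step` decorator or `DurableContext` signatures.

```python
from __future__ import annotations

import functools
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class DeferredStep:
    name: str  # used as the operation name
    run: Callable[[], Any]  # the bound call, not yet executed


def toy_durable_step(fn: Callable[..., Any]) -> Callable[..., DeferredStep]:
    """Hypothetical decorator: calling the function returns a DeferredStep."""

    @functools.wraps(fn)
    def bind(*args: Any, **kwargs: Any) -> DeferredStep:
        return DeferredStep(fn.__name__, functools.partial(fn, *args, **kwargs))

    return bind


class ToySyncContext:
    """Hypothetical synchronous context: step() blocks until the result exists."""

    def __init__(self) -> None:
        self.log: list[tuple[str, Any]] = []  # stand-in for checkpoints

    def step(self, deferred: DeferredStep) -> Any:
        result = deferred.run()  # runs to completion; nothing to await
        self.log.append((deferred.name, result))
        return result


calls: list[tuple[int, int]] = []


@toy_durable_step
def add(a: int, b: int) -> int:
    calls.append((a, b))
    return a + b


pending = add(2, 3)  # binds arguments only; add's body has not run
ctx = ToySyncContext()
total = ctx.step(pending)  # plain call that returns the value directly
print(total)    # 5
print(ctx.log)  # [('add', 5)]
```

Deferring execution this way lets the context, rather than the caller, decide whether a step's code runs, which is what checkpointing and replay need.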