aws · bchampp · May 6, 2026 · May 6, 2026 · May 6, 2026 · May 6, 2026
diff --git a/aws-lambda-durable-functions-power/POWER.md b/aws-lambda-durable-functions-power/POWER.md
@@ -87,7 +87,7 @@ Load the appropriate reference file based on what the user is working on:
 - **Testing**, **local testing**, **cloud testing**, **test runner**, or **flaky tests** -> see [testing-patterns.md](steering/testing-patterns.md)
 - **Deployment**, **CloudFormation**, **CDK**, **SAM**, **log groups**, **deploy**, or **infrastructure** -> see [deployment-iac.md](steering/deployment-iac.md)
 - **Advanced patterns**, **GenAI agents**, **completion policies**, **step semantics**, or **custom serialization** -> see [advanced-patterns.md](steering/advanced-patterns.md)
-- **troubleshooting**, **stuck execution**, **failed execution**, **debug execution ID**, or **execution history** -> see [troubleshooting-executions.md](steering/troubleshooting-executions.md)
+- **troubleshooting**, **stuck execution**, **failed execution**, **debug execution ID**, **execution history**, **execution error**, **why did my execution fail**, **execution timed out**, **callback not received**, **diagnose execution**, or **root cause execution** -> see [troubleshooting-executions.md](steering/troubleshooting-executions.md)
 
 ## Quick Reference
 

diff --git a/aws-lambda-durable-functions-power/steering/troubleshooting-executions.md b/aws-lambda-durable-functions-power/steering/troubleshooting-executions.md
@@ -19,8 +19,8 @@ When spawning the troubleshooting agent, provide:
 
 ```
 Diagnose durable function execution issue:
-- Function: <function-name>:<alias> (must be qualified ARN)
-- Execution ID: <execution-id>
+- Durable Execution ARN: <durable-execution-arn>
+- Region: <region> (infer from ARN)
 
 CRITICAL SAFETY RULES:
 - This is READ-ONLY diagnosis
@@ -29,31 +29,99 @@ CRITICAL SAFETY RULES:
 - Only suggest manual remediation if user explicitly requests it
 
 Steps:
-1. Run: aws lambda get-durable-execution-history --function-name <function> --execution-id <id>
-2. Analyze execution status (RUNNING/SUCCEEDED/FAILED/TIMED_OUT)
-3. Check for stuck operations (PENDING/RUNNING status)
-4. Identify failed operations and error messages
-5. Calculate operation durations and timeline
-6. Diagnose specific issue:
-   - Stuck in WAIT_FOR_CALLBACK: Extract callback ID, suggest manual callback
-   - Failed operations: Show error and retry attempts
-   - Timeout: Calculate total duration, identify slow operations
-   - Unexpected behavior: Compare operation order with expected flow
-7. Provide specific recommendations and next steps
-
-Use jq for JSON parsing and analysis.
+1. Fetch the execution history directly:
+   Run: aws lambda get-durable-execution-history --durable-execution-arn <durable-execution-arn> --region <region> --include-execution-data
+
+2. If the command succeeds, analyze and provide a user-friendly diagnosis:
+   a. Report the execution status (RUNNING/SUCCEEDED/FAILED/TIMED_OUT)
+   b. Identify the root cause:
+      - Failed operations: Show the EXACT error message verbatim in a code block
+      - Stuck in WAIT_FOR_CALLBACK: Extract callback ID, show how long it's been waiting
+      - Timeout: Show which operation was running when timeout occurred
+      - Unexpected behavior: Compare operation order with expected flow
+   c. Calculate operation durations and timeline
+   d. Provide a clear, plain-language explanation of what went wrong and why
+
+3. If the command fails:
+   - Execution not found: Tell the user the execution ID may be incorrect or the execution may have been purged. Ask them to verify the ARN.
+   - Permissions/network error: Suggest checking IAM permissions (lambda:GetDurableExecutionHistory)
+   - In either case, direct them to the console as a fallback (see step 4)
+
+4. ALWAYS provide a direct link to the Execution Details page in the Lambda console.
+   Parse the ARN (arn:<partition>:lambda:<region>:<accountId>:function:<functionName>:<functionVersion>/durable-execution/<executionName>/<invocationId>)
+   to extract region, functionName, functionVersion, executionName, and invocationId, then construct:
+   https://<region>.console.aws.amazon.com/lambda/home?region=<region>#/functions/<functionName>/versions/<functionVersion>/executions/<executionName>/<invocationId>
+
+   Frame it as: "**[View this execution in the console](<url>)**"
+
+5. Provide specific, actionable next steps based on the diagnosis.
+
+6. If unable to determine the root cause from execution history:
+   - Provide the console link (step 4)
+   - Offer to fetch the log group from the function configuration (function name and version are in the ARN):
+     aws lambda get-function-configuration --function-name <functionName>:<functionVersion> --region <region> --query 'LoggingConfig.LogGroup'
+   - If unable to determine the log group, ask the user to check the function's logs
+
+OUTPUT FORMAT:
+- Lead with a short summary: "Your execution [status] because [reason]."
+- Show the error message verbatim in a code block if one exists.
+- Explain what this means in plain language.
+- Provide the console link for visual debugging.
+- List recommended next steps.
+
+Use jq for JSON parsing if available, otherwise use --query (AWS CLI built-in JMESPath) or python -m json.tool.
 ```
 
 ## Example Usage
 
 ```
-User: "My durable function execution abc-123 is stuck on my-function:prod"
+User: "My durable function execution abc-123 is stuck on my-function:prod in us-west-2"
+
+Agent: [Runs get-durable-execution-history command]
+Agent: [Analyzes with jq queries]
+
+Claude: Your execution is **waiting for an external callback** that hasn't arrived yet.
+
+        The operation `wait-for-approval` has been pending since 2026-02-14 (3 days).
+        It will time out in 12 hours if no callback is received.
+
+        Callback ID: `xyz789`
+
+        **[View this execution in the console](https://us-west-2.console.aws.amazon.com/lambda/home?region=us-west-2#/functions/my-function/versions/3/executions/abc-123/inv-456)**
+        ```
+        https://us-west-2.console.aws.amazon.com/lambda/home?region=us-west-2#/functions/my-function/versions/3/executions/abc-123/inv-456
+        ```
+
+        **Next steps:**
+        1. Check if the approval email was sent to the recipient
+        2. Verify the callback ID `xyz789` was included in the email
+        3. If you'd like me to help send the callback manually, let me know
+```
+
+```
+User: "Execution def-456 failed on order-processor:live in us-east-1"
 
-Claude: [Spawns Task agent with troubleshooting instructions]
 Agent: [Runs get-durable-execution-history command]
 Agent: [Analyzes with jq queries]
-Agent: [Returns: "Execution stuck in WAIT_FOR_CALLBACK operation 'wait-for-approval'.
-         Callback ID: xyz789. Waiting since 2026-02-14. Timeout in 12 hours.
-         Recommendation: Check if approval email was sent, or manually send callback."]
-Claude: [Presents findings and offers to send manual callback if needed]
+
+Claude: Your execution **failed** at the `charge-payment` step after 3 retry attempts.
+
+        Error message:
+        ```
+        PaymentGatewayError: Card declined - insufficient funds (code: card_declined)
+        ```
+
+        This means the payment provider rejected the charge. The durable function
+        retried 3 times with exponential backoff but the error persisted because
+        it's a permanent failure (not a transient network issue).
+
+        **[View this execution in the console](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/order-processor/versions/5/executions/def-456/inv-789)**
+        ```
+        https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/order-processor/versions/5/executions/def-456/inv-789
+        ```
+
+        **Next steps:**
+        1. This is a business logic failure, not an infrastructure issue
+        2. Consider adding a non-retryable error classification for `card_declined`
+        3. Implement a compensation step to release the reserved inventory
 ```