From 41b3806d123ac84d35335593e95b1d50a39f42a9 Mon Sep 17 00:00:00 2001
From: Bhavani
Date: Thu, 14 May 2026 16:18:56 -0500
Subject: [PATCH] fix(aws-devops-agent): correct mitigation workflow - use update-backlog-task
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Mitigation plans are generated via update-backlog-task --task-status PENDING_START (not list-recommendations).
Plans are stored as mitigation_summary_md journal records.

Updated POWER.md and steering.md with correct flow:
1. Investigation COMPLETED
2. update-backlog-task --task-status PENDING_START
3. Poll until COMPLETED
4. list-executions → list-journal-records --record-type mitigation_summary_md
---
 aws-devops-agent/POWER.md             | 12 ++++++------
 aws-devops-agent/steering/steering.md | 10 +++++-----
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/aws-devops-agent/POWER.md b/aws-devops-agent/POWER.md
index 9059efc..6922878 100644
--- a/aws-devops-agent/POWER.md
+++ b/aws-devops-agent/POWER.md
@@ -187,7 +187,8 @@ Start with chat for instant answers. Escalate to investigation only when the pro
 4. If complex root cause needed:
    aws___call_aws("aws devops-agent create-backlog-task ...") → escalate to deep research (5-8 min)
    Poll get-backlog-task + list-journal-records → stream progress
-   list-recommendations → get-recommendation → generate remediation code
+   aws___call_aws("aws devops-agent update-backlog-task --task-status PENDING_START ...") → trigger mitigation (2-5 min)
+   Poll get-backlog-task until COMPLETED again. Then call list-executions to find the newest execution_id, and list-journal-records --execution-id EXEC_ID --record-type mitigation_summary_md to get the mitigation plan
 ```
 
 ---
@@ -248,10 +249,9 @@ For incidents requiring deep root cause analysis:
 2. aws___call_aws(cli_command="aws devops-agent create-backlog-task --agent-space-id SPACE_ID --task-type INVESTIGATION --title 'Describe the issue' --priority HIGH --description 'Include local context here' --region us-east-1") → taskId (executionId becomes available from get-backlog-task once IN_PROGRESS)
 3. Poll every 30-45s: aws___call_aws(cli_command="aws devops-agent get-backlog-task --agent-space-id SPACE_ID --task-id TASK_ID --region us-east-1") until status changes from PENDING_START to IN_PROGRESS
 4. Stream every 30-45s: aws___call_aws(cli_command="aws devops-agent list-journal-records --agent-space-id SPACE_ID --execution-id EXEC_ID --region us-east-1")
-5. Once COMPLETED: aws___call_aws(cli_command="aws devops-agent list-recommendations --agent-space-id SPACE_ID --task-id TASK_ID --region us-east-1") → get-recommendation → generate remediation code
-6. If list-recommendations returns empty, trigger mitigation in place:
-   aws___call_aws(cli_command="aws devops-agent update-backlog-task --agent-space-id SPACE_ID --task-id TASK_ID --task-status PENDING_START --region us-east-1")
-   Re-poll get-backlog-task until COMPLETED again (2-5 min), then re-call list-recommendations.
+5. Once COMPLETED: trigger mitigation (2-5 min): aws___call_aws(cli_command="aws devops-agent update-backlog-task --agent-space-id SPACE_ID --task-id TASK_ID --task-status PENDING_START --region us-east-1")
+6. Poll get-backlog-task every 30-45s until COMPLETED again, then: aws___call_aws(cli_command="aws devops-agent list-executions --agent-space-id SPACE_ID --task-id TASK_ID --region us-east-1") → find newest execution_id
+7. Retrieve mitigation: aws___call_aws(cli_command="aws devops-agent list-journal-records --agent-space-id SPACE_ID --execution-id EXEC_ID --record-type mitigation_summary_md --region us-east-1")
 
 > **executionId format caveat**: `create-backlog-task` returns executionIds in `exe-ops1-UUID` format. The `aws___call_aws` CLI path handles this transparently, but `call_boto3(SendMessage)` expects a pure UUID. **Use `call_boto3` for chat sessions** (where `create-chat` returns a pure UUID) and **`aws___call_aws` CLI for investigation operations** (`list-journal-records`, `get-backlog-task`). This is a known service-side format inconsistency.
 ```
@@ -364,7 +364,7 @@ You:
 5. If deeper root cause needed:
    aws___call_aws("aws devops-agent create-backlog-task --agent-space-id SPACE_ID --task-type INVESTIGATION --title 'ECS 503 errors on ' --priority HIGH --description '' --region us-east-1")
    Poll get-backlog-task + list-journal-records → stream progress with emojis
-   On complete: list-recommendations → get-recommendation → show fix
+   On complete: update-backlog-task --task-status PENDING_START → trigger mitigation (2-5 min) → poll until COMPLETED → list-executions to find newest execution_id → list-journal-records --execution-id EXEC_ID --record-type mitigation_summary_md
 6. If recommendation has IaC: generate the fix code locally
 ```
 
diff --git a/aws-devops-agent/steering/steering.md b/aws-devops-agent/steering/steering.md
index 17ccfad..c7a9821 100644
--- a/aws-devops-agent/steering/steering.md
+++ b/aws-devops-agent/steering/steering.md
@@ -39,8 +39,9 @@ Best for: cost optimization, architecture review, topology mapping, knowledge di
 2. aws___call_aws(cli_command="aws devops-agent create-backlog-task --agent-space-id SPACE_ID --task-type INVESTIGATION --title '...' --priority HIGH --description '...' --region us-east-1") → taskId + executionId (executionId is returned immediately but may also be fetched later via get-backlog-task)
 3. Poll every 30-45s: aws___call_aws(cli_command="aws devops-agent get-backlog-task --agent-space-id SPACE_ID --task-id TASK_ID --region us-east-1") until status=IN_PROGRESS
 4. Stream: aws___call_aws(cli_command="aws devops-agent list-journal-records --agent-space-id SPACE_ID --execution-id EXEC_ID --region us-east-1") every 30-45s while IN_PROGRESS
-5. Once COMPLETED: aws___call_aws(cli_command="aws devops-agent list-recommendations --agent-space-id SPACE_ID --task-id TASK_ID --region us-east-1") → get-recommendation → generate remediation code
-6. If list-recommendations returns empty: aws___call_aws(cli_command="aws devops-agent update-backlog-task --agent-space-id SPACE_ID --task-id TASK_ID --task-status PENDING_START --region us-east-1") → re-poll until COMPLETED (2-5 min) → re-call list-recommendations
+5. Once COMPLETED: trigger mitigation (2-5 min): aws___call_aws(cli_command="aws devops-agent update-backlog-task --agent-space-id SPACE_ID --task-id TASK_ID --task-status PENDING_START --region us-east-1")
+6. Poll get-backlog-task every 30-45s until COMPLETED again, then: aws___call_aws(cli_command="aws devops-agent list-executions --agent-space-id SPACE_ID --task-id TASK_ID --region us-east-1") → find newest execution_id
+7. Retrieve mitigation: aws___call_aws(cli_command="aws devops-agent list-journal-records --agent-space-id SPACE_ID --execution-id EXEC_ID --record-type mitigation_summary_md --region us-east-1")
 ```
 
 ## Context Injection
@@ -54,9 +55,8 @@ Best for: cost optimization, architecture review, topology mapping, knowledge di
 - ❌ Do NOT use `aws___call_aws` for `SendMessage` — it returns an EventStream that `call_aws` cannot handle. Use `aws___run_script` instead
 - ❌ Do NOT ask "should I investigate or chat?" — auto-route based on keywords
 - ❌ Do NOT forget `--task-type INVESTIGATION` when creating backlog tasks (required)
-- ❌ Do NOT call `list-recommendations` before investigation status=COMPLETED (empty results)
+- ❌ Do NOT call `list-recommendations` expecting mitigation plans — mitigation plans require triggering first (`update-backlog-task --task-status PENDING_START`), then appear as `mitigation_summary_md` in journal records. `list-recommendations` only returns proactive recommendations from the Evaluation Agent
 - ❌ Do NOT omit `--user-id` and `--user-type` from `create-chat` or `userId` from `SendMessage` — both are required for chat sessions
-- ❌ Do NOT assume `list-recommendations` will have results after COMPLETED — recommendations may be empty until mitigation is explicitly triggered via `update-backlog-task --task-status PENDING_START`
 - ❌ Do NOT pass ARNs as `userId` — use simple usernames matching `^[a-zA-Z0-9_.-]+$`
 - ❌ Do NOT poll faster than every 30 seconds (wastes API quota)
 - ❌ Do NOT silently poll investigations — stream journal findings to user with emoji progress
@@ -69,7 +69,7 @@ Best for: cost optimization, architecture review, topology mapping, knowledge di
 - **ResourceNotFoundException** → AgentSpace may be deleted, re-run `list-agent-spaces`
 - **ThrottlingException** → Wait 5 seconds and retry once
 - **ValidationException** on userId → alphanumeric, `.`, `-`, `_` only — no ARNs
-- **Empty recommendations after COMPLETED** → Trigger mitigation: `aws devops-agent update-backlog-task --agent-space-id SPACE_ID --task-id TASK_ID --task-status PENDING_START` → re-poll until COMPLETED (2-5 min) → re-call list-recommendations
+- **Empty recommendations after COMPLETED** → Trigger mitigation: `aws devops-agent update-backlog-task --agent-space-id SPACE_ID --task-id TASK_ID --task-status PENDING_START` → re-poll until COMPLETED (2-5 min) → `aws devops-agent list-executions --agent-space-id SPACE_ID --task-id TASK_ID` → find newest execution_id → `aws devops-agent list-journal-records --agent-space-id SPACE_ID --execution-id EXEC_ID --record-type mitigation_summary_md`
 - **ContentSizeExceededException** on SendMessage → Reduce message content length (max 32KB)
 - **MCP error -32000: Connection closed** → Missing/expired credentials or `uvx` not in PATH