# sjnims/plugin-dev: .github/workflows/ci-failure-analysis.yml (main)

name: CI Failure Analysis

# Automatically analyze failed CI runs and provide actionable fix suggestions
# Runs when Markdown Lint, Check Links, or Validate Workflows finishes; the job below only proceeds if that run failed
#
# Testing: intentionally break markdownlint (e.g., change a "-" list marker to "*" in any .md file)

on:
  workflow_run:
    workflows:
      - "Markdown Lint"
      - "Check Links"
      - "Validate Workflows"
    types: [completed]

# Cancel any in-progress analysis of the same workflow run
concurrency:
  group: ${{ github.workflow }}-${{ github.event.workflow_run.id }}
  cancel-in-progress: true

jobs:
  analyze-failure:
    name: Analyze CI Failure
    # Only run on failure; skip bot-triggered runs to prevent feedback loops
    if: |
      github.event.workflow_run.conclusion == 'failure' &&
      github.event.workflow_run.actor.login != 'dependabot[bot]' &&
      github.event.workflow_run.actor.login != 'claude[bot]'
    runs-on: ubuntu-latest
    timeout-minutes: 10
    permissions:
      contents: read
      pull-requests: write
      actions: read
      id-token: write

    steps:
      - name: Analyze failure with Claude
        uses: anthropics/claude-code-action@ff9acae5886d41a99ed4ec14b7dc147d55834722 # v1.0.77
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          prompt: |
            A CI workflow has failed. Analyze the failure and help the developer fix it.

            ## Failed Workflow Details

            - **Workflow**: ${{ github.event.workflow_run.name }}
            - **Run ID**: ${{ github.event.workflow_run.id }}
            - **Run URL**: ${{ github.event.workflow_run.html_url }}
            - **Commit SHA**: ${{ github.event.workflow_run.head_sha }}
            - **Branch**: ${{ github.event.workflow_run.head_branch }}
            - **Actor**: ${{ github.event.workflow_run.actor.login }}
            - **Repository**: ${{ github.repository }}

            ## Instructions

            ### Step 1: Get the failure logs

            First, download and examine the workflow run logs:

            ```bash
            gh run view ${{ github.event.workflow_run.id }} --log
            ```

            If the full log is too verbose, first list the jobs that failed:

            ```bash
            gh run view ${{ github.event.workflow_run.id }} --json jobs --jq '.jobs[] | select(.conclusion == "failure") | {name: .name, conclusion: .conclusion}'
            ```
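
            `gh run view` also accepts `--log-failed`, which prints only the logs of failed steps. The sketch below tries that first and falls back to the full log. The function name `show_failure_log` is hypothetical, and because `--allowedTools` only permits `gh run` and `gh pr` commands, you may need to run the two `gh run view` calls directly instead:

            ```bash
            # Hypothetical helper: prefer the failed-steps log, fall back to the full log.
            show_failure_log() {
              local run_id="$1" out
              out=$(gh run view "$run_id" --log-failed)
              if [ -n "$out" ]; then
                printf '%s\n' "$out"
              else
                # --log-failed printed nothing, so show the full run log instead
                gh run view "$run_id" --log
              fi
            }
            ```

            Call it with the Run ID listed above.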

            ### Step 2: Analyze the failure

            Based on which workflow failed, look for specific error patterns:

            **For "Markdown Lint" failures:**
            - Look for lines like: `filename:line:column MDxxx/rule-name message` (the column is sometimes omitted)
            - Common issues: wrong unordered list marker (should be `-`), wrong heading style (should be ATX `#`)

            **For "Check Links" (lychee) failures:**
            - Look for broken URLs with HTTP status codes (404, 403, etc.)
            - Check whether the URL has moved, or whether the failure is a false positive (e.g., a site that returns 403 or 429 to automated requests)

            **For "Validate Workflows" (actionlint) failures:**
            - Look for YAML syntax errors or invalid workflow configurations
            - Check for missing required fields, invalid expressions, or deprecated syntax
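
            If the log is long, these error formats can be matched with `grep -oE` to keep only the findings. This is a sketch under assumptions: the function names are hypothetical, the actionlint pattern assumes its default `file:line:col: message [rule]` output, and `grep` is not in this workflow's `--allowedTools`, so if it cannot be run, use the patterns as a guide while reading the log:

            ```bash
            # Hypothetical helpers: each reads a log on stdin and prints only the findings,
            # dropping any prefix (job, step, timestamp) that precedes them on the line.

            # markdownlint: path.md:LINE[:COL] MDnnn/rule-name message
            extract_markdownlint() {
              grep -oE '[^[:space:]]+\.md:[0-9]+(:[0-9]+)? MD[0-9]{3}/[^[:space:]]+ .*'
            }

            # actionlint (assumed default format): path.yml:LINE:COL: message [rule]
            extract_actionlint() {
              grep -oE '[^[:space:]]+\.ya?ml:[0-9]+:[0-9]+: .*\[[a-z0-9-]+\]'
            }
            ```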

            ### Step 3: Find the associated PR

            Try to find the pull request associated with this commit:

            ```bash
            gh pr list --search "${{ github.event.workflow_run.head_sha }}" --state open --json number,title,url --jq '.[0]'
            ```
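
            A search by SHA may come back empty (for example, if GitHub's search index has not yet caught up with a new commit). The sketch below retries by head branch with `gh pr list --head`. The function name `find_open_pr` is hypothetical; if the function form is not permitted, run the two `gh pr list` commands directly:

            ```bash
            # Hypothetical helper: print the number of the open PR for a commit, or nothing.
            # Usage: find_open_pr <commit_sha> <head_branch>
            find_open_pr() {
              local sha="$1" branch="$2" pr
              # "// empty" makes the jq filter print nothing (not "null") when no PR matches
              pr=$(gh pr list --search "$sha" --state open --json number --jq '.[0].number // empty')
              if [ -z "$pr" ]; then
                pr=$(gh pr list --head "$branch" --state open --json number --jq '.[0].number // empty')
              fi
              printf '%s\n' "$pr"
            }
            ```

            Pass the Commit SHA and Branch listed above.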

            If no PR is found (e.g., a direct push to main), note this: there is no PR to comment on.

            ### Step 4: Post analysis comment

            If a PR was found, post a helpful comment with your analysis. The comment should include:

            1. **Summary**: What failed and why (1-2 sentences)
            2. **Failures Found**: List each error with file, line, and explanation
            3. **How to Fix**: Specific commands or code changes

            Use this format for the comment:

            ````markdown
            ## ❌ CI Failure Analysis: {Workflow Name}

            **Run**: [#{run_id}]({run_url})
            **Commit**: `{sha:7}`

            ### Summary

            {Brief explanation of what failed}

            ### Failures Found

            | File | Line | Issue |
            |------|------|-------|
            | `{file}` | {line} | {description} |

            ### How to Fix

            {For markdown failures}
            **Option 1**: Auto-fix with markdownlint
            ```bash
            markdownlint "{file}" --fix
            ```

            **Option 2**: Manual fix
            {Explain specific change needed}

            {For link failures}
            - Check whether the URL has moved and update the link
            - Or add it to `.lycheeignore` if the link is expected to fail in CI

            {For workflow validation failures}
            - Show corrected YAML syntax

            ---
            🤖 Analyzed by [Claude](https://claude.ai/code)
            ````
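
            Findings in `file:line[:col] message` form (markdownlint) or `file:line:col: message` form (actionlint) map directly onto rows of the **Failures Found** table. A `sed` sketch of that mapping follows. The function name is hypothetical, `sed` is outside this workflow's `--allowedTools` (so treat it as a description if it cannot be run), and a message containing `|` would need escaping as `\|` first:

            ```bash
            # Hypothetical helper: turn findings on stdin into "| `file` | line | issue |" rows.
            # The optional column number (and actionlint's trailing colon) is dropped;
            # lines that do not match are printed unchanged.
            findings_to_rows() {
              sed -E 's/^([^:]+):([0-9]+)(:[0-9]+)?:? (.*)$/| `\1` | \2 | \4 |/'
            }
            ```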

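            Post the comment using the command below. Single-quote the body so the shell does not expand its backticks or `$` characters (write any `'` inside the body as `'\''`):
            ```bash
            gh pr comment {pr_number} --body '...'
            ```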

            ### Step 5: Report outcome

            After posting (or if no PR was found), summarize what you found and what action you took.

          claude_args: |
            --model claude-opus-4-6
            --allowedTools "Bash(gh run:*),Bash(gh pr:*)"