| 1 | +# Automated PR Analysis for Release Documentation - Methodology & Approach |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This document describes an efficient methodology for analyzing large numbers of pull requests (PRs) to generate comprehensive release |
| 6 | +documentation focusing on public-facing API changes, flag modifications, and breaking changes. |
| 7 | + |
| 8 | +_Key Innovation: Specialized Agent Architecture_ |
| 9 | + |
| 10 | +### Core Concept |
| 11 | + |
| 12 | +Instead of manually reviewing hundreds of PRs, we used specialized `pr-flag-metric-tracker` agents that work in parallel to analyze |
| 13 | +PRs systematically for specific types of changes. The agent is defined in `pr-flag-metric-tracker.md`; copy that file into the `.claude/agents` directory. |
| 14 | + |
| 15 | +### Agent Capabilities |
| 16 | + |
| 17 | +- Automatically fetch PR content using GitHub CLI (`gh pr view`, `gh pr diff`, `gh api`) |
| 18 | +- Parse code changes for flag additions/deletions/modifications |
| 19 | +- Identify metric changes (Prometheus counters, gauges, etc.) |
| 20 | +- Detect API changes (gRPC/HTTP endpoints) |
| 21 | +- Find parser modifications (SQL syntax changes) |
| 22 | +- Spot query planning behavior changes |
| 23 | +- Generate standardized reports |
| 24 | + |
| 25 | +## Methodology: Three-Phase Approach |
| 26 | + |
| 27 | +### Phase 1: Bulk PR Discovery |
| 28 | + |
| 29 | +#### Get all PRs from milestone |
| 30 | +``` |
| 31 | +gh api 'repos/org/repo/issues?milestone=X&state=all' --paginate --jq '.[].number' |
| 32 | +``` |
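One caveat with the command above: GitHub's issues endpoint returns plain issues as well as pull requests, and marks pull requests with a `pull_request` key. The following is a minimal sketch of a filtered variant, not the exact command used in this analysis. The function name `list_milestone_prs` and its two arguments are illustrative assumptions.

```bash
# List only pull request numbers for a milestone.
# $1: repository as OWNER/REPO (placeholder), $2: milestone number (placeholder)
list_milestone_prs() {
  # per_page=100 is the largest page size the REST API accepts;
  # select(.pull_request) drops entries that are plain issues.
  gh api "repos/$1/issues?milestone=$2&state=all&per_page=100" \
    --paginate \
    --jq '.[] | select(.pull_request) | .number'
}
```

A call such as `list_milestone_prs org/repo X > all_pr_numbers.txt` would then write one PR number per line.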
| 33 | + |
| 34 | +### Phase 2: Parallel Analysis with Merge Filtering |
| 35 | + |
| 36 | +Key Innovation: Check merge status BEFORE analysis to avoid wasting time on unmerged PRs |
| 37 | + |
| 38 | +#### Check if PR was actually merged (not just closed) |
| 39 | +``` |
| 40 | +gh pr view PR_URL --json state,mergedAt |
| 41 | +``` |
| 42 | + |
| 43 | +Decision Tree: |
| 44 | +- If mergedAt is null → Create simple "PR not merged" file |
| 45 | +- If mergedAt has date → Perform full analysis |
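The decision tree above can be sketched as a small shell helper. This is an illustration under assumptions: the function name `pr_merge_status` is hypothetical, and the helper treats both an empty string and the literal `null` as "not merged", since the exact text printed for a null `mergedAt` may depend on how the value is extracted.

```bash
# Classify a PR from its mergedAt value.
# $1: the mergedAt value (an ISO timestamp when merged, empty or "null" otherwise)
pr_merge_status() {
  case "$1" in
    ""|null) echo "not-merged" ;;  # open, or closed without merging
    *)       echo "merged" ;;      # any timestamp means the PR was merged
  esac
}

# Usage sketch (hits the network, so it is left commented out):
# merged_at="$(gh pr view "$PR_URL" --json mergedAt --jq '.mergedAt')"
# if [ "$(pr_merge_status "$merged_at")" = "not-merged" ]; then
#   echo "PR not merged" > "PR${PR_NUMBER}.md"
# fi
```

Keeping the classification separate from the `gh` call makes the branch logic easy to check without network access.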
| 46 | + |
| 47 | +### Phase 3: Batched Agent Deployment |
| 48 | + |
| 49 | +Deploy agents in batches of 5-10 PRs simultaneously for maximum parallelization while avoiding rate limits. |
| 50 | + |
| 51 | +#### Template Standardization |
| 52 | + |
| 53 | +A template keeps the agents from being wordy and filling reports with unnecessary information that takes time to read and then ignore. The agent profile describes the specific template to use. |
| 54 | + |
| 55 | +### Key Principles |
| 56 | + |
| 57 | +- Focus only on user-facing changes |
| 58 | +- No PR metadata or implementation details |
| 59 | +- Standardized sections for easy parsing |
| 60 | +- "No public changes" for PRs with only internal modifications |
| 61 | + |
| 62 | +## Efficiency Optimizations |
| 63 | + |
| 64 | +1. Batch Processing |
| 65 | + |
| 66 | +- Process 5-10 PRs per agent call |
| 67 | +- Parallel execution across multiple agents |
| 68 | +- Reduces API calls and context switching |
| 69 | + |
| 70 | +2. Smart Filtering |
| 71 | + |
| 72 | +- Merge status check first - eliminates ~30% of PRs immediately |
| 73 | +- Public-facing focus - skip internal refactoring and test-only changes |
| 74 | +- Template enforcement - consistent output format for easy aggregation |
| 75 | + |
| 76 | +3. Pre-approved Command Strategy |
| 77 | + |
| 78 | +Ensure agents only use pre-approved GitHub commands to avoid permission prompts: |
| 79 | +- gh pr view (approved) |
| 80 | +- gh pr diff (approved) |
| 81 | +- gh api (approved) |
| 82 | + |
| 83 | +4. Progressive Refinement |
| 84 | + |
| 85 | +- Start with checking for closed and merged PRs |
| 86 | +- Only analyze merged PRs |
| 87 | +- Avoid re-work through systematic tracking |
| 88 | + |
| 89 | +### Scalability Lessons |
| 90 | + |
| 91 | +### What Worked Well |
| 92 | + |
| 93 | +1. Agent specialization - Single-purpose agents are more reliable than general-purpose |
| 94 | +2. Parallel execution - 5x faster than sequential analysis |
| 95 | +3. Standardized templates - Easy to aggregate and parse results |
| 96 | +4. Merge filtering - Eliminates ~30% of work upfront |
| 97 | +5. Batching - Reduces overhead and improves throughput |
| 98 | + |
| 99 | +### What to Avoid |
| 100 | + |
| 101 | +- Verbose reports with implementation details |
| 102 | +- Analyzing non-merged PRs |
| 103 | +- Sequential processing |
| 104 | +- Inconsistent report formats |
| 105 | +- Re-analyzing already completed work |
| 106 | + |
| 107 | +## Output Processing |
| 108 | + |
| 109 | +### Individual PR Reports |
| 110 | + |
| 111 | +Each PR gets a focused report following the standard template, making it easy to: |
| 112 | +- Scan for breaking changes |
| 113 | +- Identify new features |
| 114 | +- Track deprecations |
| 115 | +- Generate migration guides |
| 116 | + |
| 117 | +### Aggregate Reporting |
| 118 | + |
| 119 | +Parse all individual reports to create comprehensive release documentation with: |
| 120 | +- Structured tables by change category |
| 121 | +- Component-wise organization |
| 122 | +- Breaking change highlights |
| 123 | +- Migration timelines |
| 124 | + |
| 125 | +## Replication Instructions |
| 126 | + |
| 127 | +### Prerequisites |
| 128 | + |
| 129 | +- **GitHub CLI (`gh`)**: Authenticated and configured (`gh auth login`) |
| 130 | +- **Claude Code**: With agent support enabled |
| 131 | +- **Repository access**: Read permissions to target repository |
| 132 | +- **Agent setup**: Copy `pr-flag-metric-tracker.md` to your `.claude/agents/` directory |
| 133 | +- **Disk space**: ~5GB for storing individual PR reports |
| 134 | +- **Time estimate**: 4-6 hours for 276 PRs |
| 135 | +- **Cost estimate**: $40-50 in Claude API usage |
| 136 | + |
| 137 | +### Step-by-Step Process |
| 138 | + |
| 139 | +#### 1. Initial Setup |
| 140 | +```bash |
| 141 | +# Authenticate GitHub CLI if not already done |
| 142 | +gh auth login |
| 143 | + |
| 144 | +# Copy agent definition to Claude agents directory |
| 145 | +cp pr-flag-metric-tracker.md ~/.claude/agents/ |
| 146 | + |
| 147 | +# Create working directory |
| 148 | +mkdir release-analysis && cd release-analysis |
| 149 | +``` |
| 150 | + |
| 151 | +#### 2. Fetch Milestone PRs |
| 152 | +```bash |
| 153 | +# Get the milestone number from the GitHub UI, then fetch PR numbers (select(.pull_request) skips plain issues) |
| 154 | +gh api 'repos/vitessio/vitess/issues?milestone=MILESTONE_ID&state=all' --paginate --jq '.[] | select(.pull_request) | .number' > all_pr_numbers.txt |
| 155 | + |
| 156 | +# Verify count |
| 157 | +echo "Total PRs to analyze: $(wc -l < all_pr_numbers.txt)" |
| 158 | +``` |
| 159 | + |
| 160 | +#### 3. Launch Batched Analysis |
| 161 | +Launch agents in batches of 5 PRs at a time using this prompt template: |
| 162 | + |
| 163 | +``` |
| 164 | +Analyze these 5 PRs in batch. For each: |
| 165 | +1. Check merge: gh pr view https://github.com/vitessio/vitess/pull/XXXX --json state,mergedAt |
| 166 | +2. If NOT merged: Create PRXXXX.md with just "PR not merged" |
| 167 | +3. If MERGED: Create full analysis with template focusing on public-facing changes |
| 168 | + |
| 169 | +PRs: 18520, 18521, 18522, 18523, 18524 |
| 170 | + |
| 171 | +Use ONLY: gh pr view, gh pr diff, gh api, Edit tool. Focus on flags, metrics, APIs, parser changes, query planning. |
| 172 | +``` |
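The `PRs:` line of each prompt can be generated from `all_pr_numbers.txt` instead of being typed by hand. A minimal sketch follows, assuming one PR number per line in the input file. The function name `make_batches` is hypothetical.

```bash
# Turn a file of PR numbers (one per line) into comma-separated batches,
# one batch per output line, ready to paste into the "PRs:" line of the prompt.
# $1: numbers file, $2: batch size (defaults to 5)
make_batches() {
  # xargs with no command echoes its arguments, at most -n of them per line
  xargs -n "${2:-5}" < "$1" | sed 's/ /, /g'
}
```

For example, `make_batches all_pr_numbers.txt 5` prints lines such as `18520, 18521, 18522, 18523, 18524`, and passing `3` as the second argument gives the smaller batches suggested for rate-limit problems.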
| 173 | + |
| 174 | +#### 4. Monitor Progress |
| 175 | +```bash |
| 176 | +# Check completion status |
| 177 | +ls PR*.md | wc -l |
| 178 | +echo "Progress: $(ls PR*.md | wc -l)/$(wc -l < all_pr_numbers.txt)" |
| 179 | +``` |
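Beyond a raw count, it can help to see which PRs still lack a report, so that a rerun covers only the missing ones. The sketch below assumes reports are named `PR<number>.md`, as in the prompt template. The function name `pending_prs` is hypothetical.

```bash
# Print the PR numbers that do not have a report file yet.
# $1: file with one PR number per line, $2: directory holding PR<number>.md reports
pending_prs() {
  while IFS= read -r n; do
    [ -n "$n" ] || continue               # skip blank lines
    [ -f "$2/PR${n}.md" ] || echo "$n"    # no report yet: still pending
  done < "$1"
}
```

Running `pending_prs all_pr_numbers.txt .` in the working directory lists the remaining numbers, which can be fed into the next batches.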
| 180 | + |
| 181 | +#### 5. Generate Final Report |
| 182 | +Once all PRs are analyzed, create comprehensive release documentation by parsing the individual reports into structured tables. |
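A first pass before aggregation is to narrow the reports to those that describe a public change. The sketch below assumes that unmerged PRs contain the exact phrase "PR not merged" and that internal-only PRs contain the exact phrase "No public changes". Both phrases are assumptions about what the agents write, so adjust them to the template. The function name `reports_with_changes` is hypothetical.

```bash
# List report files that contain neither marker phrase,
# which leaves the reports that describe public-facing changes.
# $1: directory holding PR<number>.md reports
reports_with_changes() {
  # grep -L prints the names of files with no matching line
  grep -L -e 'PR not merged' -e 'No public changes' "$1"/PR*.md
}
```

The resulting file list is the input for building the per-category tables.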
| 183 | + |
| 184 | +## Troubleshooting |
| 185 | + |
| 186 | +### Common Issues |
| 187 | + |
| 188 | +**Permission Errors with GitHub CLI**: |
| 189 | +- Ensure `gh pr view`, `gh pr diff`, `gh api` are pre-approved in Claude Code |
| 190 | +- Check GitHub token permissions |
| 191 | + |
| 192 | +**Agent Rate Limiting**: |
| 193 | +- Reduce batch size from 5 to 3 PRs |
| 194 | +- Add delays between batches if needed |
| 195 | + |
| 196 | +**Inconsistent Report Formats**: |
| 197 | +- Emphasize template adherence in agent prompts |
| 198 | +- Review and correct agent instructions |
| 199 | + |
| 200 | +**Missing PRs**: |
| 201 | +- Some PR numbers may not exist (normal in GitHub) |
| 202 | +- Agents will handle gracefully with "PR not found" reports |
| 203 | + |
| 204 | +### Performance Tips |
| 205 | + |
| 206 | +- **Batch size**: 5-10 PRs per agent call is optimal |
| 207 | +- **Parallel agents**: Launch multiple agent batches simultaneously |
| 208 | +- **Template enforcement**: Be strict about output format for easier parsing |
| 209 | +- **Merge filtering**: Always check merge status first to avoid wasted analysis |
| 210 | + |
| 211 | +## Expected Results |
| 212 | + |
| 213 | +**Time Performance**: |
| 214 | +- **Manual approach**: 2-3 minutes per PR = 9-14 hours for 276 PRs |
| 215 | +- **Automated approach**: 4-6 hours total including setup |
| 216 | +- **Efficiency gain**: roughly 35-70% time reduction (4-6 hours against 9-14 hours), depending on where in each range a run falls |
| 217 | + |
| 218 | +**Output Quality**: |
| 219 | +- Standardized format across all reports |
| 220 | +- Focus on user-impacting changes only |
| 221 | +- Easy to parse for release documentation |
| 222 | +- Comprehensive coverage with no missed PRs |