
Commit b872348

github-actions[bot], Copilot, and dsyme authored
Add Daily File Diet workflow (#194)
Adds a language-agnostic workflow that monitors source file sizes and creates actionable refactoring issues when files exceed 500 lines. Adapted from Peli's Agent Factory (79% merge rate, 26 merged PRs).

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Don Syme <dsyme@users.noreply.github.com>
1 parent 3ce0cf5 commit b872348

3 files changed

+294
-0
lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ You can use the "/plan" agent to turn the reports into actionable issues which c
 - [🗜️ Documentation Unbloat](docs/unbloat-docs.md) - Automatically simplify documentation by reducing verbosity while maintaining clarity
 - [✨ Code Simplifier](docs/code-simplifier.md) - Automatically simplify recently modified code for improved clarity and maintainability
 - [🔍 Duplicate Code Detector](docs/duplicate-code-detector.md) - Identify duplicate code patterns and suggest refactoring opportunities
+- [🏋️ Daily File Diet](docs/daily-file-diet.md) - Monitor for oversized source files and create targeted refactoring issues
 - [🧪 Daily Test Improver](docs/daily-test-improver.md) - Improve test coverage by adding meaningful tests to under-tested areas
 - [⚡ Daily Perf Improver](docs/daily-perf-improver.md) - Analyze and improve code performance through benchmarking and optimization

docs/daily-file-diet.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
# 🏋️ Daily File Diet

> For an overview of all available workflows, see the [main README](../README.md).

The [Daily File Diet workflow](../workflows/daily-file-diet.md?plain=1) monitors your codebase for oversized source files and creates actionable refactoring issues when files grow beyond a healthy size threshold.

## Installation

Add the workflow to your repository:

```bash
gh aw add https://github.com/githubnext/agentics/blob/main/workflows/daily-file-diet.md
```

Then compile:

```bash
gh aw compile
```

## What It Does

The Daily File Diet workflow runs on weekdays and:

1. **Scans Source Files** - Finds all non-test source files in your repository, excluding generated and dependency directories like `node_modules`, `vendor`, `dist`, and `target`
2. **Identifies Oversized Files** - Checks whether the largest file has 500 or more lines (the healthy size threshold)
3. **Analyzes Structure** - Examines what the file contains: functions, classes, modules, and their relationships
4. **Creates Refactoring Issues** - Proposes concrete split strategies with specific file names, responsibilities, and implementation guidance
5. **Skips When Healthy** - If every file is under the threshold, reports an all-clear and creates no issue
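
The scan-and-threshold steps above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the workflow's implementation. The workflow has an agent run `find` and `wc -l`. The helper names `is_candidate`, `scan`, and `decide` are hypothetical, and the extension and directory lists simply mirror the ones the workflow names. Test-file and generated-file filtering are left out for brevity.

```python
from pathlib import Path, PurePosixPath

# Assumed lists, mirroring the extensions and excluded directories named in the workflow.
SOURCE_EXTENSIONS = {".go", ".py", ".ts", ".js", ".rb", ".java", ".rs", ".cs", ".cpp", ".c"}
EXCLUDED_DIRS = {".git", "node_modules", "vendor", "dist", "build", "target", "__pycache__"}
THRESHOLD = 500  # healthy file size threshold, in lines


def is_candidate(rel_path: str) -> bool:
    """True for source files that are not inside an excluded directory."""
    p = PurePosixPath(rel_path)
    # Only directory components are checked, so a file named dist.py still counts.
    return p.suffix in SOURCE_EXTENSIONS and not EXCLUDED_DIRS.intersection(p.parts[:-1])


def scan(root: Path) -> dict[str, int]:
    """Map each candidate file's relative path to its newline count (like `wc -l`)."""
    counts: dict[str, int] = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root).as_posix()
        if path.is_file() and is_candidate(rel):
            counts[rel] = path.read_bytes().count(b"\n")
    return counts


def decide(counts: dict[str, int], threshold: int = THRESHOLD) -> str:
    """Pick the single largest file and decide whether it warrants an issue."""
    if not counts:
        return "No source files found"
    path, lines = max(counts.items(), key=lambda item: item[1])
    if lines < threshold:
        return f"All files are healthy! Largest file: {path} ({lines} lines)"
    return f"Refactoring candidate: {path} ({lines} lines)"
```

For example, `print(decide(scan(Path("."))))` reports on the current directory. Picking only the largest file, rather than listing every file over the threshold, matches the workflow's one-issue-per-run setting.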

## How It Works

````mermaid
graph LR
    A[Scan Source Files] --> B[Sort by Line Count]
    B --> C{Largest File<br/>≥ 500 lines?}
    C -->|No| D[Report: All Files Healthy]
    C -->|Yes| E[Analyze File Structure]
    E --> F[Propose File Splits]
    F --> G[Create Refactoring Issue]
````

The workflow focuses on **production source code only**. Test files are excluded so the signal stays relevant. It also skips files in generated directories and any files containing standard "DO NOT EDIT" generation markers.
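
The exclusion rules above could be expressed as simple name and header checks. The following Python is an illustrative sketch, not the agent's actual logic. The regular expressions are assumptions based on the test-file suffixes listed in the workflow prompt (`_test.go`, `.test.ts`, `.spec.ts`, `.test.js`, `.spec.js`, `_test.py`, `test_*.py`). The helpers `is_test_file`, `is_generated`, and `read_header` are hypothetical.

```python
import re
from itertools import islice
from pathlib import Path, PurePosixPath

# Assumed patterns for the test-file naming conventions listed in the workflow prompt.
TEST_FILE_PATTERNS = [
    re.compile(r".*_test\.go$"),
    re.compile(r".*\.(test|spec)\.(ts|js)$"),
    re.compile(r".*_test\.py$"),
    re.compile(r"test_.*\.py$"),
]
# Substrings that conventionally mark generated files.
GENERATED_MARKERS = ("Code generated", "DO NOT EDIT")


def is_test_file(rel_path: str) -> bool:
    """True if the file name follows one of the test naming conventions above."""
    name = PurePosixPath(rel_path).name
    return any(pattern.match(name) for pattern in TEST_FILE_PATTERNS)


def is_generated(header: str) -> bool:
    """True if the file's opening lines contain a generation marker."""
    return any(marker in header for marker in GENERATED_MARKERS)


def read_header(path: Path, max_lines: int = 5) -> str:
    """Read only the first few lines, where generation markers usually appear."""
    with path.open(encoding="utf-8", errors="replace") as f:
        return "".join(islice(f, max_lines))
```

A file would be skipped when `is_test_file(rel)` or `is_generated(read_header(path))` is true. Reading only a short header keeps the check cheap, because code generators conventionally place the marker near the top of the file.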

### Why File Size Matters

Large files are a code smell in every programming language:

- **Hard to navigate**: Scrolling through 1000+ line files wastes developer time
- **Increases merge conflicts**: Multiple developers frequently change the same large file
- **Harder to test**: Large files tend to mix concerns, making isolated unit testing difficult
- **Obscures ownership**: It's unclear who is responsible for what in a large catch-all file

The 500-line threshold is a practical guideline. Files near the threshold may be fine; files well over it are worth examining.

## Example Issues

From the original gh-aw repository (79% merge rate):

- Targeting `add_interactive.go` (large file) → [PR refactoring it into 6 domain-focused modules](https://github.com/github/gh-aw/pull/12545)
- Targeting `permissions.go` → [PR splitting it into focused modules](https://github.com/github/gh-aw/pull/12363) (928 → 133 lines)

## Configuration

The workflow uses these default settings:

- **Schedule**: Weekdays at 1 PM UTC
- **Threshold**: 500 lines
- **Issue labels**: `refactoring`, `code-health`, `automated-analysis`
- **Max issues per run**: 1 (one file at a time to avoid overwhelming the backlog)
- **Issue expiry**: 2 days if not actioned
- **Skip condition**: The run is skipped while an issue with `[file-diet]` in its title is open
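
The schedule and skip condition can be modeled as two small predicates. This Python sketch is only an approximation for illustration. In practice, GitHub Actions evaluates the cron expression `0 13 * * 1-5` and gh-aw runs the `skip-if-match` search query. The helper names `matches_schedule` and `should_run` are hypothetical, and the sketch treats the issue search as a plain substring match on open issue titles, which simplifies GitHub issue search.

```python
from datetime import datetime, timezone


def matches_schedule(moment: datetime) -> bool:
    """True if a timezone-aware `moment` matches cron "0 13 * * 1-5"."""
    utc = moment.astimezone(timezone.utc)
    # cron day-of-week 1-5 is Monday-Friday; Python's weekday() numbers Monday as 0.
    return utc.weekday() < 5 and utc.hour == 13 and utc.minute == 0


def should_run(open_issue_titles: list[str], marker: str = "[file-diet]") -> bool:
    """Approximates skip-if-match: skip while any open issue title contains the marker."""
    return not any(marker in title for title in open_issue_titles)
```

Combined with the one-issue-per-run limit, the skip condition means a new issue is proposed only after the previous one is closed or expires.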

## Customization

You can customize the workflow by editing the source file:

```bash
gh aw edit daily-file-diet
```

Common customizations:

- **Adjust the threshold** - Change the 500-line limit to suit your team's preferences
- **Focus on specific languages** - Restrict the `find` commands to your repository's primary language
- **Add labels** - Apply team-specific labels to generated issues
- **Change the schedule** - Run less frequently if your codebase changes slowly

## Tips for Success

1. **Work the backlog gradually** - The workflow creates one issue at a time to keep the backlog manageable
2. **Split incrementally** - Refactor one module at a time to make review easier
3. **Update imports throughout** - After splitting a file, search the codebase for every import path that needs updating
4. **Treat the threshold as a guideline** - Files just above 500 lines may not need splitting; focus on files that are significantly larger

## Source

This workflow is adapted from [Peli's Agent Factory](https://github.github.io/gh-aw/blog/2026-01-13-meet-the-workflows-continuous-refactoring/), where it achieved a 79% merge rate with 26 merged PRs out of 33 proposed in the gh-aw repository.

## Related Workflows

- [Code Simplifier](code-simplifier.md) - Simplifies recently modified code
- [Duplicate Code Detector](duplicate-code-detector.md) - Finds and removes code duplication
- [Daily Performance Improver](daily-perf-improver.md) - Optimizes code performance

workflows/daily-file-diet.md

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
---
name: Daily File Diet
description: Analyzes source files daily to identify oversized files that exceed healthy size thresholds, creating actionable refactoring issues
on:
  workflow_dispatch:
  schedule:
    - cron: "0 13 * * 1-5"
  skip-if-match: 'is:issue is:open in:title "[file-diet]"'

permissions:
  contents: read
  issues: read
  pull-requests: read

tracker-id: daily-file-diet
engine: copilot

safe-outputs:
  create-issue:
    expires: 2d
    title-prefix: "[file-diet] "
    labels: [refactoring, code-health, automated-analysis]
    assignees: copilot
    max: 1

tools:
  github:
    toolsets: [default]
  bash:
    - "find . -type f -not -path '*/.git/*' -not -path '*/node_modules/*' -not -path '*/vendor/*' -not -path '*/dist/*' -not -path '*/build/*' -not -path '*/.next/*' -not -path '*/target/*' -not -path '*/__pycache__/*' -not -path '*/coverage/*' -not -path '*/venv/*' -not -path '*/.tox/*' -not -path '*/.mypy_cache/*' -name '*' -exec wc -l {} \\; 2>/dev/null"
    - "wc -l *"
    - "head -n * *"
    - "grep -n * *"
    - "find . -type f -name '*.go' -not -path '*_test.go' -not -path '*/vendor/*'"
    - "find . -type f -name '*.py' -not -path '*/__pycache__/*' -not -path '*/venv/*'"
    - "find . -type f -name '*.ts' -not -path '*/node_modules/*' -not -path '*/dist/*'"
    - "find . -type f -name '*.js' -not -path '*/node_modules/*' -not -path '*/dist/*'"
    - "find . -type f -name '*.rb' -not -path '*/vendor/*'"
    - "find . -type f -name '*.java' -not -path '*/target/*'"
    - "find . -type f -name '*.rs' -not -path '*/target/*'"
    - "find . -type f -name '*.cs'"
    - "find . -type f \\( -name '*.go' -o -name '*.py' -o -name '*.ts' -o -name '*.js' -o -name '*.rb' -o -name '*.java' -o -name '*.rs' -o -name '*.cs' -o -name '*.cpp' -o -name '*.c' \\) -not -path '*/node_modules/*' -not -path '*/vendor/*' -not -path '*/dist/*' -not -path '*/build/*' -not -path '*/target/*' -not -path '*/__pycache__/*' -exec wc -l {} \\; 2>/dev/null"
    - "sort *"
    - "cat *"

timeout-minutes: 20
strict: true
---

# Daily File Diet Agent 🏋️

You are the Daily File Diet Agent - a code health specialist that monitors file sizes and promotes modular, maintainable codebases by identifying oversized source files that need refactoring.

## Mission

Analyze the repository's source files to identify the largest file and determine if it requires refactoring. Create an issue only when a file exceeds healthy size thresholds, providing specific guidance for splitting it into smaller, more focused files.

## Current Context

- **Repository**: ${{ github.repository }}
- **Analysis Date**: $(date +%Y-%m-%d)
- **Workspace**: ${{ github.workspace }}

## Analysis Process

### 1. Identify Source Files and Their Sizes

First, determine the primary programming language(s) used in this repository. Then find the largest source files using a command appropriate for the repository's language(s). For example:

**For polyglot or unknown repos:**
```bash
find . -type f \( -name "*.go" -o -name "*.py" -o -name "*.ts" -o -name "*.js" -o -name "*.rb" -o -name "*.java" -o -name "*.rs" -o -name "*.cs" -o -name "*.cpp" -o -name "*.c" \) \
  -not -path "*/node_modules/*" \
  -not -path "*/vendor/*" \
  -not -path "*/dist/*" \
  -not -path "*/build/*" \
  -not -path "*/target/*" \
  -not -path "*/__pycache__/*" \
  -exec wc -l {} \; 2>/dev/null | sort -rn | head -20
```

Also skip test files (files ending in `_test.go`, `.test.ts`, `.spec.ts`, `.test.js`, `.spec.js`, `_test.py`, `test_*.py`, etc.); focus on non-test production code.

Extract:
- **File path**: Full path to the largest non-test source file
- **Line count**: Number of lines in the file

### 2. Apply Size Threshold

**Healthy file size threshold: 500 lines**

If the largest non-test source file is **under 500 lines**, do NOT create an issue. Instead, output a simple status message:

```
✅ All files are healthy! Largest file: [FILE_PATH] ([LINE_COUNT] lines)
No refactoring needed today.
```

If the largest non-test source file is **500 or more lines**, proceed to step 3.

### 3. Analyze the Large File's Structure

Read the file and understand its structure:

```bash
head -n 100 <LARGE_FILE>
```

```bash
grep -n "^func\|^class\|^def\|^module\|^impl\|^struct\|^type\|^interface\|^export " <LARGE_FILE> | head -50
```

Identify:
- What logical concerns or responsibilities the file contains
- Groups of related functions, classes, or modules
- Areas with distinct purposes that could become separate files
- Shared utilities that are scattered among unrelated code

### 4. Generate Issue Description

If the file has 500 or more lines, create an issue using the following structure:

```markdown
### Overview

The file `[FILE_PATH]` has grown to [LINE_COUNT] lines, making it harder to navigate and maintain. This task involves refactoring it into smaller, more focused files.

### Current State

- **File**: `[FILE_PATH]`
- **Size**: [LINE_COUNT] lines
- **Language**: [language]

<details>
<summary><b>Structural Analysis</b></summary>

[Brief description of what the file contains: key functions, classes, modules, and their groupings]

</details>

### Refactoring Strategy

#### Proposed File Splits

Based on the file's structure, split it into the following modules:

1. **`[new_file_1]`**
   - Contents: [list key functions/classes]
   - Responsibility: [single-purpose description]

2. **`[new_file_2]`**
   - Contents: [list key functions/classes]
   - Responsibility: [single-purpose description]

3. **`[new_file_3]`** *(if needed)*
   - Contents: [list key functions/classes]
   - Responsibility: [single-purpose description]

### Implementation Guidelines

1. **Preserve Behavior**: All existing functionality must work identically after the split
2. **Maintain Public API**: Keep exported/public symbols accessible with the same names
3. **Update Imports**: Fix all import paths throughout the codebase
4. **Test After Each Split**: Run the test suite after each incremental change
5. **One File at a Time**: Split one module at a time to make review easier

### Acceptance Criteria

- [ ] Original file is split into focused modules
- [ ] Each new file is under 300 lines
- [ ] All tests pass after refactoring
- [ ] No breaking changes to public API
- [ ] All import paths updated correctly

---

**Priority**: Medium
**Effort**: [Small/Medium/Large based on complexity]
**Expected Impact**: Improved code navigability, easier testing, reduced merge conflicts
```

## Important Guidelines

- **Only create issues when the threshold is reached**: Do not create issues for files under 500 lines
- **Skip generated files**: Ignore files in `dist/`, `build/`, `target/`, or files with a header indicating they are generated (e.g., "Code generated", "DO NOT EDIT")
- **Skip test files**: Focus on production source code only
- **Be specific and actionable**: Provide concrete file split suggestions, not vague advice
- **Consider language idioms**: Suggest splits that follow the conventions of the repository's primary language (e.g., one class per file in Java, grouped by feature in Go, modules by responsibility in Python)
- **Estimate effort realistically**: Large files with many dependencies may require significant refactoring effort

Begin your analysis now. Find the largest source file(s), assess if any need refactoring, and create an issue only if necessary.
