A PHP console application that analyzes GitHub pull request metrics, including approval times, merge times, and potential time savings from pair programming.
- Fetches pull request data from GitHub GraphQL API
- Stores data in SQLite database with proper indexing
- Generates comprehensive statistics with percentile analysis
- Calculates potential cost savings from pair programming
- Distinguishes between config and normal pull requests
- Visual bar charts for time distribution analysis
- PHP 8.4 or higher
- Composer
- GitHub Personal Access Token
```bash
git clone https://github.com/renedekat/pr-analyser
cd pr-analyser
composer install
```

Copy the example environment file and configure it:

```bash
cp .env.example .env
```

Edit `.env` with your settings:

```bash
# GitHub Configuration
GITHUB_TOKEN=your_github_personal_access_token
GITHUB_REPO=owner/repository-name

# Optional Configuration
BATCH_SIZE=100
DATABASE=pr_metrics.sqlite
```

- GITHUB_TOKEN (required): GitHub Personal Access Token with repository read access
- GITHUB_REPO (required): Repository in the format `owner/repository-name` (e.g., `facebook/react`)
- BATCH_SIZE (optional): Number of PRs to fetch per API call (default: 100)
- DATABASE (optional): SQLite database filename, stored in the `storage/` directory (default: `pr_metrics.sqlite`)
- Go to GitHub Settings → Developer settings → Personal access tokens
- Generate a new token (classic)
- Select scopes: `public_repo` (for public repos) or `repo` (for private repos)
- Copy the token to your `.env` file
```bash
mkdir -p storage
```

The application provides two main commands that should be run in sequence:
Fetch and store pull request data from GitHub:
```bash
./bin/console app:fetch-pull-requests
```

This command will:
- Connect to GitHub API using your token
- Fetch all pull requests (open and merged) from the specified repository
- Store PR data and reviews in SQLite database
- Clear existing data before importing (full refresh)
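The fetch loop above can be sketched as a cursor-based pagination over GitHub's GraphQL `pullRequests` connection. This is a Python sketch, not the application's actual PHP code; the selected fields follow GitHub's GraphQL schema, but `fetch_all_prs` and the injected `run_query` transport are illustrative names:

```python
# Hypothetical GraphQL query; field names mirror GitHub's GraphQL schema.
PR_QUERY = """
query($owner: String!, $name: String!, $first: Int!, $after: String) {
  repository(owner: $owner, name: $name) {
    pullRequests(first: $first, after: $after, states: [OPEN, MERGED]) {
      pageInfo { hasNextPage endCursor }
      nodes {
        number
        createdAt
        mergedAt
        isDraft
        reviews(first: 50) { nodes { state submittedAt } }
      }
    }
  }
}
"""

def fetch_all_prs(run_query, owner: str, name: str, batch_size: int = 100) -> list:
    """Paginate through pullRequests using the cursor from pageInfo.

    `run_query(query, variables)` is an injected transport (e.g. an HTTP POST
    to https://api.github.com/graphql) so the loop can be tested offline.
    """
    prs, cursor = [], None
    while True:
        data = run_query(PR_QUERY, {"owner": owner, "name": name,
                                    "first": batch_size, "after": cursor})
        page = data["repository"]["pullRequests"]
        prs.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return prs
        cursor = page["pageInfo"]["endCursor"]
```

Injecting the transport keeps the pagination logic independent of the HTTP client, which is also how the mock-based tests described below can exercise it.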
Test the fetch process without storing data:
```bash
./bin/console app:fetch-pull-requests --dry-run

# Or using composer:
composer fetch -- --dry-run
```

Dry run mode:
- Connects to GitHub API and fetches pull requests
- Validates your credentials and repository access
- Shows what would be fetched (batch information and PR counts)
- Does not create or modify database tables
- Does not store any data
- Useful for testing configuration before running a full import
Generate and display pull request statistics:
```bash
./bin/console app:generate-stats [options]
```

- `--alltime` (default): Analyze all pull requests
- `--ytd`: Year-to-date statistics
- `--previous365days`: Previous 365 days
- `--previous5years`: Previous 5 years
- `--salary=AMOUNT`: Yearly salary used for the cost-savings calculation (default: 150000)
```bash
# All-time statistics with the default salary
./bin/console app:generate-stats --alltime

# Year to date with a custom salary
./bin/console app:generate-stats --ytd --salary=120000

# Previous 365 days
./bin/console app:generate-stats --previous365days --salary=180000
```

For convenience, you can use these shorter composer commands:
```bash
# Run tests
composer test

# Fetch pull requests
composer fetch
composer fetch -- --dry-run  # Dry run mode

# Generate statistics
composer stats      # All time (default)
composer stats:ytd  # Year to date
composer stats:365  # Previous 365 days
composer stats:5y   # Previous 5 years
```
To pass additional options to composer scripts, use `--` followed by your options:

```bash
# Fetch with dry-run
composer fetch -- --dry-run

# Stats with a custom salary
composer stats:ytd -- --salary=120000

# Multiple options
composer stats -- --salary=180000
```

The statistics are generated for three categories:
- ALL PRs: All non-draft, merged pull requests
- CONFIG PRs: Pull requests with titles starting with "CONFIG" or "[CONFIG]"
- NORMAL PRs: All other pull requests
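The classification rule above can be sketched in a few lines. This is a Python sketch (the real application is PHP) and the helper name `is_config_pr` is hypothetical:

```python
def is_config_pr(title: str) -> bool:
    """A PR counts as a config PR when its title starts with
    "CONFIG" or "[CONFIG]", as documented above."""
    return title.startswith(("CONFIG", "[CONFIG]"))
```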
For each category, you'll see:
- Approval time: Time from PR creation to first approval
- Merge time: Time from PR creation to merge
- Approval → Merge time: Time from first approval to merge
- Count of merged PRs
- Average, median, and percentile distributions (10th-100th percentiles)
- Minimum and maximum times
- Visual bar charts showing time distribution
- Potential savings: Estimated time/cost savings with pair programming
- Conservative estimate: 30% efficiency factor accounting for multitasking
- Calculations based on:
- 252 working days per year
- 8 hours per working day
- 30% improvement efficiency from pair programming
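Under the stated constants (252 working days, 8 hours per day, 30% efficiency), the savings arithmetic could be computed as follows. This is a hedged Python sketch; the application's exact formula may differ, and `potential_savings` is an illustrative name:

```python
def potential_savings(total_wait_hours: float, yearly_salary: float = 150_000,
                      working_days: int = 252, hours_per_day: int = 8,
                      efficiency: float = 0.30) -> dict:
    """Convert total PR wait time into an estimated cost saving.

    Only `efficiency` (30%) of the waiting time is assumed recoverable
    through pair programming, to account for multitasking.
    """
    hourly_rate = yearly_salary / (working_days * hours_per_day)  # 150000 / 2016
    recoverable_hours = total_wait_hours * efficiency
    return {
        "hourly_rate": hourly_rate,
        "recoverable_hours": recoverable_hours,
        "estimated_savings": recoverable_hours * hourly_rate,
    }
```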
The application creates two tables with proper indexing:

pull_requests:
- `pr_number` (PRIMARY KEY)
- `created_at`: PR creation timestamp
- `merged_at`: PR merge timestamp (NULL for unmerged PRs)
- `is_config`: Boolean flag for config PRs
- `is_draft`: Boolean flag for draft PRs

reviews:
- `id` (AUTO INCREMENT)
- `pull_request_number`: Foreign key to `pull_requests`
- `state`: Review state (APPROVED, CHANGES_REQUESTED, etc.)
- `submitted_at`: Review submission timestamp
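The schema above can be reproduced in a few lines of SQL. This sketch uses Python's `sqlite3` for illustration; the column types and index names are assumptions, not the application's exact DDL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the app uses storage/pr_metrics.sqlite
conn.executescript("""
CREATE TABLE pull_requests (
    pr_number  INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,       -- PR creation timestamp
    merged_at  TEXT,                -- NULL for unmerged PRs
    is_config  INTEGER NOT NULL DEFAULT 0,
    is_draft   INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE reviews (
    id                  INTEGER PRIMARY KEY AUTOINCREMENT,
    pull_request_number INTEGER NOT NULL REFERENCES pull_requests(pr_number),
    state               TEXT NOT NULL,  -- APPROVED, CHANGES_REQUESTED, ...
    submitted_at        TEXT
);
-- Hypothetical indexes to speed up joins and merge-date filters
CREATE INDEX idx_reviews_pr ON reviews(pull_request_number);
CREATE INDEX idx_prs_merged_at ON pull_requests(merged_at);
""")
```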
Run the test suite:
```bash
composer test
```

Or directly with PHPUnit:

```bash
./vendor/bin/phpunit
```

Generate code coverage reports:

```bash
# HTML coverage report (output to the coverage/ directory)
composer test:coverage

# Text coverage report
composer test:coverage-text
```

The test suite includes:
- Unit tests for all service classes
- Integration tests for database operations
- Mock-based tests for GitHub API interactions
- Comprehensive coverage of:
- Date range calculations
- Statistical functions (percentiles, averages, min/max)
- Time formatting utilities
- Database service operations
- GitHub service pagination and error handling
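The percentile logic those tests cover can be sketched with the nearest-rank definition. This is a Python sketch of one common method; the application's exact interpolation may differ:

```python
import math

def percentile(values: list, p: float) -> float:
    """Nearest-rank percentile: the smallest element at or below which
    at least p percent of the sorted data falls."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

def deciles(values: list) -> dict:
    """The 10th-100th percentile distribution shown by the stats command."""
    return {p: percentile(values, p) for p in range(10, 101, 10)}
```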
- Long-Running Fetch Operations
  - Large repositories (1000+ PRs) can take 30+ minutes to fetch
  - The process timeout is disabled to allow long-running operations
  - Progress is shown: "Batch 160/1632: 16000 PRs fetched"
  - You can safely Ctrl+C and restart; data is cleared on each run
  - Consider using `--dry-run` first to estimate fetch time
- GitHub API Rate Limits
  - The application respects rate limits automatically
  - It retries automatically with exponential backoff on timeouts
  - For large repositories, fetching may take several minutes to hours
- Database Permission Issues
  - Ensure the `storage/` directory is writable
  - Check file permissions on the SQLite database
- Memory Issues with Large Repositories
  - Reduce `BATCH_SIZE` in `.env` for very large repositories
  - Monitor memory usage during fetch operations
Common error messages:

- `Please set GITHUB_TOKEN and GITHUB_REPO in .env`: environment configuration is missing
- `Failed to fetch data from GitHub API`: check token permissions and repository access
- `Database connection failed`: verify the SQLite file path and permissions
- Clone the repository
- Add tests for new functionality
- Ensure all tests pass
- Push the changes
This project is licensed under the MIT License - see the LICENSE file for details.