PR Analyser

A PHP console application that analyzes GitHub pull request metrics, including approval times, merge times, and the potential time savings of pair programming.

Features

  • Fetches pull request data from GitHub GraphQL API
  • Stores data in SQLite database with proper indexing
  • Generates comprehensive statistics with percentile analysis
  • Calculates potential cost savings from pair programming
  • Distinguishes between config and normal pull requests
  • Visual bar charts for time distribution analysis

Requirements

  • PHP 8.4 or higher
  • Composer
  • GitHub Personal Access Token

Installation

1. Clone the Repository

git clone https://github.com/renedekat/pr-analyser
cd pr-analyser

2. Install Dependencies

composer install

3. Environment Configuration

Copy the example environment file and configure it:

cp .env.example .env

Edit .env with your settings:

# GitHub Configuration
GITHUB_TOKEN=your_github_personal_access_token
GITHUB_REPO=owner/repository-name

# Optional Configuration
BATCH_SIZE=100
DATABASE=pr_metrics.sqlite

Environment Variables

  • GITHUB_TOKEN (required): GitHub Personal Access Token with repository read access
  • GITHUB_REPO (required): Repository in format owner/repository-name (e.g., facebook/react)
  • BATCH_SIZE (optional): Number of PRs to fetch per API call (default: 100)
  • DATABASE (optional): SQLite database filename, stored in storage/ directory (default: pr_metrics.sqlite)

Creating a GitHub Token

  1. Go to GitHub Settings → Developer settings → Personal access tokens
  2. Generate new token (classic)
  3. Select scopes: public_repo (for public repos) or repo (for private repos)
  4. Copy the token to your .env file

4. Create Storage Directory

mkdir -p storage

Usage

The application provides two main commands that should be run in sequence:

1. Fetch Pull Requests (Required First)

Fetch and store pull request data from GitHub:

./bin/console app:fetch-pull-requests

This command will:

  • Connect to GitHub API using your token
  • Fetch all pull requests (open and merged) from the specified repository
  • Store PR data and reviews in SQLite database
  • Clear existing data before importing (full refresh)

Dry Run Mode

Test the fetch process without storing data:

./bin/console app:fetch-pull-requests --dry-run
# Or using composer:
composer fetch -- --dry-run

Dry run mode:

  • Connects to GitHub API and fetches pull requests
  • Validates your credentials and repository access
  • Shows what would be fetched (batch information and PR counts)
  • Does not create or modify database tables
  • Does not store any data
  • Useful for testing configuration before running a full import

2. Generate Statistics

Generate and display pull request statistics:

./bin/console app:generate-stats [options]

Time Range Options

  • --alltime (default): Analyze all pull requests
  • --ytd: Year to date statistics
  • --previous365days: Previous 365 days
  • --previous5years: Previous 5 years

Additional Options

  • --salary=AMOUNT: Yearly salary for cost savings calculation (default: 150000)

Examples

# All time statistics with default salary
./bin/console app:generate-stats --alltime

# Year to date with custom salary
./bin/console app:generate-stats --ytd --salary=120000

# Previous 365 days
./bin/console app:generate-stats --previous365days --salary=180000

Composer Scripts (Shortcuts)

For convenience, you can use these shorter composer commands:

# Run tests
composer test

# Fetch pull requests
composer fetch
composer fetch -- --dry-run  # Dry run mode

# Generate statistics
composer stats          # All time (default)
composer stats:ytd      # Year to date
composer stats:365      # Previous 365 days
composer stats:5y       # Previous 5 years

Passing additional options:

To pass additional options to composer scripts, use -- followed by your options:

# Fetch with dry-run
composer fetch -- --dry-run

# Stats with custom salary
composer stats:ytd -- --salary=120000

# Custom salary with the default (all-time) range
composer stats -- --salary=180000

Output Explained

The statistics are generated for three categories:

PR Categories

  1. ALL PRs: All non-draft, merged pull requests
  2. CONFIG PRs: Pull requests with titles starting with "CONFIG" or "[CONFIG]"
  3. NORMAL PRs: All other pull requests
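The category split can be sketched as a small predicate on the PR title (a Python illustration of the rule above; the application itself is PHP, and whether the match is case-sensitive is an assumption here):

```python
def is_config_pr(title: str) -> bool:
    """Classify a PR as a CONFIG PR if its title starts with
    "CONFIG" or "[CONFIG]" (the rule described above)."""
    t = title.lstrip()
    return t.startswith("CONFIG") or t.startswith("[CONFIG]")
```

Every other non-draft, merged PR falls into the NORMAL category, and ALL is simply the union of the two.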

Metrics Displayed

For each category, you'll see:

Time Analysis

  • Approval time: Time from PR creation to first approval
  • Merge time: Time from PR creation to merge
  • Approval → Merge time: Time from first approval to merge

Statistical Data

  • Count of merged PRs
  • Average, median, and percentile distributions (10th-100th percentiles)
  • Minimum and maximum times
  • Visual bar charts showing time distribution
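As a rough illustration of the statistics above, the per-category summary can be sketched as follows (a Python sketch using nearest-rank percentiles; the application is PHP and its exact percentile method may differ):

```python
from math import ceil
from statistics import mean, median

def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p% of the data is at or below it."""
    s = sorted(values)
    rank = max(1, ceil(p / 100 * len(s)))  # 1-based nearest rank
    return s[rank - 1]

def summarize(hours):
    """Summary statistics for a list of durations (in hours),
    mirroring the metrics listed above."""
    return {
        "count": len(hours),
        "avg": mean(hours),
        "median": median(hours),
        "min": min(hours),
        "max": max(hours),
        "p90": percentile(hours, 90),
    }
```

The same `summarize` call would be run once per category (ALL, CONFIG, NORMAL) and per metric (approval time, merge time, approval → merge time).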

Cost Analysis

  • Potential savings: Estimated time/cost savings with pair programming
  • Conservative estimate: 30% efficiency factor accounting for multitasking
  • Calculations based on:
    • 252 working days per year
    • 8 hours per working day
    • 30% improvement efficiency from pair programming
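The savings estimate can be reproduced as a short worked example using the stated assumptions (252 working days, 8-hour days, 30% efficiency factor; the function and variable names are illustrative, not the app's actual code):

```python
WORKING_DAYS_PER_YEAR = 252
HOURS_PER_DAY = 8
EFFICIENCY_FACTOR = 0.30  # conservative pair-programming improvement

def potential_savings(total_wait_hours: float, yearly_salary: float) -> float:
    """Estimate cost savings: the fraction of waiting hours assumed
    avoidable, valued at the hourly rate derived from the salary."""
    hourly_rate = yearly_salary / (WORKING_DAYS_PER_YEAR * HOURS_PER_DAY)
    return total_wait_hours * EFFICIENCY_FACTOR * hourly_rate
```

For example, one full working year of accumulated waiting time (2016 hours) at a $150,000 salary yields a potential saving of $45,000 under these assumptions.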

Database Schema

The application creates two tables with proper indexing:

pull_requests

  • pr_number (PRIMARY KEY)
  • created_at - PR creation timestamp
  • merged_at - PR merge timestamp (NULL for unmerged)
  • is_config - Boolean flag for config PRs
  • is_draft - Boolean flag for draft PRs

reviews

  • id (AUTO INCREMENT)
  • pull_request_number - Foreign key to pull_requests
  • state - Review state (APPROVED, CHANGES_REQUESTED, etc.)
  • submitted_at - Review submission timestamp
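The schema described above can be sketched in SQL roughly as follows (shown via Python's sqlite3 for a self-contained example; exact column types, constraints, and index names in the app are assumptions here):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS pull_requests (
    pr_number  INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL,           -- PR creation timestamp
    merged_at  TEXT,                    -- NULL for unmerged PRs
    is_config  INTEGER NOT NULL DEFAULT 0,
    is_draft   INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS reviews (
    id                  INTEGER PRIMARY KEY AUTOINCREMENT,
    pull_request_number INTEGER NOT NULL
                        REFERENCES pull_requests(pr_number),
    state               TEXT NOT NULL,  -- APPROVED, CHANGES_REQUESTED, ...
    submitted_at        TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_reviews_pr
    ON reviews(pull_request_number);
"""

conn = sqlite3.connect(":memory:")  # the app uses storage/pr_metrics.sqlite
conn.executescript(SCHEMA)
```

The index on `reviews.pull_request_number` is what makes the per-PR review lookups cheap during statistics generation.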

Testing

Run the test suite:

composer test

Or directly with PHPUnit:

./vendor/bin/phpunit

Code Coverage

Generate code coverage reports:

# HTML coverage report (output to coverage/ directory)
composer test:coverage

# Text coverage report
composer test:coverage-text

The test suite includes:

  • Unit tests for all service classes
  • Integration tests for database operations
  • Mock-based tests for GitHub API interactions
  • Comprehensive coverage of:
    • Date range calculations
    • Statistical functions (percentiles, averages, min/max)
    • Time formatting utilities
    • Database service operations
    • GitHub service pagination and error handling

Troubleshooting

Common Issues

  1. Long-Running Fetch Operations

    • Large repositories (1000+ PRs) can take 30+ minutes to fetch
    • Process timeout is disabled to allow long-running operations
    • Progress is shown: "Batch 160/1632: 16000 PRs fetched"
    • You can safely Ctrl+C and restart - data is cleared on each run
    • Consider using --dry-run first to estimate fetch time
  2. GitHub API Rate Limits

    • The application respects rate limits automatically
    • Includes automatic retry with exponential backoff for timeouts
    • For large repositories, fetching may take several minutes to hours
  3. Database Permission Issues

    • Ensure the storage/ directory is writable
    • Check file permissions on the SQLite database
  4. Memory Issues with Large Repositories

    • Reduce BATCH_SIZE in .env for very large repositories
    • Monitor memory usage during fetch operations
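The retry behaviour mentioned under rate limits can be sketched as a generic exponential-backoff wrapper (a Python illustration; the delays and attempt count are assumptions, not the app's actual values):

```python
import time

def with_backoff(fetch, max_attempts=5, base_delay=1.0):
    """Call fetch(); on failure, retry with exponentially growing
    delays (base_delay * 1, 2, 4, ...) and re-raise after the
    final attempt."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Wrapping each batch request this way lets transient timeouts recover without restarting the whole fetch.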

Error Messages

  • Please set GITHUB_TOKEN and GITHUB_REPO in .env: Environment configuration missing
  • Failed to fetch data from GitHub API: Check token permissions and repository access
  • Database connection failed: Verify SQLite file path and permissions

Contributing

  1. Clone the repository
  2. Add tests for new functionality
  3. Ensure all tests pass
  4. Push the changes

License

This project is licensed under the MIT License - see the LICENSE file for details.
