This document provides a comprehensive overview of the Contributor Analytics project for AI agents and human contributors.
Contributor Analytics is an analytics platform for GitHub repositories that tracks, analyzes, scores, and summarizes contributor activity.
Core Features:
- Data Ingestion: Fetches PRs, issues, commits, reviews, comments from GitHub API
- Scoring System: Configurable algorithm scoring contributions by type, complexity, impact
- AI Summaries: Daily, weekly, monthly summaries using LLMs (via OpenRouter)
- Data Pipelines: Modular TypeScript pipeline system
- Web Interface: Next.js static site with leaderboards and contributor profiles
- Automation: GitHub Actions for daily processing and deployment
Tech Stack:
- Frontend: Next.js 15, React, TypeScript, Tailwind CSS, shadcn/ui
- Backend/Pipelines: TypeScript, Bun
- Database: SQLite with Drizzle ORM
- CI/CD: GitHub Actions
The TypeScript-based pipeline system handles all data operations:
- Entry Point: `cli/analyze-pipeline.ts`
- Orchestration: Functional composition (`pipe`, `parallel`, `mapStep`); see the sketch after this list
- Configuration: `config/pipeline.config.ts` reads from `PIPELINE_CONFIG_FILE`
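As a rough illustration, those combinators might look like the following. This is a hedged sketch: the actual implementations in `cli/analyze-pipeline.ts` may be variadic or carry extra context.

```typescript
// Sketch of the composition primitives; not the repo's actual code.
type Step<I, O> = (input: I) => Promise<O>;

// pipe: run two steps in sequence, feeding the first's output to the second.
const pipe =
  <A, B, C>(f: Step<A, B>, g: Step<B, C>): Step<A, C> =>
  async (input) => g(await f(input));

// parallel: run independent steps against the same input concurrently.
const parallel =
  <I, O>(...steps: Step<I, O>[]): Step<I, O[]> =>
  async (input) => Promise.all(steps.map((step) => step(input)));

// mapStep: lift a single-item step over a list of inputs.
const mapStep =
  <I, O>(step: Step<I, O>): Step<I[], O[]> =>
  async (inputs) => Promise.all(inputs.map((input) => step(input)));
```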
Pipeline Stages:
- Ingest: Fetch from GitHub API → store in SQLite
- Process: Calculate scores and expertise tags
- Export: Generate JSON/MD files for frontend
- Summarize: AI-generated summaries via OpenRouter
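Under that model, a full run could wire the four stages together roughly as below. The stage names and context shape are assumptions for illustration, not the repo's actual exports.

```typescript
// Assumes the Step type and pipe combinator sketched earlier.
interface PipelineContext {
  repos: string[];
  since: string; // ISO start date for contribution tracking
}

// Stage stubs; the real stage functions live in the pipeline modules.
declare const ingest: Step<PipelineContext, PipelineContext>;
declare const processContributions: Step<PipelineContext, PipelineContext>;
declare const exportFiles: Step<PipelineContext, PipelineContext>;
declare const summarize: Step<PipelineContext, PipelineContext>;

// Ingest -> Process -> Export -> Summarize
const run = pipe(pipe(ingest, processContributions), pipe(exportFiles, summarize));
```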
SQLite database as single source of truth:
- Schema: `src/lib/data/schema.ts` using Drizzle ORM (sketch below)
- Tables: users, repositories, pullRequests, issues, reviews, comments, scores
- Migrations: `drizzle/` directory, managed by Drizzle Kit
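For orientation, a table definition in that schema might look roughly like this; the column names here are illustrative, not the actual schema:

```typescript
import { sqliteTable, text, integer } from "drizzle-orm/sqlite-core";

// Illustrative shape only; see src/lib/data/schema.ts for the real tables.
export const users = sqliteTable("users", {
  username: text("username").primaryKey(),
  avatarUrl: text("avatar_url"),
  isBot: integer("is_bot", { mode: "boolean" }).notNull().default(false),
});
```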
Static Next.js 15 application for GitHub Pages:
- SSG: Server Components query SQLite at build time
- Routing: App Router with dynamic date-based pages
- Styling: Tailwind CSS + shadcn/ui components
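A minimal sketch of the SSG pattern, assuming a hypothetical `getLeaderboard` query helper; the real page and helper names may differ:

```tsx
// app/leaderboard/page.tsx (illustrative Server Component).
// getLeaderboard is a hypothetical helper that queries the SQLite database.
import { getLeaderboard } from "@/lib/data/queries";

export default async function LeaderboardPage() {
  // Runs at build time under static export, so the query never ships to clients.
  const entries = await getLeaderboard();
  return (
    <ol>
      {entries.map((entry) => (
        <li key={entry.username}>
          {entry.username}: {entry.score}
        </li>
      ))}
    </ol>
  );
}
```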
Cloudflare Worker for GitHub OAuth:
- Exchanges GitHub code for access token
- Enables wallet linking feature for contributor profiles
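The exchange itself is small. A minimal sketch, assuming the worker receives a JSON body containing the OAuth `code` and holds the client credentials as environment bindings (binding names are assumptions):

```typescript
interface Env {
  GITHUB_CLIENT_ID: string;
  GITHUB_CLIENT_SECRET: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { code } = (await request.json()) as { code: string };
    // GitHub's token endpoint returns JSON when asked via the Accept header.
    const upstream = await fetch("https://github.com/login/oauth/access_token", {
      method: "POST",
      headers: { "Content-Type": "application/json", Accept: "application/json" },
      body: JSON.stringify({
        client_id: env.GITHUB_CLIENT_ID,
        client_secret: env.GITHUB_CLIENT_SECRET,
        code,
      }),
    });
    return new Response(JSON.stringify(await upstream.json()), {
      headers: { "Content-Type": "application/json" },
    });
  },
};
```

The real worker presumably layers CORS headers, input validation, and error handling on top of this exchange.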
GitHub Actions Workflows:
- `run-pipelines.yml`: Daily at 23:00 UTC, runs the full pipeline chain
- `deploy.yml`: Builds and deploys to GitHub Pages
- `pr-checks.yml`: Linting, type checking, build verification

Custom Actions:
- `pipeline-data`: Manages the `_data` branch lifecycle via Git worktrees
- `restore-db`: Serializes/deserializes SQLite for version control

Branches:
- `main`: Application code
- `_data`: Generated data and SQLite dumps
The pipeline reads its configuration from a JSON file specified by the `PIPELINE_CONFIG_FILE` env var:

```bash
# Local development
export PIPELINE_CONFIG_FILE=config/example.json
```

Config options (`config/example.json`):
- `PIPELINE_REPOS`: Repository list to track
- `PIPELINE_START_DATE`: Contribution tracking start date
- `PIPELINE_PROJECT_CONTEXT`: AI summary context
- `PIPELINE_SCORING`: Scoring rules for PRs, issues, reviews
- `PIPELINE_TAGS`: Area/role/tech tag definitions
- `PIPELINE_BOT_USERS`: Bot accounts to exclude
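Taken together, the loader could type and read the config roughly as below. The interface is inferred from the option names above, and the field types and fallback path are assumptions:

```typescript
import { readFileSync } from "node:fs";

// Field types are guesses based on the documented options, not the real schema.
interface PipelineConfig {
  PIPELINE_REPOS: string[];
  PIPELINE_START_DATE: string; // ISO date
  PIPELINE_PROJECT_CONTEXT: string;
  PIPELINE_SCORING: Record<string, number>;
  PIPELINE_TAGS: Record<string, string[]>;
  PIPELINE_BOT_USERS: string[];
}

// Falling back to the example config is an assumption, not documented behavior.
const configPath = process.env.PIPELINE_CONFIG_FILE ?? "config/example.json";
const config: PipelineConfig = JSON.parse(readFileSync(configPath, "utf8"));
```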
Pull production data for local development:
```bash
bun run data:sync          # Sync from upstream
bun run data:sync --help   # See all options
```

To change the database schema:
- Modify `src/lib/data/schema.ts`
- Run `bun run db:generate`
- Run `bun run db:migrate`
```bash
bun run pipeline --help        # See all options
bun run pipeline ingest -d 7   # Small date range
bun run pipeline process -f    # Force reprocess
```

Code Conventions:
- Prefer type inference over manual signatures
- Never cast to `any`; fix the underlying type issues
- Search for existing types/schemas before creating new ones
- Avoid comments on self-explanatory code
- Use `bun:test` for testing