A watchdog for GitHub. Designed to protect communities and maintain the integrity of open-source projects by barking at suspicious users.

BearHuddleston/GitHubWatchdog

GitHubWatchdog - GitHub Suspicious User Detector

GitHubWatchdog is a Go-based microservice that leverages the GitHub API to search for repositories and analyze their owners for suspicious activity. The tool scans repositories using a predefined search query and applies heuristics to flag users who may exhibit unusual patterns, such as newly created accounts or repositories with low disk usage yet high star counts.

I have personally reported more than 3,000 accounts using this tool.

Architecture & Project Structure

GitHubWatchdog/
├── cmd/
│   └── app/
│       └── main.go           # Bootstraps the application, initializes dependencies, and starts the search loop.
└── internal/
    ├── analyzer/
    │   ├── analyzer.go       # Contains user heuristics and analysis logic.
    │   └── heuristic.go      # Defines heuristic rules for suspicious activity detection.
    ├── config/
    │   └── config.go         # Reads environment variables and sets default configuration.
    ├── db/
    │   └── sqlite.go         # Implements SQLite-based storage for processed users and repositories.
    ├── github/
    │   ├── client.go         # Sets up the GitHub REST client.
    │   ├── cache.go          # Implements caching for GitHub API requests.
    │   └── rate_limiter.go   # Handles GitHub API rate limiting.
    ├── logger/
    │   └── logger.go         # Provides logging functionality.
    ├── models/
    │   └── models.go         # Defines data structures used throughout the application.
    └── web/
        ├── server.go         # Implements HTTP server for web interface.
        ├── handlers.go       # HTTP request handlers for web interface.
        ├── data.go           # Database query functions for web interface.
        ├── api.go            # API endpoints for GitHub data integration.
        ├── template_funcs.go # Template functions for web interface.
        ├── templates/        # HTML templates for web interface.
        └── static/           # Static assets (CSS, JavaScript) for web interface.

Overview

GitHubWatchdog performs the following tasks:

  • GitHub Client Initialization:
    Creates an authenticated GitHub client using a personal access token (see internal/github/client.go).

  • Repository Search & Processing:
    Uses GitHub's REST API to search for repositories matching a specific query (e.g., repositories created after a certain date with more than 5 stars). Results are dispatched to a worker pool for concurrent processing (see internal/processor/processor.go).

  • Processed Repository Tracking:
    The service tracks processed repositories and users in an SQLite database (github_watchdog.db) to avoid duplicate analysis. Database interactions are handled by internal/db/sqlite.go.

  • User Analysis:
    For repositories with low disk usage, the tool further analyzes the associated user’s account using various heuristics (such as account age, total stars across repositories, and contribution counts). The analysis logic is encapsulated in the internal/analyzer package.

  • Heuristic-Based Suspicious Detection:
    The system applies predefined heuristics (see internal/analyzer/heuristic.go) to flag accounts with suspicious behavior, such as new accounts with high stars or repositories with empty content but significant stargazer activity.

  • Suspicious User & Repository Recording:
    If a user or repository is flagged as suspicious, the relevant details are logged and stored in the SQLite database.

Requirements

  • Go Environment:
    Make sure you have Go installed (version 1.16 or later is recommended).

  • GitHub Personal Access Token:
    Export a GitHub token as an environment variable:

    export GITHUB_TOKEN=your_github_token_here
  • Dependencies:
    The project uses several Go packages including:

    • golang.org/x/oauth2
    • github.com/shurcooL/githubv4
    • github.com/mattn/go-sqlite3

    Dependencies are managed via Go modules. Use go mod tidy to ensure all dependencies are fetched.

  • Optional: Ollama for AI-powered Threat Analysis:

    • Install Ollama: https://ollama.ai/download
    • Run Ollama server locally: ollama serve
    • Pull the llama3.2 model: ollama pull llama3.2
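To show how the GITHUB_TOKEN requirement is typically consumed, here is a minimal sketch that reads the token from the environment and builds an authenticated request to GitHub's repository search endpoint. It uses only the standard library rather than the oauth2 client the project depends on, and the `newSearchRequest` helper and the example query are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"os"
)

// newSearchRequest builds an authenticated GET request for GitHub's
// repository search endpoint. It does not send the request.
func newSearchRequest(token, query string) (*http.Request, error) {
	if token == "" {
		return nil, errors.New("GITHUB_TOKEN is not set")
	}
	u := url.URL{
		Scheme:   "https",
		Host:     "api.github.com",
		Path:     "/search/repositories",
		RawQuery: url.Values{"q": {query}}.Encode(),
	}
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")
	return req, nil
}

func main() {
	// Example query (an assumption): repositories created after a date with more than 5 stars.
	req, err := newSearchRequest(os.Getenv("GITHUB_TOKEN"), "created:>2024-01-01 stars:>5")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Failing early when the token is missing avoids unauthenticated requests, which GitHub rate-limits much more aggressively.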

Running the Application

Build and run the application from the project root:

Search Mode (Default)

go build -o githubwatchdog ./cmd/app
./githubwatchdog

Web Interface Mode

To run the application with the web interface for viewing the database:

go build -o githubwatchdog ./cmd/app
./githubwatchdog -web

The web server listens on port 8080 by default and is reachable at http://localhost:8080.

The web interface includes the following features:

  • Dashboard: Overview of processed repositories, users, and detected flags
  • Repository View: List of analyzed repositories with status indicators
  • User View: List of analyzed GitHub users with suspicion status
  • Flags View: List of detected heuristic flags
  • Sortable Tables: Click on column headers to sort data
  • Pagination: Adjustable page size with navigation controls
  • Status Toggle: One-click toggle between clean/malicious or clean/suspicious states
  • Detailed Reports: Real-time reports using GitHub API for repositories and users
  • Markdown Rendering: Properly formatted README display in repository reports
  • Ollama Integration: AI-powered threat analysis using Ollama LLM for enhanced security assessment

Options:

  • -web: Run in web interface mode
  • -addr: Specify the web server address (default: ":8080")

Example with custom port:

./githubwatchdog -web -addr=":9090"
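The two options above map naturally onto Go's standard `flag` package. The sketch below is an assumption about how such parsing could look, not a copy of cmd/app/main.go; the `options` type and `parseOptions` function are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the two documented command-line flags.
type options struct {
	web  bool
	addr string
}

// parseOptions parses args (without the program name) into options.
// A dedicated FlagSet keeps the function testable.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("githubwatchdog", flag.ContinueOnError)
	fs.BoolVar(&o.web, "web", false, "run in web interface mode")
	fs.StringVar(&o.addr, "addr", ":8080", "web server address")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if o.web {
		fmt.Println("web interface mode on", o.addr)
		return
	}
	fmt.Println("search mode")
}
```

With this sketch, `-web -addr=":9090"` yields `web == true` and `addr == ":9090"`, while running with no flags leaves the search-mode defaults in place.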

Note: A valid GitHub token is required for report functionality, which can be provided through the GITHUB_TOKEN environment variable or in the config.json file.

Ollama Integration

GitHubWatchdog can be configured to use Ollama, a locally-run LLM server, to provide AI-powered threat analysis of repositories and users:

  1. Configuration: In your config.json file, add the following section:

    "ollama": {
        "enabled": true,
        "endpoint": "http://localhost:11434",
        "model": "llama3.2"
    }
  2. Environment Variables: Alternatively, set these environment variables:

    export OLLAMA_ENABLED=true
    export OLLAMA_ENDPOINT=http://localhost:11434
    export OLLAMA_MODEL=llama3.2
  3. Generate Analysis: Once configured, use the API endpoint to generate analyses:

    POST /api/analysis/generate
    {
      "entity_type": "repo",
      "entity_id": "owner/repo"
    }

    For a user analysis, set "entity_type" to "user" and "entity_id" to the username.

The system will cache analyses in the database to avoid regenerating them for repeat requests. Analysis results are also included in repository and user report API responses.
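A client for the analysis endpoint might look like the sketch below. It only builds the request and does not send it. The base URL (the web interface's default address), the JSON content type, and the helper names `encodeAnalysisRequest` and `newAnalysisRequest` are assumptions; the response format is not documented here, so it is not decoded.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// analysisRequest matches the documented request body.
type analysisRequest struct {
	EntityType string `json:"entity_type"`
	EntityID   string `json:"entity_id"`
}

// encodeAnalysisRequest validates the entity type and returns the JSON body.
func encodeAnalysisRequest(entityType, entityID string) ([]byte, error) {
	if entityType != "repo" && entityType != "user" {
		return nil, fmt.Errorf("unsupported entity_type %q", entityType)
	}
	return json.Marshal(analysisRequest{EntityType: entityType, EntityID: entityID})
}

// newAnalysisRequest builds (but does not send) the POST request.
func newAnalysisRequest(baseURL, entityType, entityID string) (*http.Request, error) {
	body, err := encodeAnalysisRequest(entityType, entityID)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/analysis/generate", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Assumes the web interface is running on its default address.
	req, err := newAnalysisRequest("http://localhost:8080", "repo", "owner/repo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(req.Method, req.URL.Path) // prints POST /api/analysis/generate
}
```

To actually trigger an analysis, pass the request to `http.DefaultClient.Do` and close the response body when done.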

TO-DO List

Unit Testing

  • Develop comprehensive unit tests for:
    • GitHub client initialization.
    • Contribution counting logic.
    • User analysis with various edge cases.
    • Database persistence and retrieval.

Error Handling Improvements

  • ✅ Enhance error handling throughout the code, especially for network/API errors and database operations.

Configuration Enhancements

  • ✅ Introduce a configuration file (config.json) or command-line flags to allow dynamic setting of thresholds (e.g., repository size, stars threshold, page limits).

Logging Enhancements

  • ✅ Integrate a more robust logging framework that supports log levels and log file rotation.

Rate Limiting Handling

  • ✅ Improve handling for GitHub API rate limits, including automatic retries and exponential backoff.

Enhanced Query Parameters

  • Allow customization of the GitHub search query via environment variables or command-line arguments.

Performance Optimization

  • ✅ Investigate opportunities for further parallel processing when analyzing multiple repositories or users concurrently.

Web UI Integration

  • ✅ Develop and integrate a web UI for viewing database content
  • ✅ Add sortable tables with column headers
  • ✅ Implement pagination and customizable page size
  • ✅ Add status toggle for repository and user classification
  • ✅ Integrate detailed reports with GitHub API data
  • ✅ Implement Markdown rendering for repository READMEs
  • ✅ Add AI-powered threat analysis with Ollama integration
  • Enhance web UI with real-time monitoring and scanning process management

CI/CD Integration

  • Set up continuous integration to run tests on each commit and pull request.
