GitHubWatchdog is a Go-based microservice that leverages the GitHub API to search for repositories and analyze their owners for suspicious activity. The tool scans repositories using a predefined search query and applies heuristics to flag users who may exhibit unusual patterns, such as newly created accounts or repositories with low disk usage yet high star counts.
I have personally reported more than 3,000 accounts using this tool.
GitHubWatchdog/
├── cmd/
│   └── app/
│       └── main.go               # Bootstraps the application, initializes dependencies, and starts the search loop.
└── internal/
    ├── analyzer/
    │   ├── analyzer.go           # Contains user heuristics and analysis logic.
    │   └── heuristic.go          # Defines heuristic rules for suspicious activity detection.
    ├── config/
    │   └── config.go             # Reads environment variables and sets default configuration.
    ├── db/
    │   └── sqlite.go             # Implements SQLite-based storage for processed users and repositories.
    ├── github/
    │   ├── client.go             # Sets up the GitHub REST client.
    │   ├── cache.go              # Implements caching for GitHub API requests.
    │   └── rate_limiter.go       # Handles GitHub API rate limiting.
    ├── logger/
    │   └── logger.go             # Provides logging functionality.
    ├── models/
    │   └── models.go             # Defines data structures used throughout the application.
    └── web/
        ├── server.go             # Implements HTTP server for web interface.
        ├── handlers.go           # HTTP request handlers for web interface.
        ├── data.go               # Database query functions for web interface.
        ├── api.go                # API endpoints for GitHub data integration.
        ├── template_funcs.go     # Template functions for web interface.
        ├── templates/            # HTML templates for web interface.
        └── static/               # Static assets (CSS, JavaScript) for web interface.
GitHubWatchdog performs the following tasks:
- GitHub Client Initialization: Creates an authenticated GitHub client using a personal access token (see internal/github/client.go).
- Repository Search & Processing: Uses GitHub's REST API to search for repositories matching a specific query (e.g., repositories created after a certain date with more than 5 stars). Results are dispatched to a worker pool for concurrent processing (see internal/processor/processor.go).
- Processed Repository Tracking: The service tracks processed repositories and users in an SQLite database (github_watchdog.db) to avoid duplicate analysis. Database interactions are handled by internal/db/sqlite.go.
- User Analysis: For repositories with low disk usage, the tool further analyzes the associated user’s account using various heuristics (such as account age, total stars across repositories, and contribution counts). The analysis logic is encapsulated in the internal/analyzer package.
- Heuristic-Based Suspicious Detection: The system applies predefined heuristics (see internal/analyzer/heuristic.go) to flag accounts with suspicious behavior, such as new accounts with high stars or repositories with empty content but significant stargazer activity.
- Suspicious User & Repository Recording: If a user or repository is flagged as suspicious, the relevant details are logged and stored in the SQLite database.
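As a rough illustration of the heuristics described above, the sketch below flags a new account with many stars and a near-empty repository with many stars. It is not the code in internal/analyzer: the type names, field names, and every threshold here are assumptions chosen for the example.

```go
package main

import "fmt"

// userStats and repoStats are hypothetical shapes for this sketch;
// the real project defines its own models in internal/models.
type userStats struct {
	Login          string
	AccountAgeDays int
	TotalStars     int
}

type repoStats struct {
	Name        string
	DiskUsageKB int
	Stars       int
}

// Assumed thresholds, for illustration only.
const (
	maxNewAccountAgeDays = 30 // accounts younger than this count as "new"
	minSuspiciousStars   = 50 // star count considered unusually high
	maxSmallRepoKB       = 10 // repositories at or below this size count as near-empty
)

// userFlags returns the names of the heuristics the user trips.
func userFlags(u userStats) []string {
	var flags []string
	if u.AccountAgeDays < maxNewAccountAgeDays && u.TotalStars >= minSuspiciousStars {
		flags = append(flags, "new_account_high_stars")
	}
	return flags
}

// repoFlags returns the names of the heuristics the repository trips.
func repoFlags(r repoStats) []string {
	var flags []string
	if r.DiskUsageKB <= maxSmallRepoKB && r.Stars >= minSuspiciousStars {
		flags = append(flags, "low_disk_usage_high_stars")
	}
	return flags
}

func main() {
	u := userStats{Login: "example-user", AccountAgeDays: 5, TotalStars: 200}
	r := repoStats{Name: "example-user/example-repo", DiskUsageKB: 2, Stars: 180}
	fmt.Println(u.Login, userFlags(u))
	fmt.Println(r.Name, repoFlags(r))
}
```

Returning a list of flag names rather than a single boolean keeps each rule independent, which makes it easier to record which heuristic fired for a given user or repository.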
- Go Environment: Make sure you have Go installed (version 1.16 or later is recommended).
- GitHub Personal Access Token: Export a GitHub token as an environment variable:
  export GITHUB_TOKEN=your_github_token_here
- Dependencies: The project uses several Go packages, including:
  - golang.org/x/oauth2
  - github.com/shurcooL/githubv4
  - github.com/mattn/go-sqlite3
  Dependencies are managed via Go modules. Use go mod tidy to ensure all dependencies are fetched.
- Optional: Ollama for AI-powered Threat Analysis:
  - Install Ollama: https://ollama.ai/download
  - Run Ollama server locally:
    ollama serve
  - Pull the llama3.2 model:
    ollama pull llama3.2
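For the token prerequisite above, a startup check along the following lines can fail early with a clear message when GITHUB_TOKEN is missing. This is a minimal sketch using only the standard library; the function name tokenFromEnv is hypothetical, and the project's own loading logic in internal/config may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// tokenFromEnv reads GITHUB_TOKEN through the supplied lookup function
// (os.Getenv in production) and rejects empty or whitespace-only values.
func tokenFromEnv(getenv func(string) string) (string, error) {
	token := strings.TrimSpace(getenv("GITHUB_TOKEN"))
	if token == "" {
		return "", errors.New("GITHUB_TOKEN is not set; export it before starting the service")
	}
	return token, nil
}

func main() {
	token, err := tokenFromEnv(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Never print the token itself; its length is enough to confirm it loaded.
	fmt.Printf("GitHub token loaded (%d characters)\n", len(token))
}
```

Passing the lookup function as a parameter keeps the check testable without modifying the process environment.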
Build and run the application from the project root:
go build -o githubwatchdog ./cmd/app
./githubwatchdog
To run the application with the web interface for viewing the database:
go build -o githubwatchdog ./cmd/app
./githubwatchdog -web
The web server runs on port 8080 by default. You can access it at http://localhost:8080
The web interface includes the following features:
- Dashboard: Overview of processed repositories, users, and detected flags
- Repository View: List of analyzed repositories with status indicators
- User View: List of analyzed GitHub users with suspicion status
- Flags View: List of detected heuristic flags
- Sortable Tables: Click on column headers to sort data
- Pagination: Adjustable page size with navigation controls
- Status Toggle: One-click toggle between clean/malicious or clean/suspicious states
- Detailed Reports: Real-time reports using GitHub API for repositories and users
- Markdown Rendering: Properly formatted README display in repository reports
- Ollama Integration: AI-powered threat analysis using Ollama LLM for enhanced security assessment
Options:
- -web: Run in web interface mode
- -addr: Specify the web server address (default: ":8080")
Example with custom port:
./githubwatchdog -web -addr=":9090"
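The two options above map naturally onto Go's standard flag package. The sketch below shows one way such parsing could look; the flag names and the ":8080" default come from this README, while the options struct and the parseFlags helper are hypothetical and not taken from cmd/app/main.go.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the parsed command-line settings (hypothetical type).
type options struct {
	web  bool   // run in web interface mode
	addr string // address the web server listens on
}

// parseFlags parses args (typically os.Args[1:]) into options.
// A dedicated FlagSet is used so the function can be called from tests.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("githubwatchdog", flag.ContinueOnError)
	fs.BoolVar(&o.web, "web", false, "run in web interface mode")
	fs.StringVar(&o.addr, "addr", ":8080", "web server address")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

func main() {
	opts, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if opts.web {
		fmt.Println("web interface would listen on", opts.addr)
		return
	}
	fmt.Println("search loop would start here")
}
```

Because -addr takes a value, both -addr=":9090" and -addr ":9090" are accepted by the flag package.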
Note: A valid GitHub token is required for report functionality, which can be provided through the GITHUB_TOKEN environment variable or in the config.json file.
GitHubWatchdog can be configured to use Ollama, a locally-run LLM server, to provide AI-powered threat analysis of repositories and users:
- Configuration: In your config.json file, add the following section:
  "ollama": {
    "enabled": true,
    "endpoint": "http://localhost:11434",
    "model": "llama3.2"
  }
- Environment Variables: Alternatively, set these environment variables:
  export OLLAMA_ENABLED=true
  export OLLAMA_ENDPOINT=http://localhost:11434
  export OLLAMA_MODEL=llama3.2
- Generate Analysis: Once configured, use the API endpoint to generate analyses:
  POST /api/analysis/generate
  {
    "entity_type": "repo",      // or "user"
    "entity_id": "owner/repo"   // or "username"
  }
The system will cache analyses in the database to avoid regenerating them for repeat requests. Analysis results are also included in repository and user report API responses.
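A client for the analysis endpoint could be sketched as follows. The path and the entity_type and entity_id fields come from this README; the helper name buildAnalysisRequest, the base URL, and the Content-Type header are assumptions. The response format is not documented here, so the sketch stops at building the request. Note that the // annotations in the request example are explanatory only and must not be sent, since JSON does not allow comments.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// analysisRequest mirrors the request body described in this README.
type analysisRequest struct {
	EntityType string `json:"entity_type"` // "repo" or "user"
	EntityID   string `json:"entity_id"`   // "owner/repo" or "username"
}

// buildAnalysisRequest prepares a POST to /api/analysis/generate on the
// given base URL (for example the local web interface address).
func buildAnalysisRequest(baseURL, entityType, entityID string) (*http.Request, error) {
	if entityType != "repo" && entityType != "user" {
		return nil, fmt.Errorf("entity_type must be \"repo\" or \"user\", got %q", entityType)
	}
	body, err := json.Marshal(analysisRequest{EntityType: entityType, EntityID: entityID})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(baseURL, "/") + "/api/analysis/generate"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := buildAnalysisRequest("http://localhost:8080", "repo", "owner/repo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```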
- Develop comprehensive unit tests for:
  - GitHub client initialization.
  - Contribution counting logic.
  - User analysis with various edge cases.
  - Database persistence and retrieval.
- ✅ Enhance error handling throughout the code, especially for network/API errors and database operations.
- ✅ Introduce a configuration file (config.json) or command-line flags to allow dynamic setting of thresholds (e.g., repository size, stars threshold, page limits).
- ✅ Integrate a more robust logging framework that supports log levels and log file rotation.
- ✅ Improve handling for GitHub API rate limits, including automatic retries and exponential backoff.
- Allow customization of the GitHub search query via environment variables or command-line arguments.
- ✅ Investigate opportunities for further parallel processing when analyzing multiple repositories or users concurrently.
- ✅ Develop and integrate a web UI for viewing database content
- ✅ Add sortable tables with column headers
- ✅ Implement pagination and customizable page size
- ✅ Add status toggle for repository and user classification
- ✅ Integrate detailed reports with GitHub API data
- ✅ Implement Markdown rendering for repository READMEs
- ✅ Add AI-powered threat analysis with Ollama integration
- Enhance web UI with real-time monitoring and scanning process management
- Set up continuous integration to run tests on each commit and pull request.