A scalable GitHub repository scraper that analyzes commit history and generates a leaderboard of contributors. The project is built with a modern stack for both backend and frontend, featuring asynchronous processing with a task queue.
- Fastify Server: Single endpoint for repository processing and leaderboard generation.
- Asynchronous Processing: Uses Bull and Redis for task queue management.
- Efficient Cloning: Bare cloning and incremental updates with `simple-git`.
- Caching: PostgreSQL and Prisma for caching contributor data and reducing redundant API calls.
- Modern UI: Built with Next.js and styled with Tailwind CSS.
- Leaderboard Display: Interactive table showing contributor rankings and commit counts.
- Repository Management: Add and monitor GitHub repositories through a responsive interface.
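The "Efficient Cloning" item above can be pictured with plain git commands. The sketch below is not the project's code (which uses `simple-git`); it only builds the argument lists a worker might pass to git. The `planSync` helper and the cache path are hypothetical. One detail worth noting: a bare clone has no default fetch refspec, so the incremental update names one explicitly.

```typescript
// Hypothetical helper: decide which git command brings a local bare
// copy of a repository up to date. Names and paths are illustrative.
type GitCommand = { args: string[] };

function planSync(repoUrl: string, cacheDir: string, alreadyCloned: boolean): GitCommand {
  if (!alreadyCloned) {
    // First run: a bare clone stores history only, with no working tree.
    return { args: ["clone", "--bare", repoUrl, cacheDir] };
  }
  // Later runs: fetch only new objects. A bare clone has no default
  // fetch refspec, so branches are mapped explicitly.
  return {
    args: [`--git-dir=${cacheDir}`, "fetch", "origin", "+refs/heads/*:refs/heads/*"],
  };
}

const exampleRepo = "https://github.com/aalexmrt/github-scraper";
const exampleCache = "/tmp/repos/github-scraper.git"; // assumed location
const firstRun = planSync(exampleRepo, exampleCache, false);
const laterRun = planSync(exampleRepo, exampleCache, true);
```

Each `args` array could be handed to a process runner as the arguments of `git`; keeping the decision in a pure function makes it easy to test without touching the network.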
Ensure the following tools are installed on your machine:
- Docker and Docker Compose
- Git
- Clone the Repository:
    git clone https://github.com/aalexmrt/github-scraper
    cd github-scraper
- Set Up Environment Variables:
  - A sample `.env.example` file is provided in the `backend` folder. Copy it to create your `.env` file:
      cp backend/.env.example backend/.env
  - Open the newly created `backend/.env` file and replace `<your_github_personal_access_token>` with your GitHub Personal Access Token. Example `backend/.env` file:
      # Database connection string
      DATABASE_URL=postgresql://user:password@db:5432/github_scraper
      # Redis connection settings
      REDIS_HOST=redis
      REDIS_PORT=6379
      # GitHub API Personal Access Token
      GITHUB_TOKEN=<your_github_personal_access_token>
  - Note: The `backend/.env.example` file includes placeholder values to guide you. Do not share the actual `.env` file or commit it to version control, so that sensitive data stays secure.
  - If you don't have a GitHub Personal Access Token yet, you can create one:
    - Go to GitHub Developer Settings.
    - Click "Generate new token" (classic).
    - Select the necessary scopes (`read:user`, plus `repo` for private repository access if required).
    - Copy the token and add it to the `GITHUB_TOKEN` variable in your `backend/.env` file.
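A missing or placeholder value in `backend/.env` is easy to overlook until a request fails. The following is a minimal sketch of a startup check for the four variables listed above; `loadConfig` and `AppConfig` are hypothetical names, not part of the project, and the environment is passed in as a plain object so the function can be exercised without real secrets.

```typescript
// Hypothetical config loader for the variables shown in backend/.env.
interface AppConfig {
  databaseUrl: string;
  redisHost: string;
  redisPort: number;
  githubToken: string;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  // Read one variable and fail early if it is absent or blank.
  const required = (name: string): string => {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  const githubToken = required("GITHUB_TOKEN");
  if (githubToken.startsWith("<")) {
    // The sample file ships with <your_github_personal_access_token>.
    throw new Error("GITHUB_TOKEN still contains the placeholder value");
  }

  const redisPort = Number(required("REDIS_PORT"));
  if (!Number.isInteger(redisPort) || redisPort <= 0 || redisPort > 65535) {
    throw new Error("REDIS_PORT must be a valid port number");
  }

  return {
    databaseUrl: required("DATABASE_URL"),
    redisHost: required("REDIS_HOST"),
    redisPort,
    githubToken,
  };
}
```

In a Node.js service this would typically be called once at startup as `loadConfig(process.env)`.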
- Start Services:
  - Run the following command to build and start all services using Docker Compose:
      docker-compose up --build
  - This will start the backend, frontend, PostgreSQL database, Redis, and the worker service.
- Access the Application:
  - Backend API: Accessible at http://localhost:3000
    - To verify that the backend is running, call the `/health` endpoint:
        curl -X GET "http://localhost:3000/health"
    - Expected Response:
        { "message": "Server is running." }
  - Frontend UI: Accessible at http://localhost:4000
You can use the `/leaderboard` endpoint to process a GitHub repository and retrieve the leaderboard of contributors.
- Method: `GET`
- URL: `http://localhost:3000/leaderboard`
| Parameter | Type | Description | Required |
|---|---|---|---|
| `repoUrl` | string | The URL of the GitHub repository. | Yes |
Using `curl`:
    curl -X GET "http://localhost:3000/leaderboard?repoUrl=https://github.com/aalexmrt/github-scraper"
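When the request is built in code rather than typed by hand, it is safer to percent-encode the repository URL so that it travels as a single query-string value. A minimal sketch, assuming a hypothetical `leaderboardUrl` client helper that is not part of the project:

```typescript
// Hypothetical client helper: builds the /leaderboard request URL.
function leaderboardUrl(baseUrl: string, repoUrl: string): string {
  // Drop trailing slashes from the base so the path is not doubled.
  const base = baseUrl.replace(/\/+$/, "");
  // encodeURIComponent escapes ":" and "/" inside the repository URL.
  return `${base}/leaderboard?repoUrl=${encodeURIComponent(repoUrl)}`;
}

const requestUrl = leaderboardUrl(
  "http://localhost:3000",
  "https://github.com/aalexmrt/github-scraper",
);
// requestUrl is now:
// http://localhost:3000/leaderboard?repoUrl=https%3A%2F%2Fgithub.com%2Faalexmrt%2Fgithub-scraper
```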
Response while the repository is still being processed:
{
  "message": "Repository is being processed."
}
Response once processing has completed:
{
  "leaderboard": [
    {
      "commitCount": 43,
      "username": null,
      "email": "[email protected]",
      "profileUrl": null
    },
    {
      "commitCount": 2,
      "username": "aalexmrt",
      "email": "[email protected]",
      "profileUrl": "https://github.com/aalexmrt"
    }
  ]
}
Response when the request fails:
{
  "error": "Failed to process the leaderboard request."
}
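The completed response lists contributors with their commit counts, highest first. As a rough illustration of how such a list could be derived, the sketch below counts commits per author email and sorts the result. It is an assumption about the approach, not the project's implementation: `buildLeaderboard`, the email-per-commit input (for example the output of `git log --format=%ae`), and the username lookup map are all hypothetical.

```typescript
// Shape of one leaderboard row, mirroring the response fields above.
interface LeaderboardEntry {
  commitCount: number;
  username: string | null;
  email: string;
  profileUrl: string | null;
}

// Hypothetical aggregation: one author email per commit goes in,
// a ranked list of contributors comes out.
function buildLeaderboard(
  commitEmails: string[],
  usernamesByEmail: Map<string, string> = new Map(),
): LeaderboardEntry[] {
  const counts = new Map<string, number>();
  for (const raw of commitEmails) {
    const email = raw.trim().toLowerCase();
    if (email === "") continue; // skip blank lines in the input
    counts.set(email, (counts.get(email) ?? 0) + 1);
  }

  const entries: LeaderboardEntry[] = Array.from(counts, ([email, commitCount]) => {
    // Contributors without a known GitHub account keep null fields.
    const username = usernamesByEmail.get(email) ?? null;
    return {
      commitCount,
      username,
      email,
      profileUrl: username ? `https://github.com/${username}` : null,
    };
  });

  // Highest commit count first; ties are broken by email for stability.
  return entries.sort(
    (a, b) => b.commitCount - a.commitCount || a.email.localeCompare(b.email),
  );
}
```

Caching the email-to-username lookups, as the feature list describes for contributor data, would keep repeated runs from resolving the same author twice.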
The application frontend provides an interface to interact with the backend, making it easier to process repositories and view leaderboards.
- Add a Repository
  - Open the application frontend at http://localhost:4000.
  - Use the Add Repository form to submit a GitHub repository URL for processing.
- Monitor Repository Processing
  - Navigate to the Processed Repositories section to view the status of your repositories:
    - Processing: The repository is currently being analyzed.
    - On Queue: The repository is waiting for processing.
    - Completed: The repository has been successfully processed.
- View Contributor Leaderboard
  - For completed repositories, click the Leaderboard button to view a detailed contributor leaderboard.
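The three statuses above map naturally onto a small union type. The sketch below shows one way a client could decide whether to keep checking a repository and how long to wait between checks. Whether the frontend actually polls is an assumption, and `RepoStatus`, `shouldKeepPolling`, and `nextPollDelayMs` are hypothetical names.

```typescript
// The three states listed above, as a hypothetical union type.
type RepoStatus = "On Queue" | "Processing" | "Completed";

// Checking again is only useful while work is still pending.
function shouldKeepPolling(status: RepoStatus): boolean {
  return status !== "Completed";
}

// Assumed backoff: start at 1 s, double per attempt, cap at 30 s.
function nextPollDelayMs(attempt: number): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(1000 * Math.pow(2, safeAttempt), 30000);
}
```

A growing delay keeps a long-running analysis from generating a steady stream of identical requests.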
- Add support for private repositories with GitHub token validation in the `/leaderboard` endpoint.
- Split the responsibilities of the `/leaderboard` endpoint by creating separate endpoints for processing a repository and for retrieving its leaderboard, and include the repository URL in the response.
- Add a form to input a repository URL and optional GitHub token.