This repository contains a CDK project that deploys an architecture capable of retrieving and storing daily traffic data about the GitHub repositories to which you have push access.
The GitHub repository traffic API provides access to the following information:
- Repository clones
- Top referral paths
- Top referral sources
- Page views
This CDK project collects repository clones and page views on a daily basis, so that historical data can be represented and trends identified over time. You will need a GitHub access token to use the features of this project. To generate one, visit https://github.com/settings/tokens.
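As a rough illustration of the two endpoints this project relies on, the sketch below queries the clones and views endpoints of the GitHub REST API using only the Python standard library. The helper names (traffic_url, fetch_traffic, daily_counts) are assumptions made for this example and are not taken from the project's Lambda function.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def traffic_url(owner: str, repo: str, kind: str) -> str:
    """Build the traffic endpoint URL; kind is 'clones' or 'views'."""
    if kind not in ("clones", "views"):
        raise ValueError(f"unsupported traffic kind: {kind}")
    return f"{API_ROOT}/repos/{owner}/{repo}/traffic/{kind}"


def fetch_traffic(owner: str, repo: str, kind: str, token: str) -> dict:
    """Call the traffic endpoint and return the decoded JSON payload."""
    request = urllib.request.Request(
        traffic_url(owner, repo, kind),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def daily_counts(payload: dict, kind: str) -> list:
    """Flatten a payload into (timestamp, count, uniques) tuples, one per day."""
    return [(e["timestamp"], e["count"], e["uniques"]) for e in payload.get(kind, [])]
```

For example, calling fetch_traffic("bpguasch", "github-traffic-capture", "clones", token) with a valid token and passing the result to daily_counts(payload, "clones") would yield one tuple per day reported by the API.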
The following diagram shows the architecture that will be deployed. Considerations are discussed below it:
- To track the traffic of your repositories, modify the contents of the REPOS array defined in the Lambda function getRepositoriesTraffic. You must have push access to the repositories you want to track. If you get an API rate access error, take a look at the GitHub REST API documentation.
- The EventBridge rule RuleGetRepoTraffic is triggered every day at 9 AM UTC and executes the Lambda function that retrieves views and clones for the repositories that you specify.
- Repo traffic data is stored in the RepoTraffic DynamoDB table. This table has a partition key repo-name and a sort key timestamp.
- Your GitHub access token is securely stored as a Secrets Manager secret, and you provide it as a parameter when deploying the stack.
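The flow described in the considerations above can be sketched as follows. This is a simplified illustration, not the project's actual Lambda code: the build_items and collect_traffic functions, the injected fetch and put_item callables, and every attribute name other than repo-name and timestamp are hypothetical. Only the REPOS array and the two table keys come from the description above.

```python
from typing import Callable, Dict, List

# Repositories to track; these entries are placeholders for this example
REPOS = ["owner/repo-a", "owner/repo-b"]


def build_items(repo: str, clones: dict, views: dict) -> List[dict]:
    """Merge daily clone and view entries into one item per (repo, day)."""
    by_day: Dict[str, dict] = {}
    for kind, payload in (("clones", clones), ("views", views)):
        for entry in payload.get(kind, []):
            item = by_day.setdefault(
                entry["timestamp"],
                {"repo-name": repo, "timestamp": entry["timestamp"]},
            )
            # Attribute names other than the two keys are assumptions
            item[f"{kind}-count"] = entry["count"]
            item[f"{kind}-uniques"] = entry["uniques"]
    return [by_day[day] for day in sorted(by_day)]


def collect_traffic(
    fetch: Callable[[str, str], dict],
    put_item: Callable[[dict], None],
) -> int:
    """Fetch traffic for every repo in REPOS and store each daily item."""
    stored = 0
    for repo in REPOS:
        items = build_items(repo, fetch(repo, "clones"), fetch(repo, "views"))
        for item in items:
            put_item(item)
            stored += 1
    return stored
```

Passing the data access functions in as arguments keeps the sketch testable without AWS credentials. In a deployed function, put_item would typically wrap a DynamoDB write to the RepoTraffic table, and fetch would call the GitHub traffic API with the token read from Secrets Manager.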
The following steps assume that you have Python and venv installed on your local machine.
Navigate to the directory on your machine where you want the repository to be cloned and execute the following command:
git clone https://github.com/bpguasch/github-traffic-capture.git
After cloning this repository, navigate to the cdk-app directory and execute the following commands:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
Deploying AWS CDK apps into an AWS environment may require that you provision resources the AWS CDK needs to perform the deployment. These resources include an Amazon S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments. Execute the following command to bootstrap your environment:
cdk bootstrap
You can read more about this process in the AWS CDK bootstrapping documentation.
You must specify a value for the GitHubAccessToken parameter when you deploy the stack. To do so, execute the following command:
cdk deploy --parameters GitHubAccessToken=<str value>
The deployment process will take roughly 3 minutes to complete.
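If you would rather not type the token directly on the command line, where it may end up in your shell history, one option is to read it from an environment variable and run the same command from a small Python script. The GITHUB_TOKEN variable name and both helper functions below are assumptions made for this sketch; they are not part of the project.

```python
import os
import subprocess
from typing import List


def build_deploy_command(token: str) -> List[str]:
    """Return the cdk deploy invocation as an argument list (no shell involved)."""
    if not token:
        raise ValueError("a non-empty GitHub access token is required")
    return ["cdk", "deploy", "--parameters", f"GitHubAccessToken={token}"]


def deploy_from_env(variable: str = "GITHUB_TOKEN") -> None:
    """Read the token from the environment and run cdk deploy in the current directory."""
    token = os.environ.get(variable, "")
    subprocess.run(build_deploy_command(token), check=True)
```

Call deploy_from_env() from the cdk-app directory with the virtual environment activated. Note that the token is still passed as a process argument, so it remains visible to other processes on the same machine while the command runs.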
To delete all the resources created by CDK:
- Navigate to the CloudFormation section in the AWS console.
- Select the stack named GitHubTrafficCaptureStack and click on Delete.
Alternatively, you can execute the following command from the cdk-app directory:
cdk destroy