A GitHub dataset sample of over 1,000 repositories. The dataset was extracted using the Bright Data API.
url: Repository web address
id: Unique repository ID
code_language: Main programming language
code: Repository source code
num_lines: Total lines of code
user_name: Repository owner's username
user_url: Owner's profile URL
size: Repository size
size_unit: Repository size units
size_num: Repository size number
breadcrumbs: Repository navigation path
num_issues: Total issues count
num_pull_requests: Total pull requests count
num_projects: Number of associated projects
num_fork: Fork count
num_stared: Star count
last_feature: Latest feature change
latest_update: Date of last update
And a lot more.
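As a rough illustration of how these fields might be handled in code, the sketch below maps one row onto a small typed record. Only the field names come from the list above. The Python types, the `RepoRecord` and `parse_record` names, and the choice to treat missing counts as 0 are all assumptions, and the sketch assumes counts arrive as plain integers or integer strings.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RepoRecord:
    """Subset of the documented fields; the Python types are assumptions."""

    url: str
    id: str
    code_language: str
    num_lines: int
    user_name: str
    num_fork: int
    num_stared: int
    latest_update: str  # kept as text; the date format is not documented here


def _to_int(value: Any) -> int:
    """Coerce a count to int, treating missing or empty values as 0 (an assumption)."""
    if value is None or value == "":
        return 0
    return int(value)


def parse_record(row: Mapping[str, Any]) -> RepoRecord:
    """Build a RepoRecord from one row, e.g. a dict produced by csv.DictReader."""
    return RepoRecord(
        url=str(row.get("url", "")),
        id=str(row.get("id", "")),
        code_language=str(row.get("code_language", "")),
        num_lines=_to_int(row.get("num_lines")),
        user_name=str(row.get("user_name", "")),
        num_fork=_to_int(row.get("num_fork")),
        num_stared=_to_int(row.get("num_stared")),
        latest_update=str(row.get("latest_update", "")),
    )
```

With records in this shape, a list can be ordered by star count with `sorted(records, key=lambda r: r.num_stared, reverse=True)`.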
This sample is a subset of the "GitHub Repositories (public data)" dataset, which includes more than 2,200,000 repositories.
Available dataset file formats: JSON, NDJSON, JSON Lines, CSV, or Parquet. Optionally, files can be compressed to .gz.
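For the NDJSON / JSON Lines option, a file can be streamed one record at a time with the Python standard library alone. This is a minimal sketch, not an official loader: the `iter_records` name and the file name in the usage note are hypothetical, and the file is assumed to be UTF-8 with one JSON object per line.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line of an NDJSON / JSON Lines file.

    Files whose name ends in .gz are decompressed on the fly.
    """
    p = Path(path)
    opener = gzip.open if p.suffix == ".gz" else open
    with opener(p, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

For example, `for record in iter_records("github_sample.ndjson.gz"): ...` would walk a compressed file without loading it fully into memory.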
Dataset delivery type options: Email, API download, Webhook, Amazon S3, Google Cloud storage, Google Cloud PubSub, Microsoft Azure, Snowflake, SFTP.
Update frequency: Once, Daily, Weekly, Monthly, Quarterly, or Custom basis.
Data enrichment beyond the extracted data points: Available on request.
Gain insights into the activity and health of open-source projects by tracking data points like commit histories, pull requests, and issue discussions. This data can help businesses identify high-impact projects, monitor trends, and discover collaboration opportunities in the open-source community.
Evaluate the popularity and community backing of open-source projects by analyzing metrics such as star and fork counts. This information enables businesses to understand which projects are gaining traction, make informed adoption decisions, and identify technology trends.
Utilize public GitHub profile data to foster engagement and advocacy within the open-source community. Identify active users who star, fork, and contribute to repositories in your field to create a network of advocates who can amplify your projects and fuel collaborative innovation.
The Bright Initiative offers access to Bright Data's Web Scraper APIs and ready-to-use datasets to leading academic faculties and researchers, as well as NGOs and NPOs promoting various environmental and social causes. You can submit an application here.