prebuilder/GitLabInstancesDataset

GitLabInstancesDataset.py Unlicensed work

wheel (GitLab) wheel (GHA via nightly.link) GitLab Build Status GitLab Coverage GitHub Actions N∅ hard dependencies Libraries.io Status Code style: antiflash

A dataset of standalone GitLab instances, used to determine whether a URI is hosted on GitLab without probing it.

Although this is a Python package, the current version of the dataset in plain-text format can be downloaded from https://raw.githubusercontent.com/prebuilder/GitLabInstancesDataset/master/GitLabInstancesDataset/KnownGitLabInstances.txt and used from any language you like.
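
As a rough illustration of using the text file directly, the sketch below downloads it and loads it into a set. It assumes the file holds one domain name per line, which the inclusion criteria suggest but do not state outright; `parse_dataset` and `fetch_dataset` are hypothetical helper names, not part of this package.

```python
from urllib.request import urlopen

DATASET_URI = (
    "https://raw.githubusercontent.com/prebuilder/GitLabInstancesDataset/"
    "master/GitLabInstancesDataset/KnownGitLabInstances.txt"
)


def parse_dataset(text):
    """Turn the file contents into a frozenset of domain names.

    Assumption: one domain name per line; blank lines are ignored.
    """
    return frozenset(line.strip() for line in text.split("\n") if line.strip())


def fetch_dataset(uri=DATASET_URI, timeout=10.0):
    """Download the dataset and parse it (hypothetical helper, needs network access)."""
    with urlopen(uri, timeout=timeout) as response:
        return parse_dataset(response.read().decode("utf-8"))
```

Calling `fetch_dataset()` once and keeping the resulting set in memory should be enough for repeated lookups.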

How to use

  1. Parse the URI and extract its domain.
  2. Check whether the service can be guessed to be GitLab from its domain name or URI alone. If it can, treat it as GitLab; such domains are deliberately not included in the dataset.
    a. Check if the domain contains the substring gitlab.
    b. Check if the path contains the substring gitlab.
  3. Check the domain name against the dataset. In Python this can be done with the isGitLab(domainName) function.
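
The three steps above can be sketched roughly as follows. This is an illustration only: `looks_like_gitlab` is a hypothetical name, and `known_instances` is a plain in-memory set of domain names standing in for the dataset lookup (with the package installed, `isGitLab(domainName)` would take its place). Lower-casing the path before the substring check is a choice made for this sketch.

```python
from urllib.parse import urlsplit


def looks_like_gitlab(uri, known_instances):
    """Decide whether a URI points at a GitLab instance without probing it.

    `known_instances` is an assumed in-memory set of lower-case domain names.
    """
    # Step 1: parse the URI and extract its domain.
    # Note: urlsplit only finds a hostname when the URI has a scheme or starts with "//".
    parts = urlsplit(uri)
    domain = (parts.hostname or "").lower()

    # Step 2: guess from the domain name or the path alone.
    if "gitlab" in domain or "gitlab" in parts.path.lower():
        return True

    # Step 3: check the domain name against the dataset.
    return domain in known_instances
```

For example, `looks_like_gitlab("https://code.example.org/group/project", {"code.example.org"})` takes the third branch, since neither the domain nor the path contains gitlab.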

Inclusion criteria

  • Neither domain name nor the path to the actual service contains the substring gitlab.

  • The dataset contains domain names only. Don't send URIs here.

  • The domain names in the list must

    • be normalized to lower case
    • be unique
    • be sorted
    • not contain any empty components
  • The file uses LF line endings.

  • No IDNs (internationalized domain names) are accepted currently.
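
A contributor could check a candidate file against these criteria with something like the sketch below. `check_dataset` is a hypothetical helper, not part of the package; it assumes one domain name per line, plain lexicographic sort order, and that IDNs would show up either as non-ASCII bytes or as punycode labels starting with `xn--`. It cannot check whether the path to the service contains gitlab, because the dataset stores domain names only.

```python
def check_dataset(raw):
    """Return a list of human-readable problems found in the raw file bytes."""
    problems = []
    if b"\r" in raw:
        problems.append("line endings must be LF only")
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        problems.append("non-ASCII bytes found (IDNs are not accepted currently)")
        return problems

    names = [line for line in text.splitlines() if line]
    for name in names:
        labels = name.split(".")
        if name != name.lower():
            problems.append(f"{name}: not lower case")
        if "/" in name or ":" in name:
            problems.append(f"{name}: looks like a URI, not a domain name")
        if "gitlab" in name.lower():
            problems.append(f"{name}: contains 'gitlab', so it can be guessed without the dataset")
        if any(label == "" for label in labels):
            problems.append(f"{name}: empty component")
        if any(label.startswith("xn--") for label in labels):
            problems.append(f"{name}: punycode IDN")

    if len(set(names)) != len(names):
        problems.append("duplicate entries")
    if names != sorted(names):  # assumption: plain lexicographic order
        problems.append("entries are not sorted")
    return problems
```

An empty result means the file passed every check in this sketch, which is not necessarily every check the maintainers apply.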
