See glossary for an explanation.
Ideally, each programming language (or technology) handler is written in the language it handles. All handlers shall follow the same algorithm:
- Get all importable symbols from the (standard) library
- Remove internal/non-exported symbols
- Iterate over each symbol and collect:
  - Symbol name, e.g. `Buffer`
  - Namespace, e.g. `bytes`
  - Type, e.g. `int`
  - Value, e.g. `512`
- Save all APIs to the database
- Save a catalogue which contains the number of APIs, the number of namespaces, the list of namespaces and the programming language or technology version
{
"_id": "",
"doc": "",
"name": "",
"type": "",
"ns": "",
"value": ""
}
Example:
{
"_id": "go/token.AND_ASSIGN",
"doc": "",
"name": "AND_ASSIGN",
"type": "int",
"ns": "go/token",
"value": "28"
}
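The document shape above could be modelled in Go roughly as follows. This is a minimal sketch, not the project's actual handler: the `Symbol` type and the `toAPIs` function are hypothetical names, and building `_id` as namespace plus `.` plus name is inferred from the `go/token.AND_ASSIGN` example. Non-exported symbols are dropped with `token.IsExported` from the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"go/token"
)

// Symbol is a hypothetical intermediate form of one library symbol.
type Symbol struct {
	Name, Namespace, Type, Value, Doc string
}

// API mirrors the JSON document shown above.
type API struct {
	ID    string `json:"_id"`
	Doc   string `json:"doc"`
	Name  string `json:"name"`
	Type  string `json:"type"`
	NS    string `json:"ns"`
	Value string `json:"value"`
}

// toAPIs drops non-exported symbols and converts the rest to API documents.
// Assumption: "_id" is "<namespace>.<name>", as in the example document.
func toAPIs(symbols []Symbol) []API {
	apis := make([]API, 0, len(symbols))
	for _, s := range symbols {
		if !token.IsExported(s.Name) {
			continue
		}
		apis = append(apis, API{
			ID:    s.Namespace + "." + s.Name,
			Doc:   s.Doc,
			Name:  s.Name,
			Type:  s.Type,
			NS:    s.Namespace,
			Value: s.Value,
		})
	}
	return apis
}

func main() {
	// Illustrative input: one exported and one non-exported symbol.
	apis := toAPIs([]Symbol{
		{Name: "AND_ASSIGN", Namespace: "go/token", Type: "int", Value: "28"},
		{Name: "keywords", Namespace: "go/token", Type: "map"},
	})
	out, err := json.MarshalIndent(apis, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Only the exported symbol survives the filter, so the program prints a single document equivalent to the example above.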
{
"_id": "_cat",
"n_apis": 0,
"n_ns": 0,
"ns": [],
"version": ""
}
Example:
{
"_id": "_cat",
"n_apis": 8025,
"n_ns": 161,
"ns": [
"go/constant",
"fmt",
"regexp",
"crypto/rand",
"encoding"
],
"version": "1.22.0"
}
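As a rough illustration, the catalogue above could be derived from the namespaces of all collected APIs. The `Catalogue` type and `buildCatalogue` function are hypothetical names, and sorting the namespace list is an assumption made only to keep the output deterministic (the example above is not sorted).

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Catalogue mirrors the "_cat" document shown above.
type Catalogue struct {
	ID      string   `json:"_id"`
	NAPIs   int      `json:"n_apis"`
	NNS     int      `json:"n_ns"`
	NS      []string `json:"ns"`
	Version string   `json:"version"`
}

// buildCatalogue takes the namespace of every collected API (one entry per
// API, duplicates expected) and the language or technology version.
func buildCatalogue(apiNamespaces []string, version string) Catalogue {
	seen := make(map[string]bool)
	ns := []string{}
	for _, n := range apiNamespaces {
		if !seen[n] {
			seen[n] = true
			ns = append(ns, n)
		}
	}
	sort.Strings(ns) // assumption: order is not significant
	return Catalogue{
		ID:      "_cat",
		NAPIs:   len(apiNamespaces),
		NNS:     len(ns),
		NS:      ns,
		Version: version,
	}
}

func main() {
	cat := buildCatalogue([]string{"fmt", "fmt", "regexp", "go/constant"}, "1.22.0")
	out, err := json.Marshal(cat)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// {"_id":"_cat","n_apis":4,"n_ns":3,"ns":["fmt","go/constant","regexp"],"version":"1.22.0"}
}
```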
See glossary for an explanation.
To clone multiple repositories, a GitHub personal access token must be passed via the `GITHUB_ACCESS_TOKEN_CONTRIBS` environment variable.
Ideally, each programming language (or technology) handler is written in the language it handles. All handlers shall follow the same algorithm:
- Fetch repositories which contain usages of the programming language or technology and its (standard) library
- Iterate over each repository
  - Clone the repository to a temporary directory, e.g. `/tmp/traefik_traefik`
    - (Optional): Remove extraneous directories like `.git`
  - Find files which qualify for a contribution, e.g. `.go` files
  - Iterate over each file and collect:
    - Locus (see glossary) for each file
    - Number of files
    - Number of contributions
    - Source code, e.g. `var x int\nx = 5`
    - File path, e.g. `/src/models`
    - File name, e.g. `user.go`
    - Repository owner, e.g. `traefik`
    - Repository name, e.g. `traefik`
  - Delete every contribution of the repository in the database
  - Save all contributions to the database
- Save a catalogue which contains the number of contributions and repositories to the database
- Save license information for each repository to the database
{
"ident": "",
"line": 0
}
{
"locus": [],
"code": "",
"filename": "",
"filepath": "",
"repo_name": "",
"repo_owner": ""
}
Example:
{
"locus": [
{
"ident": "io.ReadAll",
"line": 10
},
{
"ident": "io.ReadCloser",
"line": 8
},
{
"ident": "os.ReadFile",
"line": 15
}
],
"code": "package cmdutil\n\nimport (\n\t\"io\"\n\t\"os\"\n)\n\nfunc ReadFile(filename string, stdin io.ReadCloser) ([]byte, error) {\n\tif filename == \"-\" {\n\t\tb, err := io.ReadAll(stdin)\n\t\t_ = stdin.Close()\n\t\treturn b, err\n\t}\n\n\treturn os.ReadFile(filename)\n}\n",
"filename": "file_input.go",
"filepath": "/pkg/cmdutil",
"repo_name": "cli",
"repo_owner": "cli"
}
{
"_id": "_licenses",
"repos": []
}
Example:
{
"_id": "_licenses",
"repos": [
{
"author": "GitHub Inc.",
"repo": [
"cli",
"cli"
],
"type": "MIT license"
},
{
"author": "Traefik Labs",
"repo": [
"traefik",
"traefik"
],
"type": "MIT license"
}
]
}
{
"_id": "_cat",
"n_contribs": 0,
"n_repos": 0
}
Example:
{
"_id": "_cat",
"n_contribs": 21459,
"n_repos": 89
}
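The contribution catalogue could be computed from the saved contributions as sketched below. The type and function names are hypothetical, and the sketch assumes that each contribution document counts once toward `n_contribs` and that a repository is identified by its owner and name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Locus and Contribution mirror the contribution document shown earlier.
type Locus struct {
	Ident string `json:"ident"`
	Line  int    `json:"line"`
}

type Contribution struct {
	Locus     []Locus `json:"locus"`
	Code      string  `json:"code"`
	Filename  string  `json:"filename"`
	Filepath  string  `json:"filepath"`
	RepoName  string  `json:"repo_name"`
	RepoOwner string  `json:"repo_owner"`
}

// ContribCatalogue mirrors the "_cat" document shown above.
type ContribCatalogue struct {
	ID        string `json:"_id"`
	NContribs int    `json:"n_contribs"`
	NRepos    int    `json:"n_repos"`
}

// buildContribCatalogue counts contributions and distinct (owner, name) pairs.
func buildContribCatalogue(contribs []Contribution) ContribCatalogue {
	repos := make(map[[2]string]struct{})
	for _, c := range contribs {
		repos[[2]string{c.RepoOwner, c.RepoName}] = struct{}{}
	}
	return ContribCatalogue{ID: "_cat", NContribs: len(contribs), NRepos: len(repos)}
}

func main() {
	// Illustrative input: two files from one repository, one from another.
	cat := buildContribCatalogue([]Contribution{
		{Filename: "file_input.go", Filepath: "/pkg/cmdutil", RepoOwner: "cli", RepoName: "cli"},
		{Filename: "main.go", Filepath: "/cmd/gh", RepoOwner: "cli", RepoName: "cli"},
		{Filename: "user.go", Filepath: "/src/models", RepoOwner: "traefik", RepoName: "traefik"},
	})
	out, err := json.Marshal(cat)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"_id":"_cat","n_contribs":3,"n_repos":2}
}
```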
MongoDB is used as the database. In production, contributions are saved to the MongoDB instance at `MONGO_DB_URI`; during development, the URI falls back to `mongodb://localhost:27017`.