Does codeql support checking the contents of configuration files in yaml format? #16755

Open
Exloit opened this issue Jun 14, 2024 · 1 comment
Labels
question Further information is requested

Comments


Exloit commented Jun 14, 2024

Hello, I have some Go applications that use YAML files for configuration, but our R&D team often writes account names and passwords into those files.
How can I use CodeQL to automatically detect whether these files contain sensitive information?

  1. The "codeql database create --language=go" command does not pick up the yml files.
  2. I created 2 databases with "codeql database create --language=go,yaml ...", but how do I write queries for the yaml database?

Are there any open source queries I can use as a reference?

@Exloit Exloit added the question Further information is requested label Jun 14, 2024
Contributor

smowton commented Jun 14, 2024

The YAML extractor is unusual in that its fragment of the database schema ("dbscheme") is replicated in the database schemas for Ruby, JavaScript and Python. This means the YAML extractor can either populate a plain YAML database or contribute to a Ruby, JS or Python database. It also means that one easy way to extract YAML, and to use one of those languages' libraries to work with the extracted YAML content, is to create a one-line JS, Python or Ruby file and extract that language. There's no reason this couldn't also be done for Go, except that we haven't had that need yet.

That means the JS, Python and Ruby languages are also the places to look for examples of CodeQL that uses yaml data.

The basic database schema for YAML can be seen in the JS dbscheme (for example), starting at https://github.com/github/codeql/blob/main/javascript/ql/lib/semmlecode.javascript.dbscheme#L1057

Then there's a shared CodeQL module that defines YAML classes and predicates on top of the database schema: https://github.com/github/codeql/blob/main/shared/yaml/codeql/yaml/Yaml.qll -- for example, it defines YamlSequence for working with sequence types, with a getElement(int i) predicate for accessing elements.
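As a rough illustration of those classes, the sketch below lists the scalar elements of every YAML sequence in a database. It assumes the database was built with the JavaScript extractor, so that `import javascript` brings the YAML classes into scope, and that `YamlScalar.getValue()` yields the scalar's text. It has not been run against a real database, so treat it as a starting point rather than a tested query.

```ql
// Sketch only: assumes a JavaScript database that also contains YAML files.
import javascript

from YamlSequence seq, int i, YamlScalar elem
where
  // getElement(int i) is the predicate mentioned above for indexing a sequence
  elem = seq.getElement(i)
select seq, i, elem.getValue()
```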

That shared library is a parameterised module taking an InputSig, which the individual languages (JS, Python or Ruby) then specialise according to their needs: for example, JS does this here: https://github.com/github/codeql/blob/main/javascript/ql/lib/semmle/javascript/YAML.qll#L11

Then finally JavaScript queries can import that YAML module and write queries, like this: https://github.com/github/codeql/blob/main/javascript/ql/lib/semmle/javascript/Actions.qll

Here the JavaScript library is using the YAML classes and predicates to break down a GitHub Actions definition.
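For the original use case (credentials written into YAML configuration files), a query along the following lines might be a reasonable first attempt. It is a minimal sketch, assuming a database created by extracting JavaScript from a directory that contains the YAML files, as described above. The query id, the key-name pattern and the alert message are all hypothetical choices made for illustration, and the query has not been validated against a real database.

```ql
/**
 * @name Possible hard-coded credential in YAML configuration
 * @description Flags YAML mapping entries whose key looks like a credential
 *              name and whose value is a non-empty scalar.
 * @kind problem
 * @problem.severity warning
 * @id example/yaml-hardcoded-credential
 */

// Hypothetical example query: the id above and the pattern below are illustrative.
import javascript

from YamlMapping m, YamlScalar key, YamlScalar value
where
  // the mapping has an entry key -> value, both plain scalars
  m.maps(key, value) and
  // case-insensitive match on key names that suggest a secret (assumed pattern)
  key.getValue().regexpMatch("(?i).*(password|passwd|secret|token).*") and
  // ignore entries that are left empty
  value.getValue() != ""
select value, "The value of '" + key.getValue() + "' may be a hard-coded credential."
```

A pattern on key names alone will produce false positives (for example, keys that hold a path to a password file), so the regular expression would likely need tuning for a given codebase.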

I hope this helps you get to grips with using CodeQL on YAML data. Please do let me know if you have further questions.
