Add linter for Dockerfile[.in] and GHA for it #5210
Pull Request Overview
This PR introduces a comprehensive Dockerfile linter tool that detects and fixes outdated ENV/ARG syntax in Dockerfile files. The tool modernizes syntax by adding proper = assignment operators and quotes where necessary, ensuring compliance with modern Docker best practices.
- Adds a Python-based linter for Dockerfile and Dockerfile.in files
- Includes GitHub Actions workflow for automated CI checks
- Provides comprehensive test coverage with unit tests
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tools/ci/dockerfile_linter/dockerfile_linter.py | Main implementation of the Dockerfile linter with CLI support |
| tools/ci/dockerfile_linter/test_dockerfile_linter.py | Comprehensive unit tests for all linter functionality |
| tools/ci/dockerfile_linter/requirements.txt | Dependencies file (currently empty as tool uses only stdlib) |
| tools/ci/dockerfile_linter/\_\_init\_\_.py | Package initialization file with metadata |
| tools/ci/dockerfile_linter/\_\_main\_\_.py | Module entry point for python -m execution |
| tools/ci/dockerfile_linter/README.md | Detailed documentation and usage examples |
| tools/ci/dockerfile_linter/.gitignore | Git ignore rules for Python cache files |
| .github/workflows/dockerfile-check.yml | GitHub Actions workflow for automated syntax checking |
| self.linter.process_directory(str(self.test_dir)) | ||
| # Should process files | ||
| self.assertEqual(self.linter.files_processed, 1) # Only finds Dockerfile, not Dockerfile.good |
Copilot
AI
Sep 5, 2025
The comment is incorrect. Based on the find_dockerfiles logic, it should find files named 'Dockerfile' and 'Dockerfile.in', but 'Dockerfile.good' would not match these patterns and should not be counted.
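The matching behaviour described in this comment can be sketched as follows. This is a minimal illustration, not the PR's code: the body of `find_dockerfiles` below, the `DOCKERFILE_NAMES` set, and the `demo` helper are assumptions that mirror only what the comment states (exact basename matches for `Dockerfile` and `Dockerfile.in`).

```python
import os
import tempfile

# Assumption: discovery matches exact basenames only, as the review comment describes.
DOCKERFILE_NAMES = {"Dockerfile", "Dockerfile.in"}


def find_dockerfiles(root):
    """Return sorted paths under root whose basename is a known Dockerfile name."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in DOCKERFILE_NAMES:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def demo():
    """Create Dockerfile and Dockerfile.good in a temp dir and list what is discovered."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("Dockerfile", "Dockerfile.good"):
            with open(os.path.join(tmp, name), "w") as fh:
                fh.write("FROM scratch\n")
        return [os.path.basename(p) for p in find_dockerfiles(tmp)]
```

Under these assumptions `demo()` returns only `Dockerfile`, which is consistent with the test asserting `files_processed == 1`.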
| env_match = re.match(r'^(ENV)\s+([A-Z_][A-Z0-9_]*)\s+(.+)$', stripped) | ||
| arg_match = re.match(r'^(ARG)\s+([A-Z_][A-Z0-9_]*)\s+(.+)$', stripped) |
Copilot
AI
Sep 5, 2025
The regex patterns for ENV and ARG variable names are overly restrictive. Docker allows lowercase letters, periods, and hyphens in variable names. The patterns should be [A-Za-z_][A-Za-z0-9_.-]* to match actual Docker variable naming conventions.
| env_match = re.match(r'^(ENV)\s+([A-Z_][A-Z0-9_]*)\s+(.+)$', stripped) | |
| arg_match = re.match(r'^(ARG)\s+([A-Z_][A-Z0-9_]*)\s+(.+)$', stripped) | |
| env_match = re.match(r'^(ENV)\s+([A-Za-z_][A-Za-z0-9_.-]*)\s+(.+)$', stripped) | |
| arg_match = re.match(r'^(ARG)\s+([A-Za-z_][A-Za-z0-9_.-]*)\s+(.+)$', stripped) |
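To see what the suggested pattern changes in practice, here is a small sketch. The combined `ENV|ARG` pattern, the `modernize` helper, and its quoting rule are hypothetical simplifications for illustration; the PR's linter handles more cases (multi-line values, for example) and later moved to tree-sitter.

```python
import re

# Original pattern from the diff: upper-case names only.
OLD_ENV = re.compile(r'^(ENV)\s+([A-Z_][A-Z0-9_]*)\s+(.+)$')

# Suggested character class, merged into one pattern for ENV and ARG (an assumption).
LEGACY = re.compile(r'^(ENV|ARG)\s+([A-Za-z_][A-Za-z0-9_.-]*)\s+(.+)$')


def modernize(line):
    """Rewrite 'ENV NAME value' as 'ENV NAME=value'; return other lines unchanged."""
    match = LEGACY.match(line.strip())
    if not match:
        return line
    keyword, name, value = match.groups()
    already_quoted = value.startswith('"') and value.endswith('"')
    if re.search(r'\s', value) and not already_quoted:
        # Values containing whitespace need quotes once '=' is used.
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{keyword} {name}={value}"
```

With this sketch, `modernize("ARG my.var-name hello")` yields `ARG my.var-name=hello`, while `OLD_ENV` does not match a lower-case name at all, so such a line would be silently skipped by the original pattern.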
| for i, value in enumerate(all_values): | ||
| if i == len(all_values) - 1: | ||
| # Last line - no continuation backslash | ||
| modified_lines.append(f" {value}\"") | ||
| else: | ||
| # Continuation line | ||
| modified_lines.append(f" {value} \\") |
Copilot
AI
Sep 5, 2025
The multiline formatting adds extra indentation (4 spaces) but doesn't preserve the original indentation from the input. This could change the visual formatting of carefully formatted Dockerfiles. Consider preserving or detecting the original indentation pattern.
| for i, value in enumerate(all_values): | |
| if i == len(all_values) - 1: | |
| # Last line - no continuation backslash | |
| modified_lines.append(f" {value}\"") | |
| else: | |
| # Continuation line | |
| modified_lines.append(f" {value} \\") | |
| # Detect indentation from the first continuation line, if available | |
| indent = "" | |
| if start_idx + 1 < len(lines): | |
| match = re.match(r"^(\s*)", lines[start_idx + 1]) | |
| if match: | |
| indent = match.group(1) | |
| if indent == "": | |
| indent = " " # fallback to 4 spaces if no indentation detected | |
| for i, value in enumerate(all_values): | |
| if i == len(all_values) - 1: | |
| # Last line - no continuation backslash | |
| modified_lines.append(f"{indent}{value}\"") | |
| else: | |
| # Continuation line | |
| modified_lines.append(f"{indent}{value} \\") |
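The suggestion above lost its indentation in this view, so here is a self-contained sketch of the same idea. The helper names `detect_indent` and `format_continuations` are hypothetical; the sketch matches only spaces and tabs (rather than `\s*`) so that a trailing newline is never mistaken for indentation, which is a small deviation from the suggested code.

```python
import re


def detect_indent(lines, start_idx, fallback="    "):
    """Return the leading spaces/tabs of the line after start_idx, or the fallback."""
    if start_idx + 1 < len(lines):
        indent = re.match(r"^([ \t]*)", lines[start_idx + 1]).group(1)
        if indent:
            return indent
    return fallback


def format_continuations(values, indent):
    """Emit continuation lines; the last one closes the quote and has no backslash."""
    out = []
    for i, value in enumerate(values):
        if i == len(values) - 1:
            out.append(f'{indent}{value}"')
        else:
            out.append(f"{indent}{value} \\")
    return out
```

For an input whose continuation lines are tab-indented, `detect_indent` returns the tab, so the rewritten block keeps the author's layout instead of forcing four spaces.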
| **/Dockerfile.in | ||
| - name: Set up Python | ||
| uses: actions/setup-python@v4 |
Copilot
AI
Sep 5, 2025
The actions/setup-python action should use the latest version v5 instead of v4 for security updates and improved functionality.
| uses: actions/setup-python@v4 | |
| uses: actions/setup-python@v5 |
| return False | ||
| # Check if value contains spaces or other characters that need quoting | ||
| return bool(re.search(r'[\s\$\`\|\&\;\(\)\<\>\{\}\[\]\*\?]', value)) |
Python has quotemeta, too: https://docs.python.org/3/library/re.html#re.escape
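One way to apply `re.escape` here is to keep the special characters as a plain string and let the library produce the escaped character class. This is a sketch under assumptions: `SPECIAL_CHARS`, `NEEDS_QUOTING`, and `needs_quotes` are illustrative names, and the character set is copied from the pattern in the diff above.

```python
import re

# Same characters as the hand-escaped class in the diff, written without backslashes.
SPECIAL_CHARS = "$`|&;()<>{}[]*?"

# re.escape backslash-escapes regex metacharacters, so the result is safe inside [...].
NEEDS_QUOTING = re.compile(r"[\s" + re.escape(SPECIAL_CHARS) + "]")


def needs_quotes(value):
    """Return True if the value contains whitespace or a shell-special character."""
    return bool(NEEDS_QUOTING.search(value))
```

The behaviour should match the original expression, but the list of characters becomes easier to read and extend.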
2209f90 to 2873a7c
- it can be also used to fix issue found. See README.md Signed-off-by: Mikhail Malyshev <[email protected]>
- we run on changed files only - it should generate output that GH understands Signed-off-by: Mikhail Malyshev <[email protected]>
Signed-off-by: Mikhail Malyshev <[email protected]>
f281c34 to f31b0b3
In principle, I like this idea. We should have Dockerfiles (and everything else) up to date in syntax. I have two areas of question.

First, how does this fit with the other GHA CI paths we follow? Should this be part of one of the others? We have a fairly complex structure of CI in GHA, so maybe we should check with @yash-zededa?

Second, this sounds like it fits in line with what yetus does. For all of our complaints about yetus, it does a good job centralizing all of the various linters and checkers into a single place. Should this be part of that? Is there a yetus option or plugin that can do this?
europaul
left a comment
just some nitpicking
| def main(): | ||
| ap = argparse.ArgumentParser(description="Dockerfile linter (Tree-sitter only) for modernizing ENV/ARG.") | ||
| ap.add_argument('directory', nargs='?', default='.', help='Directory to process (default: .)') | ||
| ap.add_argument('--ci', action='store_true', help='CI mode: lint only, do not modify files; nonzero exit on issues') |
tbh I'm not a big fan of using flag names such as ci or github-actions - when looking at those I have absolutely no idea what they mean practically and need to always go look at the help messages to figure it out. Since there is no dark magic happening in the background I would just call this one something like --lint-only or --check-syntax or --only-warn or --fail-on-syntax-warnings
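A descriptive flag name can be introduced without breaking existing callers by registering both spellings on one argparse option. The sketch below is hypothetical: `--lint-only` is one of the names proposed in this comment, keeping `--ci` as an alias is an assumption, and `build_parser` is not a function from the PR.

```python
import argparse


def build_parser():
    """Build a CLI parser where --lint-only is the primary flag and --ci an alias."""
    ap = argparse.ArgumentParser(description="Dockerfile linter for modernizing ENV/ARG.")
    ap.add_argument("directory", nargs="?", default=".",
                    help="Directory to process (default: .)")
    # Both option strings set the same destination, so old invocations keep working.
    ap.add_argument("--lint-only", "--ci", dest="lint_only", action="store_true",
                    help="Lint only, do not modify files; nonzero exit on issues")
    return ap
```

Calling `build_parser().parse_args(["--ci", "pkg"])` sets `lint_only` to `True` and `directory` to `"pkg"`, the same result as passing `--lint-only`.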
| pip install -r tools/ci/dockerfile_linter/requirements.txt | ||
| pip install pytest | ||
| - name: Lint changed Dockerfiles |
Since the script does the reformatting automatically, wouldn't it be better to just apply it to all Dockerfiles that we have, check them in and then on every CI run check all the files again?
Or do you fear rebuilding all packages again? 😄
@europaul I will definitely run it on the EVE tree after this PR is merged
@rucoder, it looks like the linter is proven to work, so you can remove the fake-test commit in order to finalize the review and allow the PR to get merged...
Description
We have a lot of Dockerfiles that use outdated syntax for ENV and ARG assignments. This PR introduces a tool that can be used as a linter and can also fix all Dockerfiles automatically.
UPDATE 1: I decided to go for a tree-sitter based linter.
How to test and validate this PR
No need to test from the EVE perspective, but the package has its own Python tests.
I also added a dummy Dockerfile to test GH annotations.
