Skip to content

Conversation

@rucoder
Copy link
Contributor

@rucoder rucoder commented Sep 5, 2025

Description

We have a lot of Dockerfile files that use outdated syntaf for ENV and VAR assignments. This PR introduces a tool that can be used both as linter and and also can fix all Dockefile's automatically

UPDATE 1: I decided got for tree-sitter based linter

How to test and validate this PR

no need to test from EVE perspective but the package has tests in Python

I also added a dummy Dockerfile to test GH annotations
image

image

PR Backports

- 14.5-stable: Up to maintainers 
- 13.4-stable: Up to maintainers 

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

And the last but not least:

  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a comprehensive Dockerfile linter tool that detects and fixes outdated ENV/ARG syntax in Dockerfile files. The tool modernizes syntax by adding proper = assignment operators and quotes where necessary, ensuring compliance with modern Docker best practices.

  • Adds a Python-based linter for Dockerfile and Dockerfile.in files
  • Includes GitHub Actions workflow for automated CI checks
  • Provides comprehensive test coverage with unit tests

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tools/ci/dockerfile_linter/dockerfile_linter.py Main implementation of the Dockerfile linter with CLI support
tools/ci/dockerfile_linter/test_dockerfile_linter.py Comprehensive unit tests for all linter functionality
tools/ci/dockerfile_linter/requirements.txt Dependencies file (currently empty as tool uses only stdlib)
tools/ci/dockerfile_linter/init.py Package initialization file with metadata
tools/ci/dockerfile_linter/main.py Module entry point for python -m execution
tools/ci/dockerfile_linter/README.md Detailed documentation and usage examples
tools/ci/dockerfile_linter/.gitignore Git ignore rules for Python cache files
.github/workflows/dockerfile-check.yml GitHub Actions workflow for automated syntax checking

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

self.linter.process_directory(str(self.test_dir))

# Should process files
self.assertEqual(self.linter.files_processed, 1) # Only finds Dockerfile, not Dockerfile.good
Copy link

Copilot AI Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is incorrect. Based on the find_dockerfiles logic, it should find files named 'Dockerfile' and 'Dockerfile.in', but 'Dockerfile.good' would not match these patterns and should not be counted.

Copilot uses AI. Check for mistakes.
Comment on lines 104 to 105
env_match = re.match(r'^(ENV)\s+([A-Z_][A-Z0-9_]*)\s+(.+)$', stripped)
arg_match = re.match(r'^(ARG)\s+([A-Z_][A-Z0-9_]*)\s+(.+)$', stripped)
Copy link

Copilot AI Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex patterns for ENV and ARG variable names are overly restrictive. Docker allows lowercase letters, periods, and hyphens in variable names. The patterns should be [A-Za-z_][A-Za-z0-9_.-]* to match actual Docker variable naming conventions.

Suggested change
env_match = re.match(r'^(ENV)\s+([A-Z_][A-Z0-9_]*)\s+(.+)$', stripped)
arg_match = re.match(r'^(ARG)\s+([A-Z_][A-Z0-9_]*)\s+(.+)$', stripped)
env_match = re.match(r'^(ENV)\s+([A-Za-z_][A-Za-z0-9_.-]*)\s+(.+)$', stripped)
arg_match = re.match(r'^(ARG)\s+([A-Za-z_][A-Za-z0-9_.-]*)\s+(.+)$', stripped)

Copilot uses AI. Check for mistakes.
Comment on lines 310 to 316
for i, value in enumerate(all_values):
if i == len(all_values) - 1:
# Last line - no continuation backslash
modified_lines.append(f" {value}\"")
else:
# Continuation line
modified_lines.append(f" {value} \\")
Copy link

Copilot AI Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The multiline formatting adds extra indentation (4 spaces) but doesn't preserve the original indentation from the input. This could change the visual formatting of carefully formatted Dockerfiles. Consider preserving or detecting the original indentation pattern.

Suggested change
for i, value in enumerate(all_values):
if i == len(all_values) - 1:
# Last line - no continuation backslash
modified_lines.append(f" {value}\"")
else:
# Continuation line
modified_lines.append(f" {value} \\")
# Detect indentation from the first continuation line, if available
indent = ""
if start_idx + 1 < len(lines):
match = re.match(r"^(\s*)", lines[start_idx + 1])
if match:
indent = match.group(1)
if indent == "":
indent = " " # fallback to 4 spaces if no indentation detected
for i, value in enumerate(all_values):
if i == len(all_values) - 1:
# Last line - no continuation backslash
modified_lines.append(f"{indent}{value}\"")
else:
# Continuation line
modified_lines.append(f"{indent}{value} \\")

Copilot uses AI. Check for mistakes.
**/Dockerfile.in

- name: Set up Python
uses: actions/setup-python@v4
Copy link

Copilot AI Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actions/setup-python action should use the latest version v5 instead of v4 for security updates and improved functionality.

Suggested change
uses: actions/setup-python@v4
uses: actions/setup-python@v5

Copilot uses AI. Check for mistakes.
return False

# Check if value contains spaces or other characters that need quoting
return bool(re.search(r'[\s\$\`\|\&\;\(\)\<\>\{\}\[\]\*\?]', value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rucoder rucoder marked this pull request as draft September 6, 2025 11:35
@rucoder rucoder force-pushed the rucider/dockerfile-linter branch from 2209f90 to 2873a7c Compare September 6, 2025 13:47
@rucoder rucoder marked this pull request as ready for review September 6, 2025 14:00
- it can be also used to fix issue found. See README.md

Signed-off-by: Mikhail Malyshev <[email protected]>
- we run on changed files only
- it should generate output that GH understands

Signed-off-by: Mikhail Malyshev <[email protected]>
@rucoder rucoder force-pushed the rucider/dockerfile-linter branch from f281c34 to f31b0b3 Compare September 6, 2025 16:25
@rucoder rucoder added the side-quest A worthy adventure, but not essential for victory. Tackle when the main quest is safe! label Sep 7, 2025
@deitch
Copy link
Contributor

deitch commented Sep 8, 2025

In principle, I like this idea. We should gave Dockerfiles (and everything else) up to date in syntax.

I have two areas of question.

First, how does this fit with the other GHA CI paths we follow? Should this be part of one of the others? We have a fairly complex structure of CI in GHA, maybe we should check with @yash-zededa ?

Second, this sounds like it fits in line with what yetus does? For all of our complaints about yetus, it does a good job centralizing all of the various linters and checkers into a single place. Should this be part of that? Is there a yetus option or plugin that can do this?

Copy link
Contributor

@europaul europaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some nitpicking

def main():
ap = argparse.ArgumentParser(description="Dockerfile linter (Tree-sitter only) for modernizing ENV/ARG.")
ap.add_argument('directory', nargs='?', default='.', help='Directory to process (default: .)')
ap.add_argument('--ci', action='store_true', help='CI mode: lint only, do not modify files; nonzero exit on issues')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I'm not a big fan of using flag names such as ci or github-actions - when looking at those I have absolutely no idea what they mean practically and need to always go look at the help messages to figure it out. Since there is no dark magic happening in the background I would just call this one something like --lint-only or --check-syntax or --only-warn or --fail-on-syntax-warnings

pip install -r tools/ci/dockerfile_linter/requirements.txt
pip install pytest

- name: Lint changed Dockerfiles
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the script does the reformatting automatically, wouldn't it be better to just apply it to all Dockerfiles that we have, check them in and then on every CI run check all the files again?

Or do you fear rebuilding all packages again? 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@europaul i definitively will run it on eve tree after this PR is merged

@rene
Copy link
Contributor

rene commented Sep 23, 2025

@rucoder , it looks like the Linter is proven to work, so you can remove the fake-test commit in order to finalize the review and allow the PR to get merged...

@rucoder
Copy link
Contributor Author

rucoder commented Sep 23, 2025

@rucoder , it looks like the Linter is proven to work, so you can remove the fake-test commit in order to finalize the review and allow the PR to get merged...

@rene I'm not sure about one case: ARG MYARG= seems to be a valid syntax, I need to check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

side-quest A worthy adventure, but not essential for victory. Tackle when the main quest is safe!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants