Conversation

@TheoMcCabe (Owner) commented Jul 30, 2025

Summary by Sourcery

Add interactive feature CLI and code linting, enhance Docker and benchmark workflows, refactor core components for more robust error handling and configurability

New Features:

  • Add an interactive feature-based CLI (gptf) with commands for creating, updating, and chatting about incremental development tasks
  • Integrate Python linting support using black into file selection and workflow with a toggleable linting option
  • Add '--yaml-output' and '--use-cache' options to the benchmark CLI to export results and enable response caching
  • Extend Prompt class to support a customizable prefix for LangChain messages

Bug Fixes:

  • Fallback to a base tiktoken encoding when encoding_for_model fails
  • Correct file path references in handle_improve_mode error logs
  • Fix benchmark CLI argument types and module import paths

Enhancements:

  • Revamp Docker setup: switch to python:3.11-slim, install dependencies, switch to bash entrypoint, and update README with CLI and Compose instructions and debugging tips
  • Refactor FileSelector to return linting status, inject linting sections into TOML, and open files with platform-specific editors
  • Improve error handling and fallback in Tokenizer and token usage cost calculation
  • Simplify improve mode flow by adding an additional_context parameter and refactoring handle_improve_mode
  • Enhance BenchConfig with examples_per_problem setting and add recursive to_dict conversion
  • Update main CLI to use options for model, integrate linting before improvement, and standardize help flags
  • Replace alpine base with slim in Dockerfile and update docker-compose.yml version syntax

CI:

  • Restrict CI triggers to relevant paths, introduce concurrency group, and pin tox to 4.15.0

Documentation:

  • Improve docker/README.md with prerequisites, setup steps, and Compose usage
  • Document Open Router model support in open_models.md

Tests:

  • Add tests for linting functions and file-selection conversion utilities
  • Update BenchConfig tests to reflect removed gpteng config and new fields
  • Add salvage_correct_hunks test for theo_case scenario

similato87 and others added 30 commits March 31, 2024 23:25
# Conflicts:
#	tests/core/default/test_steps.py
…nting-for-python-code

Format uploaded files with the black linting tool
ATheorell and others added 27 commits May 22, 2024 19:38
Small fixes to cli interface of gpte and bench applications
- Avoid installing packages without pinned versions
- Define permissions for workflows with external actions
- Avoid running CI-related actions when no source code has changed
- Stop running workflows when there is a newer commit in the PR
…o installed packages

feat(README.md): enhance Docker setup instructions and add debugging section
fix(entrypoint.sh): change shebang from sh to bash
…ting-to-filestore

extract linting process from file_selector
Preparation for our new release v0.3.1
Repository owner deleted a comment from sourcery-ai bot Jul 30, 2025
@sourcery-ai bot commented Jul 30, 2025

Reviewer's Guide

This pull request overhauls the Docker setup and documentation, embeds linting support in the file selector workflow, enriches the CLI and benchmarking interfaces with new options and YAML exports, updates dependencies and entry points, fortifies core utilities (token usage, BenchConfig, prompt formatting, diff validation), refactors improve-mode error handling, integrates a Black-based linting utility, refines CI workflows, introduces a comprehensive feature-CLI module, and aligns the test suite with these enhancements.

Sequence diagram for file selection and linting in improve mode

sequenceDiagram
    actor User
    participant CLI as CLI
    participant FileSelector
    participant FileStore
    participant Linting
    User->>CLI: Start improve mode
    CLI->>FileSelector: ask_for_files()
    FileSelector-->>CLI: (FilesDict, is_linting)
    CLI->>FileStore: linting(FilesDict) (if is_linting)
    FileStore->>Linting: lint_files(FilesDict)
    Linting-->>FileStore: FilesDict (linted)
    FileStore-->>CLI: FilesDict (linted)
    CLI->>...: Continue with improved files
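
As a rough illustration of the flow above, a minimal Python sketch; the variable names and return shapes follow this guide and the sequence diagram, not necessarily the final code:

```python
from gpt_engineer.applications.cli.file_selector import FileSelector
from gpt_engineer.core.default.file_store import FileStore

project_path = "."  # example project root (assumption for illustration)

# ask_for_files() now returns the selection plus the user's linting choice
files_dict_before, is_linting = FileSelector(project_path).ask_for_files()

files = FileStore(project_path)
if is_linting:
    # FileStore.linting delegates to the new black-based Linting utility
    files_dict_before = files.linting(files_dict_before)

# improve mode then continues with the (possibly linted) files_dict_before
```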

Class diagram for new Feature CLI module

classDiagram
    class Feature {
        +__init__(project_path, repository)
        +clear_feature()
        +clear_task()
        +get_description() str
        +set_description(feature_description)
        +has_description() bool
        +get_progress() dict
        +update_progress(task)
        +set_task(task)
        +get_task() str
        +has_task() bool
        +complete_task()
        +open_feature_in_editor()
        +open_task_in_editor()
    }
    class Task {
        +__init__(project_path)
        +delete()
        +set_task(task)
        +get_task() str
        +open_task_in_editor()
    }
    class FileSelection {
        +included_files: List[str]
        +excluded_files: List[str]
    }
    class Repository {
        +__init__(repo_path)
        +get_tracked_files() List[str]
        +find_most_recent_merge_base()
        +get_feature_branch_diff()
        +get_unstaged_changes()
        +get_git_context()
        +create_branch(branch_name)
        +stage_all_changes()
        +undo_unstaged_changes()
    }
    class GitContext {
        +commits: List[Commit]
        +branch_changes: str
        +staged_changes: str
        +unstaged_changes: str
        +tracked_files: List[str]
    }
    class Commit {
        +description: str
        +diff: str
    }
    class FileSelector {
        +__init__(project_path, repository, name)
        +set_to_yaml(file_selection)
        +update_yaml_from_tracked_files()
        +get_from_yaml() FileSelection
        +get_pretty_selected_from_yaml() str
        +open_yaml_in_editor()
        +get_included_as_file_repository()
    }
    class Files {
        +__init__(project_path, selected_files)
        +write_to_disk(files)
    }
    Feature --> Repository
    Feature --> FileSelection
    Task --> FileSelection
    Repository --> GitContext
    GitContext --> Commit
    FileSelector --> FileSelection
    Files --> FileSelection
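
The Repository wrapper above is built on GitPython (added as a dependency in this PR). A minimal sketch of two of its methods; the bodies here are assumptions for illustration, not the PR's exact code:

```python
import git  # GitPython, added as a dependency in this PR


class Repository:
    """Thin wrapper around a git repository used by the feature CLI."""

    def __init__(self, repo_path: str):
        self.repo = git.Repo(repo_path)

    def get_tracked_files(self) -> list[str]:
        # `git ls-files` lists everything under version control
        return self.repo.git.ls_files().splitlines()

    def create_branch(self, branch_name: str) -> None:
        # create and check out a new feature branch
        self.repo.git.checkout("-b", branch_name)
```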

Class diagram for new Linting utility

classDiagram
    class Linting {
        +__init__()
        +lint_python(content, config)
        +lint_files(files_dict, config) FilesDict
    }
    Linting ..> FilesDict : uses
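
A minimal sketch of the black-based utility, following the class diagram above; the method bodies are illustrative and assume black's format_str API:

```python
import os

import black

from gpt_engineer.core.files_dict import FilesDict  # assumed import path


class Linting:
    def __init__(self):
        # map file extensions to linting callables; only Python for now
        self.linters = {".py": self.lint_python}

    def lint_python(self, content: str, config: dict) -> str:
        # black raises InvalidInput on unparseable code, so keep the original then
        try:
            return black.format_str(content, mode=black.Mode(**config))
        except black.InvalidInput:
            return content

    def lint_files(self, files_dict: FilesDict, config: dict = None) -> FilesDict:
        config = config or {}
        for filename, content in files_dict.items():
            # splitext returns an empty extension for dot-less filenames
            extension = os.path.splitext(filename)[1].lower()
            if extension in self.linters:
                files_dict[filename] = self.linters[extension](content, config)
        return files_dict
```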

Class diagram for updated BenchConfig and benchmark types

classDiagram
    class BenchConfig {
        +apps: AppsConfig
        +mbpp: MbppConfig
        +gptme: GptmeConfig
        +from_toml(config_file)
        +from_dict(config_dict)
        +recursive_resolve(data_dict)
        +to_dict()
    }
    class AppsConfig {
        +examples_per_problem: int
    }
    class TaskResult {
        +success_rate() float
        +to_dict() dict
    }
    BenchConfig --> AppsConfig
    BenchConfig --> GptmeConfig
    BenchConfig --> MbppConfig
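
A short usage sketch of the updated config, based on the diagram; the TOML filename is a placeholder:

```python
from gpt_engineer.benchmark.bench_config import BenchConfig

# load a benchmark configuration and inspect the new field
config = BenchConfig.from_toml("bench_config.toml")  # placeholder path
print(config.apps.examples_per_problem)

# to_dict() recursively resolves nested configs, e.g. for the new --yaml_output export
print(config.to_dict())
```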

Class diagram for Prompt class update

classDiagram
    class Prompt {
        +__init__(text, image_urls, entrypoint_prompt, prefix)
        +to_langchain_content() Dict[str, str]
    }
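
A rough sketch of the extended Prompt class; the default values and the content-dictionary keys are assumptions for illustration:

```python
class Prompt:
    def __init__(self, text, image_urls=None, entrypoint_prompt="", prefix=""):
        self.text = text
        self.image_urls = image_urls
        self.entrypoint_prompt = entrypoint_prompt
        self.prefix = prefix  # new: prepended to the text sent to LangChain

    def to_langchain_content(self):
        # the prefix lets callers relabel the request without changing the stored text
        return {"type": "text", "text": self.prefix + self.text}
```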

File-Level Changes

Docker environment and documentation revamp
  • Expand README with prerequisites, Docker CLI & Compose setup, and debugging instructions
  • Switch base images from alpine to slim and update package installation steps
  • Change entrypoint to bash and trim version in docker-compose.yml
  Files: docker/README.md, docker/Dockerfile, docker-compose.yml, docker/entrypoint.sh

FileSelector linting integration and editor updates
  • Add linting toggle lines and track is_linting flag in TOML templates
  • Change ask_for_files to return lint status alongside file dict
  • Simplify default editor invocation using platform modules
  Files: gpt_engineer/applications/cli/file_selector.py

CLI and benchmarking interface enhancements
  • Introduce --yaml_output and --use_cache options and export_yaml_results function
  • Switch to langchain_community.SQLiteCache and conditionally set the cache (see the cache sketch after this section)
  • Append sys.path for dynamic agent imports and set custom typer help options
  Files: gpt_engineer/benchmark/__main__.py, gpt_engineer/benchmark/run.py, gpt_engineer/applications/cli/main.py

Dependency and entry point updates
  • Bump project version to 0.3.1 and add black, gitpython, langchain-community deps
  • Define new gptf entry point for feature CLI
  • Reformat extras and remove duplicate black entry
  Files: pyproject.toml

Core utility resilience and extensions
  • Wrap tokenizer init and usage_cost in try/except for fallbacks
  • Add examples_per_problem field and to_dict()/recursive_resolve in BenchConfig
  • Introduce Prompt.prefix and update to_langchain_content signature
  • Enhance diff.validate output with detailed problem prints
  Files: gpt_engineer/core/token_usage.py, gpt_engineer/benchmark/bench_config.py, gpt_engineer/core/prompt.py, gpt_engineer/core/diff.py

Improve-mode API and error logging refactor
  • Extend improve_fn with additional_context parameter
  • Refactor handle_improve_mode signature to accept a lambda and adjust the debug log path
  • Remove redundant console print of captured output
  Files: gpt_engineer/core/default/steps.py

Black-based linting utility and FileStore integration
  • Add Linting class wrapping black.format_str and a lint_files method
  • Integrate linting call into FileStore.linting
  • Invoke linting before improve mode in the CLI
  Files: gpt_engineer/core/linting.py, gpt_engineer/core/default/file_store.py, gpt_engineer/applications/cli/main.py

CI workflow constraints and concurrency
  • Limit CI runs to changes under gpt_engineer/** and tests/**
  • Add concurrency group with cancel-in-progress and pin tox to 4.15.0
  Files: .github/workflows/ci.yaml

New feature-CLI module
  • Introduce gptf commands with Feature, Repository, FileSelection, Agents and support files
  • Implement task, chat, and feature flows with prompt_toolkit and yaspin
  • Autogenerate YAML file selectors and integrate Git context
  Files: gpt_engineer/applications/feature_cli/**, tests/applications/feature_cli/**

Test suite alignment
  • Replace nested exception test with Python lint tests
  • Add salvage_correct_hunks regression case
  • Update BenchConfig tests to remove deprecated gpteng assertions
  • Add unit tests for fuzzy file selection and paths-to-tree conversions
  Files: tests/core/default/test_steps.py, tests/core/test_salvage_correct_hunks.py, tests/benchmark/test_BenchConfig.py, tests/applications/feature_cli/test_file_selection.py
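
To illustrate the conditional response caching mentioned under the benchmarking changes, a minimal sketch using LangChain's global cache API; the cache file path is an assumption:

```python
from langchain.globals import set_llm_cache
from langchain_community.cache import SQLiteCache

use_cache = True  # e.g. taken from the new --use_cache CLI flag

if use_cache:
    # cache LLM responses between benchmark runs
    set_llm_cache(SQLiteCache(database_path=".langchain.db"))
```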

@sourcery-ai sourcery-ai bot left a comment

Hey @TheoMcCabe - I've reviewed your changes and they look great!

Blocking issues:

  • By not specifying a USER, a program in the container may run as 'root'. This is a security hazard. If an attacker can control a process running as root, they may have control over the container. Ensure that the last USER in a Dockerfile is a USER other than 'root'. (link)
  • Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
  • Detected subprocess function 'run' with user controlled data. A malicious actor could leverage this to perform command injection. You may consider using 'shlex.quote()'. (link)
  • Found dynamic content when spawning a process. This is dangerous if external data can reach this function call because it allows a malicious actor to execute commands. Ensure no external data reaches here. (link)
  • Found dynamic content when spawning a process. This is dangerous if external data can reach this function call because it allows a malicious actor to execute commands. Ensure no external data reaches here. (link)
  • Found dynamic content when spawning a process. This is dangerous if external data can reach this function call because it allows a malicious actor to execute commands. Ensure no external data reaches here. (link)
  • The Python documentation recommends using defusedxml instead of xml because the native Python xml library is vulnerable to XML External Entity (XXE) attacks. These attacks can leak confidential data and "XML bombs" can cause denial of service. (link)
  • Found dynamic content when spawning a process. This is dangerous if external data can reach this function call because it allows a malicious actor to execute commands. Ensure no external data reaches here. (link)
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `gpt_engineer/applications/cli/main.py:456` </location>
<code_context>
     files = FileStore(project_path)
     if not no_execution:
         if improve_mode:
-            files_dict_before = FileSelector(project_path).ask_for_files()
+            # lint the code
+            if is_linting:
</code_context>

<issue_to_address>
Potential undefined variable: is_linting is used before assignment.

`is_linting` is not defined in this scope, which will cause a NameError. Please initialize it before use.
</issue_to_address>
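
One possible shape of the fix, sketched from the Reviewer's Guide sequence diagram (ask_for_files returns the linting flag alongside the files); this is an illustration, not the PR's final code:

```python
files = FileStore(project_path)
if not no_execution:
    if improve_mode:
        # define is_linting before it is read, from the selector's return value
        files_dict_before, is_linting = FileSelector(project_path).ask_for_files()
        if is_linting:
            files_dict_before = files.linting(files_dict_before)
```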

### Comment 2
<location> `gpt_engineer/core/linting.py:51` </location>
<code_context>
+            config = {}
+
+        for filename, content in files_dict.items():
+            extension = filename[
+                filename.rfind(".") :
+            ].lower()  # Ensure case insensitivity
+            if extension in self.linters:
+                original_content = content
</code_context>

<issue_to_address>
Extension extraction may fail for files without an extension.

If a filename lacks a dot, this logic returns the whole filename as the extension, which may not match any linter. Please add handling for files without extensions.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
        for filename, content in files_dict.items():
            extension = filename[
                filename.rfind(".") :
            ].lower()  # Ensure case insensitivity
            if extension in self.linters:
                original_content = content
=======
        for filename, content in files_dict.items():
            if "." in filename:
                extension = filename[filename.rfind("."):].lower()  # Ensure case insensitivity
            else:
                extension = ""
            if extension in self.linters:
                original_content = content
>>>>>>> REPLACE

</suggested_fix>

### Comment 3
<location> `gpt_engineer/benchmark/bench_config.py:54` </location>
<code_context>
-            gpteng=GptengConfig(**config_dict.get("gpteng", {})),
         )
+
+    @staticmethod
+    def recursive_resolve(data_dict):
+        for key, value in data_dict.items():
+            if isinstance(value, Integer):
+                data_dict[key] = int(value)
</code_context>

<issue_to_address>
Use of Integer type in recursive_resolve may cause NameError.

`Integer` is not defined or imported, which will cause a NameError. Use `int` or import the appropriate type.
</issue_to_address>
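
A sketch of one way to resolve this without importing anything: TOML parser wrapper types such as tomlkit's Integer and Float subclass int and float, so plain isinstance checks suffice (this assumes tomlkit-style values; adjust if another parser is used):

```python
@staticmethod
def recursive_resolve(data_dict):
    # unwrap parser-specific scalar wrappers into plain Python values
    for key, value in data_dict.items():
        if isinstance(value, dict):
            BenchConfig.recursive_resolve(value)
        elif isinstance(value, int) and not isinstance(value, bool):
            data_dict[key] = int(value)  # also catches tomlkit's Integer
        elif isinstance(value, float):
            data_dict[key] = float(value)
```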

### Comment 4
<location> `tests/applications/feature_cli/test_file_selection.py:176` </location>
<code_context>
+
+
[email protected](reason="Skipping as test requires AI")
+def test_yaml_to_file_selection_fuzzy():
+
+    load_dotenv()
+
+    commented_yaml = """# gpt_engineer:
+#   applications:
+#     cli:
+      - __init__.py
+      - cli_agent.py
+#       - collect.py
+      - file_selector.py
+      - learning.py
+      - main.py"""
+
+    file_selction = fuzzy_parse_file_selection(AI(), commented_yaml)
+
+    assert file_selction == FileSelection(
+        [
+            "gpt_engineer/applications/cli/__init__.py",
</code_context>

<issue_to_address>
Test for fuzzy YAML parsing is skipped and requires AI.

Consider using a mock or fixture for the AI dependency so this test can run in CI, or clearly document the requirements for running it locally.

Suggested implementation:

```python
def test_yaml_to_file_selection_fuzzy(mocker):
    """
    Test fuzzy_parse_file_selection with a mock AI dependency.
    """

    load_dotenv()

    commented_yaml = """# gpt_engineer:
#   applications:
#     cli:
      - __init__.py
      - cli_agent.py
#       - collect.py
      - file_selector.py
      - learning.py
      - main.py"""

    # Mock the AI dependency's behavior
    mock_ai = mocker.Mock()
    # Adjust the return value to match what fuzzy_parse_file_selection expects
    mock_ai.some_method.return_value = FileSelection(
        [
            "gpt_engineer/applications/cli/__init__.py",
            "gpt_engineer/applications/cli/cli_agent.py",
            "gpt_engineer/applications/cli/file_selector.py",
            "gpt_engineer/applications/cli/learning.py",
            "gpt_engineer/applications/cli/main.py",
        ],
        [
            "gpt_engineer/applications/cli/collect.py",
        ],
    )

    # Patch fuzzy_parse_file_selection to use the mock AI
    # If fuzzy_parse_file_selection calls a method on AI, ensure the mock matches
    # If it just passes through, you may need to adjust this accordingly
    file_selction = fuzzy_parse_file_selection(mock_ai, commented_yaml)

    assert file_selction == FileSelection(
        [
            "gpt_engineer/applications/cli/__init__.py",
            "gpt_engineer/applications/cli/cli_agent.py",
            "gpt_engineer/applications/cli/file_selector.py",
            "gpt_engineer/applications/cli/learning.py",
            "gpt_engineer/applications/cli/main.py",
        ],
        [
            "gpt_engineer/applications/cli/collect.py",
        ],
    )

```

- This change assumes you are using pytest and pytest-mock (which provides the `mocker` fixture). If not, you may need to import `unittest.mock` and patch manually.
- Adjust `mock_ai.some_method.return_value` to match the actual method called by `fuzzy_parse_file_selection` on the AI object. If the function expects a different interface, update the mock accordingly.
- If `fuzzy_parse_file_selection` expects a specific method or attribute on the AI object, ensure the mock provides it.
</issue_to_address>

### Comment 5
<location> `tests/applications/feature_cli/test_file_selection.py:81` </location>
<code_context>
+def test_file_selection_to_yaml():
</code_context>

<issue_to_address>
Test for file_selection_to_commented_yaml uses a hardcoded expected string.

Comparing parsed YAML structures instead of raw strings will make the test less brittle to formatting changes.

Suggested implementation:

```python
def test_file_selection_to_yaml():
    import yaml

    included_files = [
        "docker/Dockerfile",
        "docker/README.md",
        "docker/entrypoint.sh",
    ]

    excluded_files = [
        ".github/ISSUE_TEMPLATE/bug-report.md",
        ".github/ISSUE_TEMPLATE/documentation-clarification.md",
        ".github/ISSUE_TEMPLATE/feature-request.md",

```

```python
    ]
    # ... (rest of the test setup)

    yaml_str = file_selection_to_commented_yaml(included_files, excluded_files)

    # Parse the YAML output and compare the resulting structure
    parsed_yaml = yaml.safe_load(yaml_str)
    expected_structure = {
        "docker": {
            "Dockerfile": "included",
            "README.md": "included",
            "entrypoint.sh": "included",
        },
        ".github": {
            "ISSUE_TEMPLATE": {
                "bug-report.md": "excluded",
                "documentation-clarification.md": "excluded",
                "feature-request.md": "excluded",
            }
        }
    }

    assert parsed_yaml == expected_structure

```
</issue_to_address>

### Comment 6
<location> `tests/core/test_salvage_correct_hunks.py:83` </location>
<code_context>
     salvage_correct_hunks(message_builder("create_two_new_files_chat"), files, memory)


+def test_theo_case():
+    files = FilesDict({"dockerfile": get_file_content("theo_case_code")})
+    updated_files, _ = salvage_correct_hunks(
+        message_builder("theo_case_chat"), files, memory
+    )
+    print(updated_files["dockerfile"])
+    print(updated_files["run.py"])
+
+
</code_context>

<issue_to_address>
New test 'test_theo_case' added for salvage_correct_hunks.

Please add assertions to verify that the outputs are correct, rather than just printing them.
</issue_to_address>
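
For instance, a hedged sketch of what the assertions could look like; the exact expected contents depend on the theo_case fixtures, so these checks are deliberately loose:

```python
def test_theo_case():
    files = FilesDict({"dockerfile": get_file_content("theo_case_code")})
    updated_files, _ = salvage_correct_hunks(
        message_builder("theo_case_chat"), files, memory
    )
    # assert on the result instead of printing it
    assert "dockerfile" in updated_files
    assert "run.py" in updated_files
    assert updated_files["run.py"].strip()  # the new file should not be empty
```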

### Comment 7
<location> `docs/open_models.md:120` </location>
<code_context>
+Using Open Router models
+==================
+
+In case you don't posses the hardware to run local LLM's yourself you can use the hosting on [Open Router](https://openrouter.ai) and pay as you go for the tokens.
+
+To set it up you need to Sign In and load purchase 💰 the LLM credits. Pricing per token is different for (each model](https://openrouter.ai/models), but mostly cheaper then Open AI.
</code_context>

<issue_to_address>
Typo: 'posses' should be 'possess'.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
In case you don't posses the hardware to run local LLM's yourself you can use the hosting on [Open Router](https://openrouter.ai) and pay as you go for the tokens.
=======
In case you don't possess the hardware to run local LLM's yourself you can use the hosting on [Open Router](https://openrouter.ai) and pay as you go for the tokens.
>>>>>>> REPLACE

</suggested_fix>

## Security Issues

### Issue 1
<location> `docker/Dockerfile:29` </location>

<issue_to_address>
**security (dockerfile.security.missing-user-entrypoint):** By not specifying a USER, a program in the container may run as 'root'. This is a security hazard. If an attacker can control a process running as root, they may have control over the container. Ensure that the last USER in a Dockerfile is a USER other than 'root'.

```suggestion
USER non-root
ENTRYPOINT ["bash", "/app/entrypoint.sh"]
```

*Source: opengrep*
</issue_to_address>

### Issue 2
<location> `gpt_engineer/applications/cli/file_selector.py:227` </location>

<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>

### Issue 3
<location> `gpt_engineer/applications/cli/file_selector.py:227` </location>

<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-tainted-env-args):** Detected subprocess function 'run' with user controlled data. A malicious actor could leverage this to perform command injection. You may consider using 'shlex.quote()'.

*Source: opengrep*
</issue_to_address>
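
A common mitigation, sketched with hypothetical variable names (not the PR's exact code): pass the command as an argument list so no shell parses the path, or quote anything interpolated into a shell string:

```python
import shlex
import subprocess

editor = "code"            # hypothetical: editor chosen per platform
file_path = "prompt.txt"   # hypothetical: path that may come from user input

# Preferred: argument list with shell=False, so file_path is never shell-parsed
subprocess.run([editor, file_path], check=True)

# If a shell string is unavoidable, quote every interpolated piece
subprocess.run(f"{shlex.quote(editor)} {shlex.quote(file_path)}", shell=True, check=True)
```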

### Issue 4
<location> `gpt_engineer/applications/cli/file_selector.py:234` </location>

<issue_to_address>
**security (python.lang.security.audit.dangerous-spawn-process-audit):** Found dynamic content when spawning a process. This is dangerous if external data can reach this function call because it allows a malicious actor to execute commands. Ensure no external data reaches here.

*Source: opengrep*
</issue_to_address>

### Issue 5
<location> `gpt_engineer/applications/feature_cli/feature.py:178` </location>

<issue_to_address>
**security (python.lang.security.audit.dangerous-spawn-process-audit):** Found dynamic content when spawning a process. This is dangerous if external data can reach this function call because it allows a malicious actor to execute commands. Ensure no external data reaches here.

*Source: opengrep*
</issue_to_address>

### Issue 6
<location> `gpt_engineer/applications/feature_cli/file_selection.py:313` </location>

<issue_to_address>
**security (python.lang.security.audit.dangerous-spawn-process-audit):** Found dynamic content when spawning a process. This is dangerous if external data can reach this function call because it allows a malicious actor to execute commands. Ensure no external data reaches here.

*Source: opengrep*
</issue_to_address>

### Issue 7
<location> `gpt_engineer/applications/feature_cli/generation_tools.py:1` </location>

<issue_to_address>
**security (python.lang.security.use-defused-xml):** The Python documentation recommends using `defusedxml` instead of `xml` because the native Python `xml` library is vulnerable to XML External Entity (XXE) attacks. These attacks can leak confidential data and "XML bombs" can cause denial of service.

*Source: opengrep*
</issue_to_address>
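
A drop-in sketch of the recommended swap; defusedxml mirrors the ElementTree API with safe parsing defaults (the XML snippet is a placeholder):

```python
# Instead of: import xml.etree.ElementTree as ET
import defusedxml.ElementTree as ET  # same API, hardened against XXE and entity bombs

root = ET.fromstring("<tasks><task>example</task></tasks>")
print(root.find("task").text)
```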

### Issue 8
<location> `gpt_engineer/applications/feature_cli/task.py:65` </location>

<issue_to_address>
**security (python.lang.security.audit.dangerous-spawn-process-audit):** Found dynamic content when spawning a process. This is dangerous if external data can reach this function call because it allows a malicious actor to execute commands. Ensure no external data reaches here.

*Source: opengrep*
</issue_to_address>


files = FileStore(project_path)
if not no_execution:
if improve_mode:
files_dict_before = FileSelector(project_path).ask_for_files()

issue (bug_risk): Potential undefined variable: is_linting is used before assignment.

is_linting is not defined in this scope, which will cause a NameError. Please initialize it before use.

Comment on lines +50 to +55
for filename, content in files_dict.items():
extension = filename[
filename.rfind(".") :
].lower() # Ensure case insensitivity
if extension in self.linters:
original_content = content

suggestion: Extension extraction may fail for files without an extension.

If a filename lacks a dot, this logic returns the whole filename as the extension, which may not match any linter. Please add handling for files without extensions.

Suggested change
for filename, content in files_dict.items():
extension = filename[
filename.rfind(".") :
].lower() # Ensure case insensitivity
if extension in self.linters:
original_content = content
for filename, content in files_dict.items():
if "." in filename:
extension = filename[filename.rfind("."):].lower() # Ensure case insensitivity
else:
extension = ""
if extension in self.linters:
original_content = content

Comment on lines +54 to +56
@staticmethod
def recursive_resolve(data_dict):
for key, value in data_dict.items():

issue (bug_risk): Use of Integer type in recursive_resolve may cause NameError.

Integer is not defined or imported, which will cause a NameError. Use int or import the appropriate type.

Comment on lines +176 to +185
def test_yaml_to_file_selection_fuzzy():

load_dotenv()

commented_yaml = """# gpt_engineer:
# applications:
# cli:
- __init__.py
- cli_agent.py
# - collect.py

suggestion (testing): Test for fuzzy YAML parsing is skipped and requires AI.

Consider using a mock or fixture for the AI dependency so this test can run in CI, or clearly document the requirements for running it locally.

Suggested implementation:

def test_yaml_to_file_selection_fuzzy(mocker):
    """
    Test fuzzy_parse_file_selection with a mock AI dependency.
    """

    load_dotenv()

    commented_yaml = """# gpt_engineer:
#   applications:
#     cli:
      - __init__.py
      - cli_agent.py
#       - collect.py
      - file_selector.py
      - learning.py
      - main.py"""

    # Mock the AI dependency's behavior
    mock_ai = mocker.Mock()
    # Adjust the return value to match what fuzzy_parse_file_selection expects
    mock_ai.some_method.return_value = FileSelection(
        [
            "gpt_engineer/applications/cli/__init__.py",
            "gpt_engineer/applications/cli/cli_agent.py",
            "gpt_engineer/applications/cli/file_selector.py",
            "gpt_engineer/applications/cli/learning.py",
            "gpt_engineer/applications/cli/main.py",
        ],
        [
            "gpt_engineer/applications/cli/collect.py",
        ],
    )

    # Patch fuzzy_parse_file_selection to use the mock AI
    # If fuzzy_parse_file_selection calls a method on AI, ensure the mock matches
    # If it just passes through, you may need to adjust this accordingly
    file_selction = fuzzy_parse_file_selection(mock_ai, commented_yaml)

    assert file_selction == FileSelection(
        [
            "gpt_engineer/applications/cli/__init__.py",
            "gpt_engineer/applications/cli/cli_agent.py",
            "gpt_engineer/applications/cli/file_selector.py",
            "gpt_engineer/applications/cli/learning.py",
            "gpt_engineer/applications/cli/main.py",
        ],
        [
            "gpt_engineer/applications/cli/collect.py",
        ],
    )
  • This change assumes you are using pytest and pytest-mock (which provides the mocker fixture). If not, you may need to import unittest.mock and patch manually.
  • Adjust mock_ai.some_method.return_value to match the actual method called by fuzzy_parse_file_selection on the AI object. If the function expects a different interface, update the mock accordingly.
  • If fuzzy_parse_file_selection expects a specific method or attribute on the AI object, ensure the mock provides it.

paths_to_tree,
file_selection_to_commented_yaml,
commented_yaml_to_file_selection,
)

suggestion (testing): Test for file_selection_to_commented_yaml uses a hardcoded expected string.

Comparing parsed YAML structures instead of raw strings will make the test less brittle to formatting changes.

Suggested implementation:

def test_file_selection_to_yaml():
    import yaml

    included_files = [
        "docker/Dockerfile",
        "docker/README.md",
        "docker/entrypoint.sh",
    ]

    excluded_files = [
        ".github/ISSUE_TEMPLATE/bug-report.md",
        ".github/ISSUE_TEMPLATE/documentation-clarification.md",
        ".github/ISSUE_TEMPLATE/feature-request.md",
    ]
    # ... (rest of the test setup)

    yaml_str = file_selection_to_commented_yaml(included_files, excluded_files)

    # Parse the YAML output and compare the resulting structure
    parsed_yaml = yaml.safe_load(yaml_str)
    expected_structure = {
        "docker": {
            "Dockerfile": "included",
            "README.md": "included",
            "entrypoint.sh": "included",
        },
        ".github": {
            "ISSUE_TEMPLATE": {
                "bug-report.md": "excluded",
                "documentation-clarification.md": "excluded",
                "feature-request.md": "excluded",
            }
        }
    }

    assert parsed_yaml == expected_structure

included_files = tree_to_paths(yaml.safe_load(commented_content))
try:
all_files = tree_to_paths(yaml.safe_load(uncommented_content_1))
except:

suggestion (code-quality): Use except Exception: rather than bare except: (do-not-use-bare-except)

Suggested change
except:
except Exception:

if stripped_line.startswith("- ") and (last_key != "(./):"):
# add 2 spaces at the begining of line or after any #

new_lines.append(" " + line) # Add extra indentation

suggestion (code-quality): Use f-string instead of string concatenation (use-fstring-for-concatenation)

Suggested change
new_lines.append(" " + line) # Add extra indentation
new_lines.append(f" {line}")


try:
file_selection = commented_yaml_to_file_selection(yaml_content)
except:

suggestion (code-quality): Use except Exception: rather than bare except: (do-not-use-bare-except)

Suggested change
except:
except Exception:

Comment on lines +290 to +293
lines.append(prefix + "└── " + key)
extension = " "
else:
lines.append(prefix + "├── " + key)

issue (code-quality): Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)

Comment on lines +52 to +55
# Create an instance of the response class
response = TaskResponse(planning_thoughts, tasks, closing_remarks)

return response

suggestion (code-quality): Inline variable that is immediately returned (inline-immediately-returned-variable)

Suggested change
# Create an instance of the response class
response = TaskResponse(planning_thoughts, tasks, closing_remarks)
return response
return TaskResponse(planning_thoughts, tasks, closing_remarks)
