feat: add codex claude sdk #5

chenmancm169 · 2025-12-22T12:53:41Z

No description provided.

Copilot

Pull request overview

This PR adds two new agent implementations for code generation: CodexAgent and ClaudeCodeAgent. Both agents extend the BaseAgent class to provide AI-powered code generation capabilities through external CLI tools.

Key Changes:

Added CodexAgent that integrates with a Codex CLI tool for code generation using GPT models
Added ClaudeCodeAgent that integrates with Claude Agent SDK for code generation using Claude models
Both agents support configurable permissions, working directories, tool restrictions, and prompt customization

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 23 comments.

File	Description
autoenv/codex_agent.py	Implements CodexAgent with Codex CLI integration, including command building, async execution, JSONL parsing, and process management
autoenv/claude_code_agent.py	Implements ClaudeCodeAgent using Claude Agent SDK with graceful fallbacks, async streaming, and error handling for various SDK exceptions

Comments suppressed due to low confidence (7)

autoenv/codex_agent.py:332

Overly complex 'del' method.

    def __del__(self):

autoenv/claude_code_agent.py:4

Import of 'os' is not used.

import os

autoenv/claude_code_agent.py:11

Import of 'AsyncLLM' is not used.

from base.engine.async_llm import AsyncLLM

autoenv/claude_code_agent.py:15

Import of 'ClaudeSDKClient' is not used.

    from claude_agent_sdk import query, ClaudeAgentOptions, ClaudeSDKClient

autoenv/claude_code_agent.py:16

Import of 'AssistantMessage' is not used.
Import of 'UserMessage' is not used.
Import of 'ResultMessage' is not used.

    from claude_agent_sdk import AssistantMessage, UserMessage, ResultMessage

autoenv/claude_code_agent.py:17

Import of 'TextBlock' is not used.
Import of 'ToolUseBlock' is not used.
Import of 'ToolResultBlock' is not used.

    from claude_agent_sdk import TextBlock, ToolUseBlock, ToolResultBlock

autoenv/codex_agent.py:13

Import of 'AsyncLLM' is not used.

from base.engine.async_llm import AsyncLLM

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

autoenv/claude_code_agent.py

autoenv/codex_agent.py

autoenv/claude_code_agent.py

autoenv/codex_agent.py

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 15 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

autoenv/codex_agent.py

autoenv/claude_code_agent.py

autoenv/codex_agent.py

autoenv/claude_code_agent.py

autoenv/codex_agent.py

autoenv/claude_code_agent.py

autoenv/codex_agent.py

… error handling

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 36 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-29T14:35:51Z

autoenv/codex_agent.py

+    system_prompt_override: Optional[str] = Field(
+        default=None,
+        description="Override system prompt (only for non-interactive mode)"
+    )
+    append_system_prompt: Optional[str] = Field(
+        default=None,
+        description="Append to system prompt (only for non-interactive mode)"
+    )


The system_prompt_override and append_system_prompt fields are defined but never used in CodexAgent's implementation. These fields are listed in modifiable_attrs and mentioned in comments but there's no code that actually passes them to the Codex CLI. Either implement this functionality or remove these unused fields.

Suggested change

system_prompt_override: Optional[str] = Field(

default=None,

description="Override system prompt (only for non-interactive mode)"

)

append_system_prompt: Optional[str] = Field(

default=None,

description="Append to system prompt (only for non-interactive mode)"

)

Copilot · 2025-12-29T14:35:51Z

autoenv/codex_agent.py

+        if not output:
+            return {"result": "", "session_id": None}


The check for stdout being empty (line 195-196) returns a dictionary with session_id=None, but in other cases session_id is extracted from the JSONL output. This inconsistency could cause issues if calling code expects session_id to always be a string or None. Consider using a consistent approach for handling missing values.

Copilot · 2025-12-29T14:35:51Z

autoenv/codex_agent.py

+        except (TimeoutError, RuntimeError, ValueError, FileNotFoundError, NotADirectoryError) as e:
+            # Known CLI execution errors - return as error strings
+            return f"Error during execution: {str(e)}"
+        except Exception:
+            # Re-raise unexpected errors (programming errors) for proper debugging
+            raise


The error handling pattern catches specific exceptions (TimeoutError, RuntimeError, ValueError, FileNotFoundError, NotADirectoryError) and returns them as error strings, but then re-raises unexpected exceptions. However, FileNotFoundError and NotADirectoryError from _validate_cwd are expected validation errors that should be caught during initialization, not during execution. Consider whether these should be caught here or if they represent programming errors that should propagate.

Copilot · 2025-12-29T14:35:52Z

autoenv/claude_code_agent.py

+        # Safely modify attributes with validation
+        original_values = {}
+        for key, value in kwargs.items():
+            if key not in modifiable_attrs:
+                return f"Error: Attribute '{key}' cannot be modified via kwargs. Allowed: {sorted(modifiable_attrs)}"
+
+            if not hasattr(self, key):
+                return f"Error: Unknown attribute '{key}'"
+
+            try:
+                original_values[key] = getattr(self, key)
+                setattr(self, key, value)
+
+                # Validate critical attributes after modification
+                if key == 'cwd':
+                    if isinstance(self.cwd, str):
+                        self.cwd = Path(self.cwd)
+                    self._validate_cwd()
+                elif key == 'permission_mode':
+                    valid_modes = ["default", "acceptEdits", "bypassPermissions", "plan"]
+                    if value not in valid_modes:
+                        raise ValueError(f"Invalid permission_mode: {value}. Must be one of: {valid_modes}")
+
+            except Exception as e:
+                # If validation fails, restore any attributes set so far
+                for restore_key, restore_value in original_values.items():
+                    try:
+                        setattr(self, restore_key, restore_value)
+                    except Exception:
+                        pass  # Best effort restoration
+                return f"Error: Failed to set attribute '{key}': {str(e)}"
+
+        try:
+            options = self._create_options()
+            return await self._process_query_stream(request, options)
+        except Exception as e:
+            return self._handle_sdk_error(e)
+        finally:
+            # Restore original values (best effort)
+            for key, value in original_values.items():
+                try:
+                    setattr(self, key, value)
+                except Exception:
+                    pass  # Ignore errors during restoration


Similar to CodexAgent, the run method modifies instance attributes without synchronization. If multiple coroutines call run() concurrently on the same ClaudeCodeAgent instance, they could interfere with each other's attribute modifications and restoration. Document that the agent is not safe for concurrent use or implement proper locking.

Copilot · 2025-12-29T14:35:52Z

autoenv/codex_agent.py

+                return f"Error: Attribute '{key}' cannot be modified via kwargs. Allowed: {sorted(modifiable_attrs)}"
+
+            if not hasattr(self, key):
+                return f"Error: Unknown attribute '{key}'"


The error message for unknown attributes (line 327) is less helpful than it could be. It says "Unknown attribute" but the attribute exists on the class - it's just not in the modifiable_attrs whitelist. This will confuse users. The message should clarify that the attribute exists but cannot be modified via kwargs, similar to the message on line 324.

Suggested change

return f"Error: Unknown attribute '{key}'"

return (

f"Error: Attribute '{key}' does not exist on CodexAgent and cannot be "

f"modified via kwargs. Allowed: {sorted(modifiable_attrs)}"

)

Copilot · 2025-12-29T14:35:58Z

autoenv/codex_agent.py

+        # Safely modify attributes with validation
+        original_values = {}
+        for key, value in kwargs.items():
+            if key not in modifiable_attrs:
+                return f"Error: Attribute '{key}' cannot be modified via kwargs. Allowed: {sorted(modifiable_attrs)}"
+
+            if not hasattr(self, key):
+                return f"Error: Unknown attribute '{key}'"
+
+            try:
+                original_values[key] = getattr(self, key)
+                setattr(self, key, value)
+
+                # Validate critical attributes after modification
+                if key == 'cwd':
+                    # Convert to Path if string provided
+                    if isinstance(self.cwd, str):
+                        self.cwd = Path(self.cwd)
+                    self._validate_cwd()
+                elif key == 'permission_mode':
+                    valid_modes = ["default", "acceptEdits", "bypassPermissions", "plan"]
+                    if value not in valid_modes:
+                        raise ValueError(f"Invalid permission_mode: {value}. Must be one of: {valid_modes}")
+
+            except Exception as e:
+                # If validation fails, restore any attributes set so far
+                for restore_key, restore_value in original_values.items():
+                    try:
+                        setattr(self, restore_key, restore_value)
+                    except Exception:
+                        pass  # Best effort restoration
+                return f"Error: Failed to set attribute '{key}': {str(e)}"
+
+        try:
+            result = await self._run_cli_command(request)
+            return self._process_result(result)
+
+        except (TimeoutError, RuntimeError, ValueError, FileNotFoundError, NotADirectoryError) as e:
+            # Known CLI execution errors - return as error strings
+            return f"Error during execution: {str(e)}"
+        except Exception:
+            # Re-raise unexpected errors (programming errors) for proper debugging
+            raise
+
+        finally:
+            # Restore original values (best effort)
+            for key, value in original_values.items():
+                try:
+                    setattr(self, key, value)
+                except Exception:
+                    pass  # Ignore errors during restoration


The run method modifies instance attributes (cwd, permission_mode, etc.) via setattr without any locking mechanism. If multiple coroutines call run() concurrently on the same agent instance, they could interfere with each other's attribute modifications and restoration, leading to unpredictable behavior. Either document that the agent is not thread-safe and should not be used concurrently, or implement proper synchronization.

Copilot · 2025-12-29T14:35:58Z

autoenv/codex_agent.py

+    )
+    permission_mode: str = Field(
+        default="acceptEdits",
+        description="Permission mode: default|acceptEdits|bypassPermissions|plan"


The permission_mode field describes four valid modes (default|acceptEdits|bypassPermissions|plan) but doesn't explain what each mode does. Users need to understand the security and functional implications of each mode, especially the dangerous "bypassPermissions" mode. Add detailed documentation for each mode.

Suggested change

description="Permission mode: default|acceptEdits|bypassPermissions|plan"

description=(

"Controls how the agent requests and applies potentially dangerous actions or edits. "

"Valid values:\n"

"- 'default': Conservative behavior. The agent requests confirmation before executing "

" actions that may modify files, run commands, or change external state.\n"

"- 'acceptEdits': The agent may automatically apply code or file edits it suggests, "

" but still requires confirmation for high‑risk operations such as running arbitrary "

" shell commands or using powerful tools.\n"

"- 'bypassPermissions': Disables interactive permission checks and safety gating. The "

" agent is allowed to execute actions and apply edits directly, including running "

" commands and using tools without asking for approval. This mode is dangerous and "

" should only be used in tightly controlled, non‑production environments where you "

" fully trust the agent and the code it operates on.\n"

"- 'plan': The agent only analyzes and generates a plan of actions/edits without "

" actually executing commands or applying changes. Use this mode to review what the "

" agent would do before enabling a more permissive mode."

)

Copilot · 2025-12-29T14:35:59Z

autoenv/claude_code_agent.py

+    )
+    permission_mode: str = Field(
+        default="acceptEdits",
+        description="Permission mode: default|acceptEdits|bypassPermissions|plan"


Same issue as CodexAgent - the permission_mode field lists four modes but doesn't explain what each mode does. This is especially important given the security implications of "bypassPermissions". Add detailed documentation for each permission mode.

Suggested change

description="Permission mode: default|acceptEdits|bypassPermissions|plan"

description=(

"Controls how the agent handles permission prompts for edits, tool use, and commands. "

"Supported values:\n"

"- 'default': Use the Claude Code CLI / SDK's default interactive behavior for requesting\n"

" permission before making changes or running tools.\n"

"- 'acceptEdits': Automatically accept and apply code edits suggested by the agent while\n"

" still requiring confirmation for other sensitive actions (e.g., shell commands).\n"

"- 'bypassPermissions': Run tools, apply edits, and execute commands without asking for\n"

" interactive confirmation. This effectively disables permission prompts and should only\n"

" be used in fully trusted, sandboxed environments due to its security implications.\n"

"- 'plan': Focus on generating and updating a plan of actions without automatically\n"

" executing potentially destructive steps; execution typically requires separate approval."

)

Copilot · 2025-12-29T14:35:59Z

autoenv/claude_code_agent.py

+        """Execute the agent with given parameters."""
+        request = kwargs.pop('request', None) or kwargs.pop('task', None) or kwargs.pop('prompt', None)


Same issue as CodexAgent - the call method accepts three different parameter names (request, task, prompt) which can lead to ambiguity. If multiple are provided, the precedence is undocumented. Consider standardizing on a single parameter name or clearly documenting the precedence order.

Suggested change

"""Execute the agent with given parameters."""

request = kwargs.pop('request', None) or kwargs.pop('task', None) or kwargs.pop('prompt', None)

"""Execute the agent with given parameters.

Accepts a single text argument under one of the aliases: 'request', 'task', or 'prompt'.

If more than one of these is provided, a ValueError is raised to avoid ambiguity.

"""

request = kwargs.pop("request", None)

task = kwargs.pop("task", None)

prompt = kwargs.pop("prompt", None)

# Ensure we don't silently ignore conflicting inputs

provided_count = sum(v is not None for v in (request, task, prompt))

if provided_count > 1:

raise ValueError(

"ClaudeCodeAgent.__call__ accepts only one of 'request', 'task', or 'prompt'. "

f"Received: request={request!r}, task={task!r}, prompt={prompt!r}"

)

# Preserve existing precedence: request > task > prompt

if request is None:

request = task if task is not None else prompt

Copilot · 2025-12-29T14:35:59Z

autoenv/codex_agent.py

+            except Exception as e:
+                # If validation fails, restore any attributes set so far
+                for restore_key, restore_value in original_values.items():
+                    try:
+                        setattr(self, restore_key, restore_value)
+                    except Exception:
+                        pass  # Best effort restoration
+                return f"Error: Failed to set attribute '{key}': {str(e)}"


The error handling in the run method's kwargs modification loop (lines 344-351) restores attributes in a try-except that silently swallows exceptions with pass. If restoration fails for some attributes, the agent could be left in an inconsistent state. The comment says "Best effort restoration" but it would be better to log these failures or at least track which restorations failed.

didiforgithub · 2025-12-30T03:59:13Z

Add use case and config (how to add codex config, and how to generate an environment via codex instead of mini-swe)
Delete Claude Code if you don't verify it.

…laude) and enhance error handling

…onfiguration options, and documentation updates

chenman016 added 2 commits December 22, 2025 20:19

feat: add CodeX and Claude Code SDK

6858fb0

chore: remove useless comments

a30d304

Copilot AI review requested due to automatic review settings December 22, 2025 12:53

Copilot started reviewing on behalf of chenmancm169 December 22, 2025 12:54 View session

Copilot AI reviewed Dec 22, 2025

View reviewed changes

chenmancm169 changed the title ~~Feat/add codex claude sdk~~ feat: add codex claude sdk Dec 23, 2025

feat: enhance Claude and Codex agents with error handling and validation

3ad0d0f

chenmancm169 requested a review from Copilot December 29, 2025 11:13

Copilot started reviewing on behalf of chenmancm169 December 29, 2025 11:14 View session

Copilot AI reviewed Dec 29, 2025

View reviewed changes

feat: improve Codex and Claude agent with CLI availability checks and…

5305846

… error handling

chenmancm169 requested a review from Copilot December 29, 2025 14:26

Copilot started reviewing on behalf of chenmancm169 December 29, 2025 14:27 View session

Copilot AI reviewed Dec 29, 2025

View reviewed changes

chenmancm169 added 2 commits December 30, 2025 20:30

feat: add support for multiple code agent backends (miniswe, codex, c…

8ecfa54

…laude) and enhance error handling

feat: enhance Claude and Codex agents with improved error handling, c…

09d985c

…onfiguration options, and documentation updates

-                return f"Error: Unknown attribute '{key}'"
+                return (
+                    f"Error: Attribute '{key}' does not exist on CodexAgent and cannot be "
+                    f"modified via kwargs. Allowed: {sorted(modifiable_attrs)}"
+                )

-        description="Permission mode: default|acceptEdits|bypassPermissions|plan"
+        description=(
+            "Controls how the agent requests and applies potentially dangerous actions or edits. "
+            "Valid values:\n"
+            "- 'default': Conservative behavior. The agent requests confirmation before executing "
+            "  actions that may modify files, run commands, or change external state.\n"
+            "- 'acceptEdits': The agent may automatically apply code or file edits it suggests, "
+            "  but still requires confirmation for high‑risk operations such as running arbitrary "
+            "  shell commands or using powerful tools.\n"
+            "- 'bypassPermissions': Disables interactive permission checks and safety gating. The "
+            "  agent is allowed to execute actions and apply edits directly, including running "
+            "  commands and using tools without asking for approval. This mode is dangerous and "
+            "  should only be used in tightly controlled, non‑production environments where you "
+            "  fully trust the agent and the code it operates on.\n"
+            "- 'plan': The agent only analyzes and generates a plan of actions/edits without "
+            "  actually executing commands or applying changes. Use this mode to review what the "
+            "  agent would do before enabling a more permissive mode."
+        )

-        description="Permission mode: default|acceptEdits|bypassPermissions|plan"
+        description=(
+            "Controls how the agent handles permission prompts for edits, tool use, and commands. "
+            "Supported values:\n"
+            "- 'default': Use the Claude Code CLI / SDK's default interactive behavior for requesting\n"
+            "  permission before making changes or running tools.\n"
+            "- 'acceptEdits': Automatically accept and apply code edits suggested by the agent while\n"
+            "  still requiring confirmation for other sensitive actions (e.g., shell commands).\n"
+            "- 'bypassPermissions': Run tools, apply edits, and execute commands without asking for\n"
+            "  interactive confirmation. This effectively disables permission prompts and should only\n"
+            "  be used in fully trusted, sandboxed environments due to its security implications.\n"
+            "- 'plan': Focus on generating and updating a plan of actions without automatically\n"
+            "  executing potentially destructive steps; execution typically requires separate approval."
+        )

		"""Execute the agent with given parameters."""
		request = kwargs.pop('request', None) or kwargs.pop('task', None) or kwargs.pop('prompt', None)

-        """Execute the agent with given parameters."""
-        request = kwargs.pop('request', None) or kwargs.pop('task', None) or kwargs.pop('prompt', None)
+        """Execute the agent with given parameters.
+        Accepts a single text argument under one of the aliases: 'request', 'task', or 'prompt'.
+        If more than one of these is provided, a ValueError is raised to avoid ambiguity.
+        """
+        request = kwargs.pop("request", None)
+        task = kwargs.pop("task", None)
+        prompt = kwargs.pop("prompt", None)
+        # Ensure we don't silently ignore conflicting inputs
+        provided_count = sum(v is not None for v in (request, task, prompt))
+        if provided_count > 1:
+            raise ValueError(
+                "ClaudeCodeAgent.__call__ accepts only one of 'request', 'task', or 'prompt'. "
+                f"Received: request={request!r}, task={task!r}, prompt={prompt!r}"
+            )
+        # Preserve existing precedence: request > task > prompt
+        if request is None:
+            request = task if task is not None else prompt

feat: add codex claude sdk #5

Are you sure you want to change the base?

feat: add codex claude sdk #5

Uh oh!

Conversation

chenmancm169 commented Dec 22, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

didiforgithub commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants