Add Monty REPL environment#94

Open
CyrusNuevoDia wants to merge 7 commits into alexzhang13:main from CyrusNuevoDia:codex/monty-backend

CyrusNuevoDia commented Feb 9, 2026

Why Monty?

| Tech | Language completeness | Security | Start latency | Cost | Setup complexity | File mounting | Snapshotting |
|---|---|---|---|---|---|---|---|
| Monty | partial | strict | 0.06 ms | free | easy | easy | easy |
| Docker | full | good | 195 ms | free | intermediate | easy | intermediate |
| Pyodide | full | poor | 2800 ms | free | intermediate | easy | hard |
| starlark-rust | very limited | good | 1.7 ms | free | easy | not available? | impossible? |
| sandboxing service | full | strict | 1033 ms | not free | intermediate | hard | intermediate |
| YOLO Python | full | non-existent | 0.1 ms / 30 ms | free | easy | easy / scary | hard |

Summary

  • Add Monty-backed REPL environment (non-isolated) with stdout/stderr capture
  • Persist variables across code blocks with AST-based name capture
  • Enable persistent=True for Monty by implementing SupportsPersistence
  • Improve FINAL_VAR/SHOW_VARS semantics within the same block
  • Register monty environment, update docs and optional deps
  • Add Monty tests and import guards

Testing

  • uv run pytest tests/test_monty_repl.py
  • uv run pytest tests/test_multi_turn_integration.py -k persistent
  • uv run pytest tests/test_imports.py

Haiku

Tiny sandbox hums
Code whispers in quiet loops
Monty guards the sparks

Copilot AI review requested due to automatic review settings February 9, 2026 02:54

Copilot AI left a comment

Pull request overview

Adds a new optional Monty-backed REPL environment to the rlm.environments routing system, along with dependency wiring and basic tests/docs so users can select environment="monty".

Changes:

  • Introduce MontyREPL (non-isolated) with stdout capture and AST-based variable persistence across code blocks.
  • Register the new environment type (monty) in environment routing/types and add an optional dependency extra.
  • Add import-guarded tests for Monty availability and basic REPL behavior; update README installation/docs.
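
For readers unfamiliar with import guards for optional dependencies, the pattern can be sketched with the standard library alone. The function names below are hypothetical, and the import name `pydantic_monty` is an assumption derived from the locked `pydantic-monty` distribution; the PR's actual guard may look different.

```python
import importlib.util


def monty_available(module_name: str = "pydantic_monty") -> bool:
    """Return True if the optional module can be found without importing it."""
    # find_spec returns None for a missing top-level module.
    return importlib.util.find_spec(module_name) is not None


def require_monty(module_name: str = "pydantic_monty") -> None:
    """Raise a descriptive ImportError when the optional dependency is absent."""
    if not monty_available(module_name):
        raise ImportError(
            f"{module_name!r} is not installed; install the optional 'monty' extra"
        )


print("Monty available:", monty_available())
```

A test module can call a check like `monty_available()` at import time and skip itself when it returns `False`, which keeps the rest of the suite usable without the extra installed.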

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| rlm/environments/monty_repl.py | New Monty-based REPL implementation with state persistence and LM query helpers |
| rlm/environments/__init__.py | Registers "monty" in get_environment() routing |
| rlm/core/types.py | Extends EnvironmentType Literal to include "monty" |
| tests/test_monty_repl.py | Adds Monty REPL smoke tests (import-skipped if dependency missing) |
| tests/test_imports.py | Adds Monty import checks and optional-module circular import coverage |
| pyproject.toml | Adds monty optional extra dependency |
| uv.lock | Locks pydantic-monty and adds monty extra metadata |
| README.md | Documents MontyREPL and installation via optional extra |
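
The two registration changes in the file list (a `Literal` type plus a routing function) follow a common pattern that can be sketched as below. Everything here is illustrative: the `"local"` entry, the stub classes, and `get_environment_sketch` are hypothetical stand-ins, and the real `EnvironmentType` and `get_environment()` in the repository may be structured differently.

```python
from typing import Literal, get_args

# Hypothetical subset of environment names; only "monty" comes from the PR.
EnvironmentType = Literal["local", "monty"]


class LocalREPLStub:
    """Stand-in for an existing environment class."""


class MontyREPLStub:
    """Stand-in for the new Monty-backed environment class."""


# Map each allowed name to the class that implements it.
_REGISTRY = {"local": LocalREPLStub, "monty": MontyREPLStub}


def get_environment_sketch(environment: str, **kwargs):
    """Look up and instantiate the environment registered under a name."""
    if environment not in get_args(EnvironmentType):
        allowed = ", ".join(get_args(EnvironmentType))
        raise ValueError(f"Unknown environment {environment!r}; expected one of: {allowed}")
    return _REGISTRY[environment](**kwargs)


env = get_environment_sketch("monty")
print(type(env).__name__)
```

Keeping the `Literal` and the registry in sync is what lets a type checker and the runtime agree on which `environment=` values are valid.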

CyrusNuevoDia and others added 2 commits February 8, 2026 20:01
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 9, 2026 03:03

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.


Copilot AI review requested due to automatic review settings February 9, 2026 03:16
@CyrusNuevoDia
Author

Addressed review comments in latest push (README grammar, stderr capture, in-block FINAL_VAR/SHOW_VARS, AssignedNameCollector walrus/match, tests, persistence guard/docstring, monty persistent completion test). Happy to resolve threads if needed.


Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.


CyrusNuevoDia and others added 2 commits February 9, 2026 14:37
Fix cleanup() to clear stderr_parts and reset counters, guard state
restoration to avoid silently setting variables to None, remove dead
final_var()/show_vars() instance methods, and add execution-level
stderr capture test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	rlm/core/types.py
#	rlm/environments/__init__.py
Copilot AI review requested due to automatic review settings February 9, 2026 21:52

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.


@rawwerks
Contributor

looks awesome, can't wait!

lambdaofgod commented Feb 13, 2026

That's great work, but I have some bad news... I looked into this a couple of days ago, and unfortunately it's not that easy to just plug in Monty because of its limitations.

1. Functions and closures don't survive across turns

This affects any user code that defines a function or closure in one block and calls it in a later block. It essentially means Monty does not yet support a real REPL, at least not without an awkward workaround. That makes it pretty hard to use with RLM, which assumes an actual REPL rather than a series of independent code snippets.

Monty's input boundary converts function objects to their string representation. A function defined in turn N is stored in self.locals as "<function 'helper' at 0x...>" (a string). On turn N+1 it's passed back into Monty as a string input, so calling it raises TypeError: object is not callable.
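
The failure mode described above can be imitated in plain Python without Monty installed. This is only an analogy: `cross_boundary` is a hypothetical helper that mimics a boundary which stringifies callables, and the error message shown is CPython's, not Monty's.

```python
def cross_boundary(value):
    """Mimic an input boundary that turns function objects into their repr."""
    if callable(value):
        return repr(value)
    return value


def helper():
    return 42


# "Turn N": values are stored after passing through the boundary.
stored = {"helper": cross_boundary(helper), "n": cross_boundary(3)}
print(type(stored["helper"]).__name__)  # the function is now a str

# "Turn N+1": the stored value is handed back and called.
try:
    stored["helper"]()
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

Plain data such as the integer `n` survives the round trip unchanged, which is why the problem only shows up for functions and closures.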

The following tests (TestMontyREPLValueRoundTrip) all fail with MontyRuntimeError: TypeError: object is not callable:

```python
# Import path inferred from the PR's file list (rlm/environments/monty_repl.py).
from rlm.environments.monty_repl import MontyREPL


class TestMontyREPLValueRoundTrip:
    """Tests that non-primitive values survive across execution turns."""

    def test_function_survival(self):
        """Define a function in block 1, call it in block 2."""
        repl = MontyREPL()
        repl.execute_code("def helper():\n    return 42")
        assert "helper" in repl.locals
        result = repl.execute_code("print(helper())")
        assert "42" in result.stdout  # FAILS: stdout is ''

    def test_closure_survival(self):
        """Define a closure in block 1, call it in block 2."""
        repl = MontyREPL()
        repl.execute_code(
            "def make_adder(n):\n    return lambda x: x + n\nadd5 = make_adder(5)"
        )
        assert "add5" in repl.locals
        result = repl.execute_code("print(add5(10))")
        assert "15" in result.stdout  # FAILS: stdout is ''

    def test_function_referencing_cross_turn_variable(self):
        """Variable from turn 1, function closing over it in turn 2, called in turn 3."""
        repl = MontyREPL()
        repl.execute_code("data = [1, 2, 3]")
        repl.execute_code("def total():\n    return sum(data)")
        result = repl.execute_code("print(total())")
        assert "6" in result.stdout  # FAILS: stdout is ''
```

Why is this problematic

LLMs naturally define helper functions in early turns and reuse them later. For example: turn 1 defines fetch_page(url) and fetches page 1, turn 2 calls fetch_page for page 2. By then the function is gone, and the call fails with TypeError: object is not callable.

The naive workaround, re-executing all previous code from scratch each turn (source replay), avoids the function problem but introduces a speed problem. Every previous network call is repeated on each new turn: if turn 1 fetched a URL, that fetch runs again in turn 2, and again in turn 3, and so on. The replay time of each turn grows linearly with the number of fetches made so far, so the total over a session grows quadratically. For any code that interacts with external services, this quickly becomes impractical.
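
To make the replay cost concrete, here is a small simulation that counts how many fetches run when every turn re-executes all earlier blocks. It assumes one fetch per turn; `count_fetches_with_replay` and the example URLs are hypothetical, and no network access takes place.

```python
def count_fetches_with_replay(num_turns: int) -> int:
    """Count fetch calls when each turn replays every earlier block."""
    fetch_calls = 0

    def fetch(url: str) -> str:
        # Stand-in for a network call; it only counts invocations.
        nonlocal fetch_calls
        fetch_calls += 1
        return f"contents of {url}"

    history: list[str] = []
    for turn in range(1, num_turns + 1):
        history.append(f"https://example.com/page/{turn}")
        # Source replay: re-run every block so far, including the new one.
        for url in history:
            fetch(url)
    return fetch_calls


for turns in (1, 3, 10):
    print(turns, "turns ->", count_fetches_with_replay(turns), "fetches")
```

With a persistent REPL the same session would perform one fetch per turn. With replay, turn k performs k fetches, so n turns perform 1 + 2 + ... + n = n(n + 1)/2 in total.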

Source replay also introduces side-effect problems. File system operations are the worst case: if turn 1 appends to a file, every later turn replays that append, so the file accumulates duplicate content. Similarly, deleting and recreating directories, moving files, or incrementing counters stored on disk would all produce wrong results on replay.
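
The side-effect problem can be reproduced with a temporary file. In this sketch the helper names are hypothetical, and each "block" is modelled as a Python callable rather than source text; only the first block touches the file.

```python
import os
import tempfile


def append_record(path: str) -> None:
    """A block with a side effect: append one line to a file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write("record\n")


def run_with_replay(blocks) -> None:
    """Run a session where each turn re-executes all earlier blocks."""
    history = []
    for block in blocks:
        history.append(block)
        for earlier in history:
            earlier()


def count_lines(path: str) -> int:
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.txt")
    # Turn 1 appends a record; turns 2 and 3 do not touch the file.
    blocks = [lambda: append_record(path), lambda: None, lambda: None]
    run_with_replay(blocks)
    replayed_lines = count_lines(path)

print("lines written:", replayed_lines)
```

The user's code asked for a single record, but the file ends up with one record per turn, because the append from turn 1 is repeated in every later replay.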
