Coding agent visual + runtime verification: screenshot + console errors + simulator/emulator testing #453

@joelteply

Description

Problem

The coding agent produced a Snake game that has runtime JavaScript errors and renders a blank canvas. Without verification, this becomes training data that teaches the model to produce broken code. Loss going down means NOTHING if the output doesn't work.

Current State

  • Generated code: syntactically correct
  • Runtime errors: `Cannot read properties of undefined (reading 'set')` at line 345, `Cannot read properties of undefined (reading 'length')` at line 147
  • Visual result: blank black canvas (no snake, no food, no overlay)
  • Training capture: `captureTraining=true` saved this as a POSITIVE example; it should be NEGATIVE

What's Needed

1. Browser Console Error Capture

After the coding agent opens a file in the browser:

  • Capture console.error and uncaught exceptions via jtag
  • Inject a small error collector script before loading the page
  • Return errors as structured data in the tool result
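A minimal sketch of the injected collector, assuming it runs before the page's own scripts. The shape of the entries and the `reportUncaught` hook are illustrative (a real injection would hook `window.addEventListener('error', ...)`; the hook here keeps the sketch runnable outside a browser):

```javascript
// Collected entries are returned to the tool as structured data.
const collectedErrors = [];

function installErrorCollector(target) {
  // Wrap console.error so logged errors are captured as well as thrown ones.
  const originalError = target.console.error;
  target.console.error = (...args) => {
    collectedErrors.push({ type: 'console.error', message: args.map(String).join(' ') });
    originalError.apply(target.console, args);
  };
  // Stand-in for window.addEventListener('error', ...) in a real page.
  target.reportUncaught = (err) => {
    collectedErrors.push({ type: 'uncaught', message: String((err && err.message) || err) });
  };
}
```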

2. Screenshot Verification

  • `interface/screenshot` captures the rendered page
  • VisionDescriptionService describes what it sees
  • Compare description against the prompt requirements
  • "Blank canvas" ≠ "Snake game with score display"
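The comparison step can start as simply as a keyword check against the description the vision model returns. `visualMatches` and its inputs are hypothetical names; a crude substring match is already enough to distinguish "blank black canvas" from "snake game with score display":

```javascript
// Does the vision description mention every element the prompt requires?
function visualMatches(description, requiredElements) {
  const text = description.toLowerCase();
  const missing = requiredElements.filter((el) => !text.includes(el.toLowerCase()));
  return { pass: missing.length === 0, missing };
}
```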

3. Automated Grading

  • Runtime errors = automatic failure
  • Visual mismatch = failure (blank when should be game)
  • Passing = no errors + visual matches description
  • Failed examples become negative training data (or are discarded)
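The grading rules above reduce to a small gate; `gradeExample` and its input shape are assumptions, but the logic is exactly the three bullets: any runtime error fails, any visual mismatch fails, and only a clean run passes:

```javascript
// Runtime errors or a visual mismatch fail the example; failed examples
// become NEGATIVE training data (or are discarded), never POSITIVE.
function gradeExample({ runtimeErrors, visualPass }) {
  if (runtimeErrors.length > 0) return { pass: false, reason: 'runtime-errors' };
  if (!visualPass) return { pass: false, reason: 'visual-mismatch' };
  return { pass: true, reason: 'verified' };
}
```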

4. Simulator/Emulator Testing (future)

For iOS/Android apps:

  • iOS Simulator-driven tests (`xcrun simctl`)
  • Android Emulator-driven tests (`adb`)
  • Secure enclave testing for biometric features
  • Screenshot capture from simulator/emulator
  • Same visual verification pipeline
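A sketch of how a runner might assemble the screenshot commands so both platforms feed the same pipeline. The underlying CLIs (`xcrun simctl io <device> screenshot <path>`, `adb exec-out screencap -p`) are real; the helper names are assumptions, and the helpers only build argv arrays for something like `child_process.execFile` to run:

```javascript
// iOS Simulator: writes a PNG directly to outPath.
function simctlScreenshotCmd(udid, outPath) {
  return ['xcrun', 'simctl', 'io', udid, 'screenshot', outPath];
}

// Android Emulator: screencap -p emits PNG bytes on stdout; the runner
// redirects them into the output file.
function adbScreenshotCmd(serial) {
  return ['adb', '-s', serial, 'exec-out', 'screencap', '-p'];
}
```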

5. Fix-and-Retry Loop

When verification fails:

  • Feed errors back to the coding agent
  • "Your code has these errors: [errors]. The screenshot shows: [description]. Fix it."
  • Retry up to N times
  • Only successful, verified code becomes training data
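The loop itself is small; this sketch assumes `generate` (the coding agent) and `verify` (the console + screenshot pipeline) as injected callables, and returns `null` rather than ever emitting unverified code:

```javascript
// Regenerate with error feedback until verification passes or attempts run out.
function fixAndRetry(generate, verify, maxAttempts) {
  let feedback = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = generate(feedback);
    const result = verify(code);
    if (result.pass) return { code, attempts: attempt };
    feedback = `Your code has these errors: ${result.errors.join('; ')}. ` +
               `The screenshot shows: ${result.description}. Fix it.`;
  }
  return null; // never store unverified code as training data
}
```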

The Rule

No blind training. Every code example must be:

  1. Executed (not just compiled)
  2. Visually verified (screenshot + description)
  3. Error-free (no console errors, no crashes)
  4. Functionally correct (output matches prompt)

Training on unverified code is training on garbage.
