Closed
Description
Problem
The coding agent produced a Snake game that has runtime JavaScript errors and renders a blank canvas. Without verification, this becomes training data that teaches the model to produce broken code. Loss going down means NOTHING if the output doesn't work.
Current State
- Code generated: looks correct syntactically
- Runtime errors: `Cannot read properties of undefined (reading 'set')` at line 345, `Cannot read properties of undefined (reading 'length')` at line 147
- Visual result: blank black canvas (no snake, no food, no overlay)
- Training capture: `captureTraining=true` saved this as a POSITIVE example; it should be NEGATIVE
What's Needed
1. Browser Console Error Capture
After the coding agent opens a file in browser:
- Capture `console.error` and uncaught exceptions via jtag
- Inject a small error collector script before loading the page
- Return errors as structured data in the tool result
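A minimal sketch of what the injected collector could look like. The `CapturedError` shape, the `PageLike` interface, and the `installErrorCollector` name are assumptions made for illustration, not existing jtag APIs. It relies only on standard browser behavior: `console.error` can be wrapped, and uncaught exceptions and unhandled promise rejections fire `error` and `unhandledrejection` events on `window`.

```typescript
// Hypothetical shape for one captured error; not the project's actual schema.
interface CapturedError {
  kind: "console.error" | "uncaught" | "unhandledrejection";
  message: string;
  line?: number;
}

// Minimal subset of `window` the collector needs, so it can also run against a fake page.
interface PageLike {
  console: { error: (...args: unknown[]) => void };
  addEventListener: (type: string, listener: (event: any) => void) => void;
}

function installErrorCollector(page: PageLike): CapturedError[] {
  const errors: CapturedError[] = [];
  const originalError = page.console.error;

  // Wrap console.error so each call is recorded and still reaches the real console.
  page.console.error = (...args: unknown[]) => {
    errors.push({ kind: "console.error", message: args.map(String).join(" ") });
    originalError.apply(page.console, args);
  };

  // Uncaught exceptions surface as "error" events carrying message and line number.
  page.addEventListener("error", (event) => {
    errors.push({ kind: "uncaught", message: String(event.message), line: event.lineno });
  });

  // Promise rejections with no handler surface as "unhandledrejection" events.
  page.addEventListener("unhandledrejection", (event) => {
    errors.push({ kind: "unhandledrejection", message: String(event.reason) });
  });

  // The caller reads this array after the page has run and returns it in the tool result.
  return errors;
}
```

The collector has to be installed before any page script runs, otherwise errors thrown during initial load (like the two above) would be missed.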
2. Screenshot Verification
- `interface/screenshot` of the rendered page
- VisionDescriptionService describes what it sees
- Compare description against the prompt requirements
- "Blank canvas" ≠ "Snake game with score display"
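One naive way to do the comparison, as a sketch only: treat the prompt requirements as a list of terms that must appear in the vision description, and treat certain phrases as signs of an empty render. The `checkDescription` function, the `VisualCheck` shape, and the `BLANK_HINTS` list are all hypothetical; a real implementation would more likely ask a model to judge the match, since keyword matching can be fooled by phrasing such as "no blank areas".

```typescript
interface VisualCheck {
  matches: boolean;
  missing: string[]; // required terms not found in the description
  blank: boolean;    // description suggests nothing was rendered
}

// Hypothetical phrases that suggest an empty render; tune for the real describer.
const BLANK_HINTS = ["blank", "empty", "nothing visible", "solid black"];

function checkDescription(description: string, requiredTerms: string[]): VisualCheck {
  const text = description.toLowerCase();
  // Every required term must appear somewhere in the description (case-insensitive).
  const missing = requiredTerms.filter((term) => !text.includes(term.toLowerCase()));
  // Any blank hint fails the check even if the required terms happen to appear.
  const blank = BLANK_HINTS.some((hint) => text.includes(hint));
  return { matches: missing.length === 0 && !blank, missing, blank };
}
```

For the Snake case, `checkDescription("A blank black canvas", ["snake", "score"])` reports both terms missing and `blank: true`, so the check fails.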
3. Automated Grading
- Runtime errors = automatic failure
- Visual mismatch = failure (blank when should be game)
- Passing = no errors + visual matches description
- Failed examples become negative training data (or are discarded)
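The grading rules above are simple enough to sketch directly. The `VerificationResult` and `Grade` types and the `grade` function are illustrative names, not existing code; the logic is just the two failure conditions listed.

```typescript
type Verdict = "positive" | "negative";

// Hypothetical inputs: the outputs of console capture and screenshot verification.
interface VerificationResult {
  runtimeErrors: string[];
  visualMatches: boolean;
}

interface Grade {
  verdict: Verdict;
  reasons: string[]; // empty when the example passes
}

function grade(result: VerificationResult): Grade {
  const reasons: string[] = [];
  // Any runtime error is an automatic failure.
  if (result.runtimeErrors.length > 0) {
    reasons.push(`${result.runtimeErrors.length} runtime error(s)`);
  }
  // A visual mismatch (e.g. blank canvas when a game was requested) is also a failure.
  if (!result.visualMatches) {
    reasons.push("visual mismatch");
  }
  // Passing requires no errors and a visual match.
  return { verdict: reasons.length === 0 ? "positive" : "negative", reasons };
}
```

Applied to the Snake output (two runtime errors, blank canvas), this yields a `"negative"` verdict with both reasons recorded, instead of the positive example that was actually saved.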
4. Simulator/Emulator Testing (future)
For iOS/Android apps:
- iOS Simulator driven tests (xcrun simctl)
- Android Emulator driven tests (adb)
- Secure enclave testing for biometric features
- Screenshot capture from simulator/emulator
- Same visual verification pipeline
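For the screenshot step, both toolchains already ship a capture command: `xcrun simctl io booted screenshot <file>` for the booted iOS Simulator and `adb exec-out screencap -p` for a connected Android emulator (which writes the PNG to stdout). A sketch of a helper that picks the command is below; `screenshotCommand` and the `Command` shape are hypothetical, and actually spawning the process is left to the caller.

```typescript
type Platform = "ios" | "android";

// Hypothetical descriptor: what to run, and whether the PNG arrives on stdout.
interface Command {
  bin: string;
  args: string[];
  outPath: string;
  stdoutIsPng: boolean;
}

function screenshotCommand(platform: Platform, outPath: string): Command {
  if (platform === "ios") {
    // simctl writes the PNG directly to the given file path.
    return {
      bin: "xcrun",
      args: ["simctl", "io", "booted", "screenshot", outPath],
      outPath,
      stdoutIsPng: false,
    };
  }
  // adb streams the PNG on stdout; the caller is expected to write it to outPath.
  return {
    bin: "adb",
    args: ["exec-out", "screencap", "-p"],
    outPath,
    stdoutIsPng: true,
  };
}
```

Either way the result is a PNG on disk, which can go through the same description and comparison steps as the browser screenshot.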
5. Fix-and-Retry Loop
When verification fails:
- Feed errors back to the coding agent
- "Your code has these errors: [errors]. The screenshot shows: [description]. Fix it."
- Retry up to N times
- Only successful, verified code becomes training data
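The loop could be sketched roughly as follows. `generate` and `verify` are hypothetical callbacks standing in for the coding agent and the verification pipeline; `fixAndRetry` and the `Verification` shape are illustrative names, not existing code. The feedback string reuses the wording above.

```typescript
// Hypothetical result of running the verification pipeline on one attempt.
interface Verification {
  errors: string[];
  description: string;
  passed: boolean;
}

// `feedback` is null on the first attempt and a fix request on every retry.
type Generate = (feedback: string | null) => Promise<string>;
type Verify = (code: string) => Promise<Verification>;

interface LoopResult {
  code: string | null; // null means no attempt passed, so nothing is saved for training
  attempts: number;
}

async function fixAndRetry(
  generate: Generate,
  verify: Verify,
  maxAttempts: number,
): Promise<LoopResult> {
  let feedback: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = await generate(feedback);
    const result = await verify(code);
    if (result.passed) {
      return { code, attempts: attempt };
    }
    // Feed the captured errors and the screenshot description back to the agent.
    feedback =
      `Your code has these errors: ${result.errors.join("; ")}. ` +
      `The screenshot shows: ${result.description}. Fix it.`;
  }
  return { code: null, attempts: maxAttempts };
}
```

Returning `null` after the last failed attempt keeps the "only verified code becomes training data" rule in one place: the caller saves an example only when `code` is non-null.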
The Rule
No blind training. Every code example must be:
- Executed (not just compiled)
- Visually verified (screenshot + description)
- Error-free (no console errors, no crashes)
- Functionally correct (output matches prompt)
Training on unverified code is training on garbage.
Related
- Sensory system verification: vision, screenshots, live mode visual awareness #409 (sensory system — visual verification)
- Academy: no full training session proven end-to-end #377 (Academy e2e — needs verification in the loop)
- Support <think> tags: reasoning display in chat + typing→thinking verb #440 (<think> tags, for model reasoning about its own code quality)