fix: resolve screenshot, Ollama image/audio analysis, and build failures #63

Open
dafandikri wants to merge 1 commit into Prat011:master from dafandikri:fix/ollama-screenshots-build
@dafandikri

Summary

Fixes 7 open issues by addressing 5 root causes across the screenshot pipeline, LLM integration, and build system.

Issues Resolved

| Fix | Issues Closed | Root Cause |
|---|---|---|
| Screenshot ENOENT on macOS | #53, #10 | screenshot-desktop strips spaces from filenames ("Application Support" → "ApplicationSupport"). Switched to app.getPath("temp") |
| Image analysis crash (Ollama) | #41, #43 | When USE_OLLAMA=true, this.model (Gemini) stays null. All vision methods called this.model.generateContent() without an Ollama fallback |
| Audio analysis crash | #6 | Same null model issue. Added explicit error since Ollama doesn't support audio natively |
| Build fails | #35 | Build script deleted dist-electron/ then never recompiled electron code. Added tsc -p electron/tsconfig.json to build |
| Ollama improvements | #25 | Added vision support via images param, robust JSON parsing, default to llava model |

Changes

electron/ScreenshotHelper.ts

  • Use app.getPath("temp") instead of app.getPath("userData") to avoid spaces in path
  • Add { recursive: true } to mkdirSync for safety
  • Add file existence validation after capture with descriptive permission error
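
The ScreenshotHelper changes above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `ensureScreenshotDir` and `assertScreenshotExists` are hypothetical names, the `screenshots` subfolder and the error wording are assumed, and `os.tmpdir()` stands in for `app.getPath("temp")` so the snippet runs outside Electron.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: create (if needed) and return a screenshot directory.
// In the Electron main process the base would be app.getPath("temp");
// os.tmpdir() is used as a stand-in here.
function ensureScreenshotDir(baseDir: string = os.tmpdir()): string {
  const dir = path.join(baseDir, "screenshots"); // assumed folder name
  // recursive: true creates missing parents and does not throw
  // if the directory already exists.
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

// Hypothetical post-capture check: fail with a descriptive message
// instead of letting a later read surface a bare ENOENT.
function assertScreenshotExists(filePath: string): void {
  if (!fs.existsSync(filePath)) {
    throw new Error(
      `Screenshot was not created at ${filePath}. ` +
        "On macOS, check that the app has Screen Recording permission."
    );
  }
}
```

Calling `assertScreenshotExists(filePath)` right after the capture call returns keeps the failure close to its cause.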

electron/LLMHelper.ts

  • Add Ollama fallback to all 6 methods that used this.model.generateContent():
    • extractProblemFromImages — uses Ollama vision API with images param
    • generateSolution — uses callOllama()
    • debugSolutionWithImages — uses Ollama vision API
    • analyzeImageFile — uses Ollama vision API
    • analyzeAudioFile / analyzeAudioFromBase64 — clear error (Ollama can't do audio)
  • Extended callOllama() to accept optional images parameter for vision models
  • Added parseJsonSafe() for robust JSON extraction from LLM responses
  • Updated default fallback model from gemma:latest to llava:latest (vision-capable)
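
A rough sketch of the two Ollama pieces listed above, under stated assumptions: `buildOllamaRequest` is a hypothetical helper, the `callOllama` signature and the default host `http://localhost:11434` are assumed, and `parseJsonSafe` here is one plausible way to extract JSON from a chatty model reply rather than the PR's actual implementation.

```typescript
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
  images?: string[]; // base64-encoded image data for vision models
}

// Hypothetical helper: build the /api/generate body, adding `images`
// only when at least one image is supplied.
function buildOllamaRequest(
  prompt: string,
  model: string = "llava:latest",
  images?: string[]
): OllamaGenerateRequest {
  const body: OllamaGenerateRequest = { model, prompt, stream: false };
  if (images && images.length > 0) body.images = images;
  return body;
}

// Assumed signature: POST to Ollama and return the generated text.
async function callOllama(
  prompt: string,
  images?: string[],
  baseUrl: string = "http://localhost:11434",
  model: string = "llava:latest"
): Promise<string> {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOllamaRequest(prompt, model, images)),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = (await res.json()) as { response?: string };
  return data.response ?? "";
}

// Try progressively looser ways of finding JSON in a model reply:
// the whole text, a fenced block, then the outermost {...} span.
function parseJsonSafe(text: string): unknown {
  const fence = "`".repeat(3);
  const fencedRe = new RegExp(fence + "(?:json)?\\s*([\\s\\S]*?)" + fence, "i");
  const candidates: string[] = [text.trim()];
  const fenced = text.match(fencedRe);
  if (fenced) candidates.push(fenced[1].trim());
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start !== -1 && end > start) candidates.push(text.slice(start, end + 1));
  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate);
    } catch {
      // fall through to the next candidate
    }
  }
  throw new Error("No valid JSON found in model response");
}
```

In this sketch, each `images` entry is raw base64 without a `data:` URL prefix, and a reply such as `Sure! {"a": 1}` still parses because the outermost brace span is tried last.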

package.json

  • Fixed build script: "build": "npm run clean && tsc -p electron/tsconfig.json && tsc && vite build"

Test plan

  • tsc -p electron/tsconfig.json — compiles with 0 errors
  • tsc --noEmit — frontend type-check passes
  • vite build — builds successfully, dist-electron/main.js exists
  • Screenshots save to temp directory (verified: files created at /var/folders/.../T/)
  • llava vision model correctly analyzes screenshots via Ollama API
  • screencapture verified working from CLI (4.9MB file) and Node.js (3.3MB file)

Notes for reviewers

  • Vision model required: Users with Ollama need a vision-capable model (llava, llama3.2-vision) for image analysis. llama3.2 (text-only) won't work.
  • Audio still requires Gemini: Ollama doesn't support audio analysis. Users get a clear error message directing them to configure GEMINI_API_KEY.
  • The screenshot-desktop library bug (space stripping in filenames) is upstream — using temp dir is the workaround.
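
A guard of the kind described in the audio note might look like the following. The function name, the boolean parameter, and the error wording are assumptions for illustration, not the PR's code.

```typescript
// Hypothetical guard: fail fast in Ollama mode instead of
// dereferencing a null Gemini model later on.
function assertAudioSupported(useOllama: boolean): void {
  if (useOllama) {
    throw new Error(
      "Audio analysis is not supported in Ollama mode. " +
        "Configure GEMINI_API_KEY to analyze audio."
    );
  }
}
```

Placing such a check at the top of the audio methods turns a null-reference crash into an actionable message.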

🤖 Generated with Claude Code

Fixes multiple open issues:

1. Screenshot ENOENT on macOS (Prat011#53, Prat011#10, Prat011#43)
   - Root cause: screenshot-desktop sanitizes filenames by stripping
     non-alphanumeric chars, removing the space in "Application Support"
   - Fix: Use app.getPath("temp") instead of app.getPath("userData")
   - Added file existence validation after capture with clear error message

2. Image analysis crash in Ollama mode (Prat011#41, Prat011#43)
   - Root cause: When USE_OLLAMA=true, this.model (Gemini) stays null.
     Methods like analyzeImageFile, generateSolution, etc. called
     this.model.generateContent() without checking for Ollama mode
   - Fix: Added Ollama fallback paths to all 6 affected methods using
     the Ollama /api/generate endpoint with images support for vision

3. Audio analysis crash in Ollama mode (Prat011#6)
   - Root cause: Same null model issue as above
   - Fix: Added explicit error message since Ollama doesn't support
     audio analysis natively (requires Gemini API key)

4. Build fails with "dist-electron/main.js does not exist" (Prat011#35)
   - Root cause: Build script runs "npm run clean" (deletes dist-electron/)
     then "tsc" (root tsconfig with noEmit:true), which type-checks but emits nothing
   - Fix: Added "tsc -p electron/tsconfig.json" to build script

5. Improved Ollama integration (Prat011#25)
   - Added robust JSON parsing (parseJsonSafe) to handle malformed
     JSON responses from local models
   - Extended callOllama() to support images parameter for vision models
   - Updated default fallback model to llava (vision-capable)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>