fix: resolve screenshot, Ollama image/audio analysis, and build failures #63
Open
dafandikri wants to merge 1 commit into Prat011:master from
Fixes multiple open issues:

1. Screenshot ENOENT on macOS (Prat011#53, Prat011#10, Prat011#43)
   - Root cause: screenshot-desktop sanitizes filenames by stripping non-alphanumeric chars, removing the space in "Application Support"
   - Fix: Use app.getPath("temp") instead of app.getPath("userData")
   - Added file existence validation after capture with clear error message

2. Image analysis crash in Ollama mode (Prat011#41, Prat011#43)
   - Root cause: When USE_OLLAMA=true, this.model (Gemini) stays null. Methods like analyzeImageFile, generateSolution, etc. called this.model.generateContent() without checking for Ollama mode
   - Fix: Added Ollama fallback paths to all 6 affected methods using the Ollama /api/generate endpoint with images support for vision

3. Audio analysis crash in Ollama mode (Prat011#6)
   - Root cause: Same null model issue as above
   - Fix: Added explicit error message since Ollama doesn't support audio analysis natively (requires Gemini API key)

4. Build fails with "dist-electron/main.js does not exist" (Prat011#35)
   - Root cause: Build script runs "npm run clean" (deletes dist-electron/) then "tsc" (root tsconfig with noEmit:true) which does nothing
   - Fix: Added "tsc -p electron/tsconfig.json" to build script

5. Improved Ollama integration (Prat011#25)
   - Added robust JSON parsing (parseJsonSafe) to handle malformed JSON responses from local models
   - Extended callOllama() to support images parameter for vision models
   - Updated default fallback model to llava (vision-capable)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
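The Ollama changes in item 5 could look roughly like the sketch below. This is not the code from the PR: `buildOllamaRequest` and the `OllamaGenerateRequest` type are hypothetical names, and the body of `parseJsonSafe` is an assumption about what "robust JSON parsing" might mean here (try the raw text, then the outermost `{...}` span). The `model`, `prompt`, `stream`, and `images` fields are the ones Ollama's `/api/generate` endpoint accepts, with `images` holding base64-encoded image data.

```typescript
// Hypothetical request shape for POST /api/generate on a local Ollama server.
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
  images?: string[]; // base64-encoded image data, without a data-URL prefix
}

// Build the JSON body. `images` is attached only when provided, so
// text-only calls keep the same payload as before.
function buildOllamaRequest(
  model: string,
  prompt: string,
  images?: string[]
): OllamaGenerateRequest {
  const body: OllamaGenerateRequest = { model, prompt, stream: false };
  if (images && images.length > 0) {
    body.images = images;
  }
  return body;
}

// Tolerant JSON extraction (assumed behavior): local models often wrap JSON
// in prose, so try the whole text first, then the outermost {...} span.
// Returns null instead of throwing when nothing parses.
function parseJsonSafe(text: string): unknown {
  const candidates: string[] = [text.trim()];
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start !== -1 && end > start) {
    candidates.push(text.slice(start, end + 1));
  }
  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate);
    } catch {
      // Not valid JSON; fall through to the next candidate.
    }
  }
  return null;
}
```

A caller would serialize the result of `buildOllamaRequest("llava", prompt, [base64Png])` as the request body and pass the model's text response through `parseJsonSafe` before using it. Note that a literal `null` response is indistinguishable from a parse failure in this sketch.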
Summary
Fixes 7 open issues by addressing 5 root causes across the screenshot pipeline, LLM integration, and build system.
Issues Resolved
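- Screenshot ENOENT on macOS: `screenshot-desktop` strips spaces from filenames ("Application Support" → "ApplicationSupport"). Switched to `app.getPath("temp")`.
- Image analysis crash in Ollama mode: when `USE_OLLAMA=true`, `this.model` (Gemini) stays `null`. All vision methods called `this.model.generateContent()` without an Ollama fallback.
- Build failure: the build deleted `dist-electron/` and then never recompiled the electron code. Added `tsc -p electron/tsconfig.json` to the build.
- Ollama integration: `callOllama()` accepts an `images` param, robust JSON parsing, default to the `llava` model.
Changes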
electron/ScreenshotHelper.ts
- Use `app.getPath("temp")` instead of `app.getPath("userData")` to avoid spaces in the path
- Added `{ recursive: true }` to `mkdirSync` for safety

electron/LLMHelper.ts
- Added an Ollama fallback to the methods that called `this.model.generateContent()`:
  - `extractProblemFromImages`: uses Ollama vision API with `images` param
  - `generateSolution`: uses `callOllama()`
  - `debugSolutionWithImages`: uses Ollama vision API
  - `analyzeImageFile`: uses Ollama vision API
  - `analyzeAudioFile` / `analyzeAudioFromBase64`: clear error (Ollama can't do audio)
- Extended `callOllama()` to accept an optional `images` parameter for vision models
- Added `parseJsonSafe()` for robust JSON extraction from LLM responses
- Updated the default fallback model: `gemma:latest` to `llava:latest` (vision-capable)

package.json
- `"build": "npm run clean && tsc -p electron/tsconfig.json && tsc && vite build"`
Test plan
- `tsc -p electron/tsconfig.json`: compiles with 0 errors
- `tsc --noEmit`: frontend type-check passes
- `vite build`: builds successfully, `dist-electron/main.js` exists
- Screenshots are written to the temp directory (`/var/folders/.../T/`)
- `llava` vision model correctly analyzes screenshots via Ollama API
- `screencapture` verified working from CLI (4.9MB file) and Node.js (3.3MB file)
Notes for reviewers
- Ollama mode requires a vision-capable model (`llava`, `llama3.2-vision`) for image analysis. `llama3.2` (text-only) won't work.
- Audio analysis requires `GEMINI_API_KEY`.
- The `screenshot-desktop` library bug (space stripping in filenames) is upstream; using the temp dir is the workaround.

🤖 Generated with Claude Code
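For reviewers who want to see the shape of the screenshot workaround, here is a minimal sketch, not the code in `electron/ScreenshotHelper.ts`. The helper names `prepareScreenshotPath` and `assertScreenshotExists` and the `screenshots` subdirectory are hypothetical. In the Electron main process the base directory would come from `app.getPath("temp")`; `os.tmpdir()` stands in for it here so the sketch runs in plain Node.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: build a screenshot path under a temp directory
// instead of userData, whose macOS path contains "Application Support".
function prepareScreenshotPath(baseDir: string = os.tmpdir()): string {
  const dir = path.join(baseDir, "screenshots"); // directory name is an assumption
  fs.mkdirSync(dir, { recursive: true }); // does not throw if the directory exists
  return path.join(dir, `${Date.now()}.png`);
}

// Hypothetical helper: call after the capture step so a missing file fails
// here with a clear message rather than as a later ENOENT.
function assertScreenshotExists(filePath: string): void {
  if (!fs.existsSync(filePath)) {
    throw new Error(`Screenshot was not written to ${filePath}`);
  }
}
```

The capture call itself is omitted on purpose; the check simply runs between capture and any code that reads the file.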