Fix Ollama media handling, crash RCA, and capability-aware model selection#65

Open
manideepsp wants to merge 2 commits into Prat011:master from manideepsp:fix/ollama-media-capabilities

Conversation

@manideepsp

Summary

This PR fixes repeated runtime crashes in Ollama mode during media analysis and adds capability-aware model handling for image/audio workflows.

Bug Details

In Ollama mode, media analysis paths were still using Gemini-only calls:

  • analyzeAudioFromBase64 and analyzeAudioFile called Gemini generateContent
  • analyzeImageFile and image-debug/image-extraction paths could also route to Gemini-only behavior

When Ollama was selected, the Gemini model instance was null by design, which caused:

  • TypeError: Cannot read properties of null (reading 'generateContent')

Root Cause Analysis

  1. Provider mismatch in media paths
  • Ollama mode sets useOllama=true and does not initialize the Gemini model.
  • Several media methods still dereferenced this.model.generateContent.
  2. Missing capability awareness for Ollama models
  • The app fetched model names but had no concept of modality support.
  • Image and audio calls did not verify whether the selected Ollama model could process those modalities.
  3. No guidance or fallbacks
  • Failures surfaced as generic runtime exceptions rather than actionable remediation.

Fix Implemented

1) Provider-safe execution paths

  • Added Gemini guard helper to prevent null dereference for Gemini-only invocations.
  • Updated generation paths to branch by active provider.
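
A minimal sketch of such a guard, assuming a helper along these lines (the name requireGeminiModel and the error text are illustrative, not the exact code in this PR):

```typescript
// Provider-safe guard: refuse to dereference the Gemini model when Ollama is
// active or the model was never initialized, instead of crashing with a
// null-dereference TypeError. Names here are illustrative assumptions.
type Provider = "gemini" | "ollama";

function requireGeminiModel<T>(model: T | null, provider: Provider): T {
  if (provider !== "gemini") {
    // In Ollama mode the Gemini model is null by design; fail loudly and early.
    throw new Error("Gemini-only call path reached while another provider is active.");
  }
  if (model === null) {
    throw new Error("Gemini model is not initialized. Configure Gemini or switch providers.");
  }
  return model;
}
```

Call sites that previously did `this.model.generateContent(...)` would first pass `this.model` through the guard, so a misrouted call surfaces as an actionable error rather than a TypeError.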

2) Ollama image support

  • Added Ollama multimodal image handling via /api/chat with messages[].images.
  • Image extraction/debug/analysis now work in Ollama mode when the selected model supports vision.
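
The request relies on Ollama's /api/chat accepting base64-encoded images in messages[].images. A sketch of the payload construction (the helper and type names are illustrative assumptions):

```typescript
// Shape of an Ollama /api/chat vision request: base64 image data rides in the
// images array of a user message. Helper name is illustrative.
interface OllamaChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
  images?: string[];
}

function buildImageChatPayload(model: string, prompt: string, imagesBase64: string[]) {
  const messages: OllamaChatMessage[] = [
    { role: "user", content: prompt, images: imagesBase64 },
  ];
  // stream: false yields a single JSON response instead of a token stream.
  return { model, stream: false, messages };
}
```

A caller would POST this JSON to `<ollamaUrl>/api/chat` and read the reply from `message.content` in the response body.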

3) Ollama audio support path

  • Added best-effort Ollama audio analysis via /api/chat.
  • Tries compatible payload variants for broader Ollama/model compatibility.
  • Returns actionable install guidance if audio is unsupported by current installation/model.
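
The variant-trying logic can be sketched generically. In the PR the attempts wrap different /api/chat payload shapes and are async; here they are abstract callables so the error-collection behavior is the focus (names and error text are illustrative):

```typescript
// Best-effort execution: try each payload variant in order, collect failure
// messages, and only give up (with guidance) after all variants fail.
function tryVariants<T>(attempts: Array<() => T>, guidance: string): T {
  const errors: string[] = [];
  for (const attempt of attempts) {
    try {
      return attempt();
    } catch (error: unknown) {
      // Narrow before reading .message so non-Error throwables don't crash here.
      errors.push(error instanceof Error ? error.message : String(error));
    }
  }
  throw new Error(`All variants failed (${errors.join("; ")}). ${guidance}`);
}
```

The first variant that succeeds short-circuits the loop, so the common case pays for only one request.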

4) Capability detection and auto-selection

  • Added capability inference from Ollama /api/tags model metadata (name + families/details) for:
    • supportsVision
    • supportsAudio
  • Before media analysis, the helper now:
    • validates current model capability
    • auto-switches to an installed capability-matching model when available
    • emits clear install guidance when no capable model exists
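
The inference step can be sketched as a keyword match over model name and families; the keyword lists below are illustrative assumptions, not the exact heuristics used in the PR:

```typescript
// Heuristic capability inference from /api/tags metadata (model name plus
// reported families). Keyword lists are illustrative assumptions.
interface OllamaModelCapabilities {
  name: string;
  supportsVision: boolean;
  supportsAudio: boolean;
}

function inferCapabilities(name: string, families: string[] = []): OllamaModelCapabilities {
  const haystack = [name, ...families].join(" ").toLowerCase();
  const visionHints = ["vision", "llava", "clip", "mllama", "bakllava"];
  const audioHints = ["audio"];
  return {
    name,
    supportsVision: visionHints.some((hint) => haystack.includes(hint)),
    supportsAudio: audioHints.some((hint) => haystack.includes(hint)),
  };
}
```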

5) IPC and UI exposure

  • Exposed capability metadata through Electron IPC/preload APIs.
  • Model selector now shows capability badges (vision/audio), selected model capability summary, and install hints when missing.

Files Changed

  • electron/LLMHelper.ts
  • electron/ipcHandlers.ts
  • electron/preload.ts
  • src/components/ui/ModelSelector.tsx
  • src/App.tsx
  • src/types/electron.d.ts

Validation

  • Electron typecheck: npx tsc -p electron/tsconfig.json
  • Workspace typecheck: npx tsc --noEmit
  • Result: no TypeScript errors.

Behavioral Impact

  • Eliminates null dereference crashes in Ollama mode for media-triggered flows.
  • Enables image analysis in Ollama mode when a vision-capable model is installed.
  • Adds best-effort audio path in Ollama mode, with explicit guidance when unsupported.

Notes

  • Capability detection is heuristic-based from Ollama model metadata and naming.
  • Audio support depends on Ollama version and model-specific multimodal support.

Example Install Guidance

  • Vision-capable models:
    • ollama pull llama3.2-vision:11b
    • ollama pull llava:7b
  • Audio-capable models (if available in your Ollama build):
    • ollama pull qwen2-audio:7b

Risk Assessment

Low-to-medium:

  • Adds provider checks and fallback logic but keeps existing API surface largely unchanged.
  • Main risk is false positives/negatives from capability inference heuristics, mitigated by clear error messaging and install hints.

Follow-up (Optional)

  • Replace heuristic capability inference with explicit capability probing against model metadata when Ollama exposes richer modality attributes.

Copilot AI review requested due to automatic review settings March 28, 2026 08:03

Copilot AI left a comment


Pull request overview

Fixes Ollama-mode media-analysis crashes caused by Gemini-only calls, and introduces capability-aware Ollama model discovery/selection for vision/audio workflows across the Electron main process and the renderer model selector UI.

Changes:

  • Added Gemini model guard + provider branching so image/audio flows don’t dereference a null Gemini model in Ollama mode.
  • Implemented Ollama /api/chat paths for image (and best-effort audio) analysis, plus heuristic capability inference from /api/tags.
  • Exposed Ollama model capability metadata via IPC/preload and updated the UI to show capability badges and install guidance.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.

Per-file summary:

  • electron/LLMHelper.ts: Adds provider-safe execution, Ollama image/audio chat calls, capability inference, and auto-selection logic.
  • electron/ipcHandlers.ts: Exposes a new IPC handler to fetch Ollama model capabilities.
  • electron/preload.ts: Adds getOllamaModelCapabilities to the renderer-facing Electron API.
  • src/components/ui/ModelSelector.tsx: Fetches capabilities, shows capability tags/badges, and displays install hints in the model dropdown UI.
  • src/App.tsx: Extends the renderer global window.electronAPI typing with the new capabilities method.
  • src/types/electron.d.ts: Updates shared renderer typings for the expanded Electron API surface.
Comments suppressed due to low confidence (1)

src/components/ui/ModelSelector.tsx:52

  • When the current config is already Ollama, loadCurrentConfig() calls loadOllamaModels(), and the useEffect([selectedProvider]) will also call loadOllamaModels() after setSelectedProvider('ollama'), causing duplicate fetches and potential state races. Consider choosing one mechanism (either the effect or the explicit call) and removing the other, or add a guard to avoid the second call when models are already loaded.
  useEffect(() => {
    if (selectedProvider === 'ollama') {
      loadOllamaModels();
    }
  }, [selectedProvider]);

  const loadCurrentConfig = async () => {
    try {
      setIsLoading(true);
      const config = await window.electronAPI.getCurrentLlmConfig();
      setCurrentConfig(config);
      setSelectedProvider(config.provider);
      
      if (config.isOllama) {
        setSelectedOllamaModel(config.model);
        await loadOllamaModels();
      }


Comment on lines 60 to 66
  const loadOllamaModels = async () => {
    try {
-     const models = await window.electronAPI.getAvailableOllamaModels();
+     const capabilities = await window.electronAPI.getOllamaModelCapabilities();
+     const models = capabilities.map((capability) => capability.name);

      setOllamaModelCapabilities(capabilities);
      setAvailableOllamaModels(models);

Copilot AI Mar 28, 2026


loadOllamaModels always queries model capabilities from whatever ollamaUrl is currently stored in the main process (defaulting to http://localhost:11434). Since the user can edit ollamaUrl in this component before clicking “Apply Changes”, the model list/capability badges can be fetched from the wrong host and appear inconsistent with the entered URL. Consider allowing getOllamaModelCapabilities (and/or getAvailableOllamaModels) to accept a URL argument, or add an IPC method to set the Ollama URL used for discovery prior to switching providers.

Comment on lines +145 to +151
private async ensureOllamaCapability(modality: "vision" | "audio"): Promise<void> {
  if (!this.useOllama) return

  const capabilities = await this.getOllamaModelCapabilities()
  if (capabilities.length === 0) {
    throw new Error(`No Ollama models detected. ${this.getOllamaInstallGuidance(modality)}`)
  }

Copilot AI Mar 28, 2026


ensureOllamaCapability() calls getOllamaModelCapabilities(), which fetches /api/tags on every image/audio request. For media-heavy flows this adds repeated network round-trips and can become a noticeable latency bottleneck. Consider caching the capabilities for a short TTL (or until the model list changes) and reusing them within a session, invalidating on switchToOllama() / refresh.

Comment on lines +240 to +243
} catch (error) {
  console.error("[LLMHelper] Error calling Ollama chat with images:", error)
  throw new Error(`Failed Ollama image analysis: ${error.message}. Ensure selected Ollama model supports vision.`)
}

Copilot AI Mar 28, 2026


In the catch block, the code assumes error has a .message property (${error.message}), but non-Error throwables (or some fetch failures) can be strings/objects, which can cause a secondary crash while building the error message. Prefer narrowing (error instanceof Error) and falling back to String(error) (or centralize via a getErrorMessage() helper) before interpolating.
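
A minimal version of the getErrorMessage() helper this comment suggests (sketch only):

```typescript
// Centralized error-message extraction: safe against non-Error throwables
// (strings, objects, numbers), which would otherwise crash the catch block.
function getErrorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}
```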

Comment on lines +304 to +305
} catch (error) {
  errors.push(error.message)

Copilot AI Mar 28, 2026


The catch block pushes error.message into the errors list without narrowing the caught value. If a non-Error is thrown, this will throw again and hide the original failure. Use error instanceof Error ? error.message : String(error) here (and in other error-message interpolations) to avoid secondary exceptions.

Suggested change:

- } catch (error) {
-   errors.push(error.message)
+ } catch (error: unknown) {
+   const message = error instanceof Error ? error.message : String(error)
+   errors.push(message)

- Introduced `bootstrap_vosk_model.py` for downloading and extracting Vosk models.
- Added `stt_stream.py` to handle real-time speech-to-text streaming using Vosk.
- Implemented audio processing utilities in `audio.ts` for handling audio data conversion and preparation.
- Enhanced audio input handling with support for microphone and system audio capture.
- Added error handling and status reporting for audio stream initialization and processing.