1,091 changes: 1,091 additions & 0 deletions .debug/last-llm-filled-prompt.txt

Large diffs are not rendered by default.

86 changes: 86 additions & 0 deletions PR_OLLAMA_MEDIA_FIX.md
@@ -0,0 +1,86 @@
## Summary
This PR fixes repeated runtime crashes in Ollama mode during media analysis and adds capability-aware model handling for image/audio workflows.

## Bug Details
In Ollama mode, media analysis paths were still using Gemini-only calls:
- `analyzeAudioFromBase64` and `analyzeAudioFile` called Gemini `generateContent` directly
- `analyzeImageFile` and the image-debug and image-extraction paths could also fall through to Gemini-only calls

When Ollama was selected, the Gemini model instance was null by design, which caused:
- TypeError: Cannot read properties of null (reading 'generateContent')

## Root Cause Analysis
1. Provider mismatch in media paths
   - Ollama mode sets `useOllama=true` and does not initialize the Gemini model.
   - Several media methods still dereferenced `this.model.generateContent`.

2. Missing capability awareness for Ollama models
   - The app fetched model names but had no concept of modality support.
   - Image and audio calls did not verify whether the selected Ollama model could process those modalities.

3. No guidance/fallbacks
   - Failures surfaced as generic runtime exceptions rather than actionable remediation.

## Fix Implemented
### 1) Provider-safe execution paths
- Added a Gemini guard helper to prevent null dereferences in Gemini-only invocations.
- Updated generation paths to branch by active provider.
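
A minimal sketch of what such a guard could look like. The names `ProviderState`, `requireGemini`, and `activeProvider` are hypothetical and chosen for illustration; the actual helper in `electron/LLMHelper.ts` may be shaped differently.

```typescript
// Hypothetical shapes for illustration only; not the real LLMHelper fields.
interface GeminiModel {
  generateContent(request: unknown): Promise<unknown>;
}

interface ProviderState {
  useOllama: boolean;
  model: GeminiModel | null; // null by design when Ollama is selected
}

type Provider = "gemini" | "ollama";

// Lets callers branch explicitly instead of assuming Gemini.
function activeProvider(state: ProviderState): Provider {
  return state.useOllama ? "ollama" : "gemini";
}

// Returns the Gemini model, or throws a descriptive error instead of letting
// `state.model.generateContent` fail with a TypeError on null.
function requireGemini(state: ProviderState, operation: string): GeminiModel {
  if (state.useOllama || state.model === null) {
    const active = state.useOllama ? "Ollama" : "an uninitialized provider";
    throw new Error(`${operation} needs Gemini, but the app is running with ${active}.`);
  }
  return state.model;
}
```

A media method would call `activeProvider(state)` first and only reach `requireGemini(state, "analyzeAudioFile")` on the Gemini branch, so the guard acts as a last line of defense rather than the primary routing mechanism.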

### 2) Ollama image support
- Added Ollama multimodal image handling via /api/chat with messages[].images.
- Image extraction/debug/analysis now work in Ollama mode when the selected model supports vision.
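
For reference, a hedged sketch of the request shape this relies on. Ollama's `/api/chat` accepts base64 image data in `messages[].images`; the function names and the injected `PostJson` transport below are assumptions made so the example stays self-contained, not the code in this PR.

```typescript
interface OllamaChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
  images?: string[]; // base64-encoded image data, without a data: URL prefix
}

interface OllamaChatRequest {
  model: string;
  messages: OllamaChatMessage[];
  stream: boolean;
}

// Builds a non-streaming chat request carrying one image.
function buildImageChatRequest(model: string, prompt: string, imageBase64: string): OllamaChatRequest {
  // Strip a "data:image/...;base64," prefix if the caller passed a data URL.
  const raw = imageBase64.replace(/^data:[^;]+;base64,/, "");
  return {
    model,
    messages: [{ role: "user", content: prompt, images: [raw] }],
    stream: false,
  };
}

// Hypothetical transport: POSTs JSON and resolves with the parsed response body.
type PostJson = (url: string, body: unknown) => Promise<unknown>;

async function analyzeImageWithOllama(
  post: PostJson,
  baseUrl: string,
  request: OllamaChatRequest,
): Promise<string> {
  const response = (await post(`${baseUrl}/api/chat`, request)) as {
    message?: { content?: string };
  };
  const text = response.message?.content;
  if (typeof text !== "string") {
    throw new Error("Ollama /api/chat returned no message content");
  }
  return text;
}
```

In the app, `post` would wrap whatever HTTP client `LLMHelper` already uses, and `baseUrl` would come from `OLLAMA_URL`.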

### 3) Ollama audio support path
- Added a best-effort Ollama audio analysis path via /api/chat.
- Tries several compatible payload variants for broader Ollama/model compatibility.
- Returns actionable install guidance if audio is unsupported by the current installation or model.
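
The try-in-order behavior could be sketched roughly as follows. This description does not list the payload variants it attempts, so the sketch leaves them to the caller; `tryAudioVariants`, `SendChat`, and `AudioResult` are hypothetical names.

```typescript
type ChatPayload = Record<string, unknown>;

// Hypothetical transport: sends one payload to /api/chat and resolves with
// the reply text, or rejects if the server or model refuses the payload.
type SendChat = (payload: ChatPayload) => Promise<string>;

interface AudioResult {
  ok: boolean;
  text: string; // reply text on success, remediation guidance on failure
}

// Tries each payload variant in order and returns the first success.
// If every variant fails, returns install guidance instead of throwing.
async function tryAudioVariants(
  send: SendChat,
  variants: ChatPayload[],
  installHint: string,
): Promise<AudioResult> {
  const failures: string[] = [];
  for (const payload of variants) {
    try {
      return { ok: true, text: await send(payload) };
    } catch (error) {
      failures.push(error instanceof Error ? error.message : String(error));
    }
  }
  return {
    ok: false,
    text:
      `Audio analysis is not supported by the current Ollama installation or model ` +
      `(${failures.length} payload variant(s) failed). ${installHint}`,
  };
}
```

Returning a result object rather than throwing keeps the unsupported case on the normal control path, so the UI can show the hint without a crash handler.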

### 4) Capability detection and auto-selection
- Added capability inference from Ollama `/api/tags` model metadata (name + families/details) for:
  - `supportsVision`
  - `supportsAudio`
- Before media analysis, the helper now:
  - validates current model capability
  - auto-switches to an installed capability-matching model when available
  - emits clear install guidance when no capable model exists
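
To illustrate the idea, here is a sketch of name/family-based inference and model selection. The keyword lists and the function names `inferCapabilities` and `selectModelFor` are assumptions; the actual heuristics in this PR are not enumerated here and may differ.

```typescript
// Subset of the per-model entry returned by Ollama's /api/tags.
interface OllamaTagModel {
  name: string;
  details?: { family?: string; families?: string[] | null };
}

interface ModelCapabilities {
  supportsVision: boolean;
  supportsAudio: boolean;
}

// Assumed keyword lists for illustration; not taken from the PR.
const VISION_HINTS = ["vision", "llava", "clip", "mllama"];
const AUDIO_HINTS = ["audio", "whisper"];

// Infers capabilities by matching keywords against the name and families.
function inferCapabilities(model: OllamaTagModel): ModelCapabilities {
  const haystack = [
    model.name,
    model.details?.family ?? "",
    ...(model.details?.families ?? []),
  ]
    .join(" ")
    .toLowerCase();
  const matches = (hints: string[]): boolean => hints.some((hint) => haystack.includes(hint));
  return { supportsVision: matches(VISION_HINTS), supportsAudio: matches(AUDIO_HINTS) };
}

type Modality = "vision" | "audio";

// Keeps the current model if it is capable, otherwise picks the first
// installed capable model. Returns null when the caller should emit
// install guidance instead.
function selectModelFor(
  modality: Modality,
  current: string,
  installed: OllamaTagModel[],
): string | null {
  const supports = (m: OllamaTagModel): boolean => {
    const caps = inferCapabilities(m);
    return modality === "vision" ? caps.supportsVision : caps.supportsAudio;
  };
  const active = installed.find((m) => m.name === current);
  if (active && supports(active)) return current;
  const fallback = installed.find(supports);
  return fallback ? fallback.name : null;
}
```

Because matching is substring-based, a model whose name happens to contain a hint keyword will be misclassified, which is the false-positive risk called out under Risk Assessment.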

### 5) IPC and UI exposure
- Exposed capability metadata through Electron IPC/preload APIs.
- Model selector now shows capability badges (vision/audio), selected model capability summary, and install hints when missing.

## Files Changed
- electron/LLMHelper.ts
- electron/ipcHandlers.ts
- electron/preload.ts
- src/components/ui/ModelSelector.tsx
- src/App.tsx
- src/types/electron.d.ts

## Validation
- Electron typecheck: `npx tsc -p electron/tsconfig.json`
- Workspace typecheck: `npx tsc --noEmit`
- Result: no TypeScript errors.

## Behavioral Impact
- Eliminates null dereference crashes in Ollama mode for media-triggered flows.
- Enables image analysis in Ollama mode when a vision-capable model is installed.
- Adds a best-effort audio path in Ollama mode, with explicit guidance when audio is unsupported.

## Notes
- Capability detection is heuristic-based from Ollama model metadata and naming.
- Audio support depends on Ollama version and model-specific multimodal support.

## Example Install Guidance
- Vision-capable models:
  - `ollama pull llama3.2-vision:11b`
  - `ollama pull llava:7b`
- Audio-capable models (if available in your Ollama build):
  - `ollama pull qwen2-audio:7b`

## Risk Assessment
Low-to-medium:
- Adds provider checks and fallback logic but keeps existing API surface largely unchanged.
- Main risk is false positives/negatives from capability inference heuristics, mitigated by clear error messaging and install hints.

## Follow-up (Optional)
- Replace heuristic capability inference with explicit capability probing against model metadata when Ollama exposes richer modality attributes.
21 changes: 19 additions & 2 deletions README.md
@@ -34,18 +34,35 @@ npm install

3. Set up environment variables:
- Create a file named `.env` in the root folder


**Provider Selection (recommended):**
```env
# One of: gemini | ollama | nvidia
LLM_PROVIDER=gemini
```

**For Gemini (Cloud AI):**
```env
LLM_PROVIDER=gemini
GEMINI_API_KEY=your_api_key_here
```

**For Ollama (Local/Private AI):**
```env
LLM_PROVIDER=ollama
USE_OLLAMA=true
OLLAMA_MODEL=llama3.2
OLLAMA_URL=http://localhost:11434
```

**For NVIDIA Build Models (Multimodal):**
```env
LLM_PROVIDER=nvidia
USE_NVIDIA=true
NVIDIA_API_KEY=your_nvidia_api_key_here
NVIDIA_MODEL=mistralai/mistral-small-3.1-24b-instruct-2503
NVIDIA_URL=https://integrate.api.nvidia.com/v1/chat/completions
```

- Save the file

Binary file added Resume/AI_Engineer.pdf
Binary file not shown.