Fix: Add missing dependencies and Ollama model setup instructions #402
base: main
Conversation
- Add markitdown[docx] to requirements.txt to resolve DocxConverter MissingDependencyException
- Add Ollama model pull instructions to setup documentation
- Fixes issue with .docx file processing in backend
Walkthrough
Updated backend Python dependencies (added markitdown[docx]) and extended setup.sh to ensure the required Ollama models are pulled.

Sequence Diagram(s)
sequenceDiagram
participant Setup as setup.sh
participant Ollama as ollama CLI
Note over Setup: After Ollama install step
Setup->>Ollama: ollama list | grep -q "gemma3:4b"
alt not present
Setup->>Ollama: ollama pull gemma3:4b
Ollama-->>Setup: success / failure
else present
Ollama-->>Setup: already present
end
Setup->>Ollama: ollama list | grep -q "nomic-embed-text:137m-v1.5-fp16"
alt not present
Setup->>Ollama: ollama pull nomic-embed-text:137m-v1.5-fp16
Ollama-->>Setup: success / failure
else present
Ollama-->>Setup: already present
end
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 3
🧹 Nitpick comments (1)
apps/backend/requirements.txt (1)
28-28: mammoth is already pulled in by markitdown[docx]; consider dropping the direct pin.

markitdown[docx] declares mammoth as a transitive dependency. Keeping a separate mammoth==1.9.1 line:

- adds maintenance overhead,
- risks version conflicts if markitdown upgrades its declared range, and
- offers no benefit unless you deliberately need a stricter pin.

Unless there is a conscious reason to override the version, delete the redundant line:

-mammoth==1.9.1
Actionable comments posted: 0
🧹 Nitpick comments (2)
setup.sh (2)
131-137: Factor out duplicated model-pull logic into a helper (ensure_ollama_model)

The new block correctly handles the extra Ollama model, but the code now contains two near-identical “check & pull” sequences (for gemma3:4b and nomic-embed-text:137m-v1.5-fp16). Consolidating them will:

• eliminate duplication
• make it trivial to add/remove models in one place
• keep the main script body readable

Example refactor:

+#–– Helper: ensure an Ollama model is present ––#
+ensure_ollama_model() {
+  local model="$1"
+  if ! ollama list | grep -q "^${model}"; then
+    info "Pulling ${model} model…"
+    ollama pull "${model}" || error "Failed to pull ${model}"
+    success "${model} model ready"
+  else
+    info "${model} model already present—skipping"
+  fi
+}
+
-# existing per-model blocks
-if ! ollama list | grep -q 'gemma3:4b'; then
-  ...
-fi
-
-if ! ollama list | grep -q 'nomic-embed-text:137m-v1.5-fp16'; then
-  ...
-fi
+# concise usage
+ensure_ollama_model "gemma3:4b"
+ensure_ollama_model "nomic-embed-text:137m-v1.5-fp16"

Side note: anchoring the grep with ^ avoids accidental matches where the tag appears as part of another string.
38-45: Update usage/help text to mention the second model

The help block still states “pull gemma3:4b model” (singular). Now that the script always ensures both gemma3:4b and nomic-embed-text:137m-v1.5-fp16, tweak the wording to avoid surprising users.
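For example, the help text could list both models explicitly. The wording and the `print_model_help` function name below are illustrative, not taken from the actual script:

```shell
# Hypothetical help-text fragment listing both default models.
print_model_help() {
    cat <<'EOF'
Setup will install Ollama and pull the default models if missing:
  - gemma3:4b (LLM)
  - nomic-embed-text:137m-v1.5-fp16 (embeddings)
EOF
}

print_model_help
```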
Stumbled upon this issue as well; I was about to create a PR, but great to see it here already! Might want to update the README/Setup as well to include the dependency on … Thanks!
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/backend/requirements.txt (3 hunks)
setup.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/backend/requirements.txt
if ! ollama list | grep -q 'gemma3:4b'; then
    info "Pulling gemma3:4b model…"
    ollama pull gemma3:4b || error "Failed to pull gemma3:4b"
    success "gemma3:4b model ready"
else
    info "gemma3:4b model already present—skipping"
fi

if ! ollama list | grep -q 'nomic-embed-text:137m-v1.5-fp16'; then
    info "Pulling nomic-embed-text:137m-v1.5-fp16 model…"
    ollama pull nomic-embed-text:137m-v1.5-fp16 || error "Failed to pull nomic-embed-text:137m-v1.5-fp16"
    success "nomic-embed-text:137m-v1.5-fp16 model ready"
else
    info "nomic-embed-text:137m-v1.5-fp16 model already present—skipping"
fi
🛠️ Refactor suggestion
DRY the model pull logic and avoid hard-coded unconditional pulls
You already have ollama_check_or_pull
(Lines 110-119). These two blocks duplicate its behavior and hard-code models irrespective of provider/config. Prefer reusing the helper to keep behavior consistent and reduce maintenance. Also consider not pulling models unless the provider is Ollama or an explicit opt-in flag is set, since you already pull models based on backend .env later (Lines 186-196).
Proposed change: replace both blocks (Lines 138-154) with a compact loop that uses the helper.
- if ! ollama list | grep -q 'gemma3:4b'; then
- info "Pulling gemma3:4b model…"
- ollama pull gemma3:4b || error "Failed to pull gemma3:4b"
- success "gemma3:4b model ready"
- else
- info "gemma3:4b model already present—skipping"
- fi
-
- if ! ollama list | grep -q 'nomic-embed-text:137m-v1.5-fp16'; then
- info "Pulling nomic-embed-text:137m-v1.5-fp16 model…"
- ollama pull nomic-embed-text:137m-v1.5-fp16 || error "Failed to pull nomic-embed-text:137m-v1.5-fp16"
- success "nomic-embed-text:137m-v1.5-fp16 model ready"
- else
- info "nomic-embed-text:137m-v1.5-fp16 model already present—skipping"
- fi
+# Option A (simple, DRY): always ensure these two models exist
+for model in "gemma3:4b" "nomic-embed-text:137m-v1.5-fp16"; do
+ ollama_check_or_pull "$model"
+done
+
+# Option B (conditional): only ensure when explicitly requested
+# if [[ "${PULL_DEFAULT_OLLAMA_MODELS:-true}" == "true" ]]; then
+# for model in "gemma3:4b" "nomic-embed-text:137m-v1.5-fp16"; do
+# ollama_check_or_pull "$model"
+# done
+# fi
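The gating in Option B can be exercised without a live Ollama install by stubbing out the helper. `PULL_DEFAULT_OLLAMA_MODELS` is the flag name suggested above; the stub that records models instead of pulling them is purely for illustration:

```shell
# Opt-in flag, defaulting to true as suggested in the review.
PULL_DEFAULT_OLLAMA_MODELS="${PULL_DEFAULT_OLLAMA_MODELS:-true}"

pulled=""
ollama_check_or_pull() {
    # Stub for illustration: record the model name instead of pulling it.
    pulled="${pulled}${pulled:+ }$1"
}

if [ "$PULL_DEFAULT_OLLAMA_MODELS" = "true" ]; then
    for model in "gemma3:4b" "nomic-embed-text:137m-v1.5-fp16"; do
        ollama_check_or_pull "$model"
    done
fi

echo "would pull: $pulled"
```

Setting `PULL_DEFAULT_OLLAMA_MODELS=false` before running the script would skip the loop entirely, leaving the env-driven pulls later in setup.sh as the only source of model downloads.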
Additionally, to make the presence check more robust (avoid partial matches), improve the helper to match the first column exactly. This change is outside the modified hunk; sharing suggested update for the helper:
# Suggested improvement outside the changed lines (Lines 110-119)
ollama_check_or_pull() {
local model_name="$1"
if ! ollama list | awk '{print $1}' | grep -qx "$model_name"; then
info "Pulling $model_name model…"
ollama pull "$model_name" || error "Failed to pull $model_name model"
success "$model_name model ready"
else
info "$model_name model already present—skipping"
fi
}
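The exact first-column match can be sanity-checked against a canned listing, with no Ollama daemon required. The listing below is made-up sample output in the shape `ollama list` prints; `model_present` reuses the same awk/grep filter as the improved helper:

```shell
# Fabricated sample in the shape of `ollama list` output (header + rows).
listing='NAME                              ID            SIZE    MODIFIED
gemma3:4b                         a1b2c3d4e5f6  3.3 GB  2 days ago
nomic-embed-text:137m-v1.5-fp16   f6e5d4c3b2a1  274 MB  2 days ago'

model_present() {
    # Skip the header row, keep only the NAME column, require a whole-line match.
    printf '%s\n' "$listing" | awk 'NR > 1 {print $1}' | grep -qx "$1"
}

model_present "gemma3:4b" && echo "exact name matches"
model_present "gemma3" || echo "bare prefix does not match"
```

With a plain `grep -q "gemma3"` the second check would wrongly succeed, which is exactly the partial-match problem `grep -qx` on the first column avoids.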
🤖 Prompt for AI Agents
In setup.sh around lines 138 to 154, the two repeated ollama pull blocks
duplicate existing helper behavior and unconditionally attempt pulls; replace
those blocks with a compact loop that calls ollama_check_or_pull for each model
(e.g., gemma3:4b and nomic-embed-text:137m-v1.5-fp16) and ensure the loop only
runs when Ollama is the configured provider or an explicit opt-in flag is set;
also update the ollama_check_or_pull helper (lines ~110-119) to check the first
column exactly (use ollama list piped to awk '{print $1}' | grep -qx
"$model_name") so presence checks avoid partial matches.
💡 Verification agent
🧩 Analysis chain
Confirm necessity of hard-coding gemma3:4b and document the model requirement
The PR objective was to fix the embedding model error (nomic-embed-text:137m-v1.5-fp16). Adding gemma3:4b here is potentially out of scope unless it’s the default LLM in your .env. If not required, consider removing it to save setup time/bandwidth and rely on the backend’s env-driven pull (Lines 186-196). Also, per the PR comment suggestion, add a note in README/SETUP docs instructing users to pull the embedding model during setup.
I can open a follow-up PR to:
- Update README/SETUP to mention nomic-embed-text:137m-v1.5-fp16 and markitdown[docx].
- Gate the default model pulls behind a flag (e.g., PULL_DEFAULT_OLLAMA_MODELS) or provider checks.
Run this script to verify whether gemma3:4b is actually referenced by defaults:
🏁 Script executed:
#!/bin/bash
# Verify whether gemma3:4b and nomic-embed-text:137m-v1.5-fp16 are referenced in env/configs
set -euo pipefail
echo "Searching for LLM/Embedding provider & model defaults…"
rg -n -A2 -B2 '(LLM_PROVIDER|LLM_MODEL|EMBEDDING_PROVIDER|EMBEDDING_MODEL|gemma3:4b|nomic-embed-text:137m-v1.5-fp16)'
echo
echo "Enumerating candidate env files…"
fd -H '(.env|.env.*|env.sample|env.example)$' -t f | sed 's/^/- /'
Length of output: 12324
Consolidate and conditionally gate static Ollama model pulls
Gemma3:4b is indeed the default LLM_MODEL (apps/backend/app/core/config.py L22), so pulling it by default is valid. However, nomic-embed-text:137m-v1.5-fp16 isn’t referenced in any defaults or sample .env (apps/backend/app/core/config.py L26 uses dengcao/Qwen3-Embedding-0.6B:Q8_0), so its unconditional pull in setup.sh (Lines 137–154) may waste time and bandwidth when the embedding model differs.
Please consolidate and gate these pulls behind your existing env-driven logic:
• In setup.sh (Lines 137–154): remove or replace the two hard-coded “ollama pull” blocks and rely on
ollama_check_or_pull $LLM_MODEL / $EMBEDDING_MODEL in the conditional section at Lines 186–196.
• Optionally introduce a flag (e.g. PULL_DEFAULT_OLLAMA_MODELS) or wrap the pull logic in provider checks so you only pull models actually in use.
• Update SETUP.md (and setup.ps1, SETUP.es.md) to document:
– Default LLM_MODEL (“gemma3:4b”) and EMBEDDING_MODEL (“dengcao/Qwen3-Embedding-0.6B:Q8_0”)
– How to customize EMBEDDING_MODEL in .env and trigger its pull
– Any new PULL_DEFAULT_OLLAMA_MODELS flag and its default behavior.
These changes will avoid unnecessary downloads and keep setup.sh aligned with your .env-driven model configuration.
🤖 Prompt for AI Agents
In setup.sh around lines 138 to 154, the script unconditionally pulls gemma3:4b
and nomic-embed-text:137m-v1.5-fp16 which duplicates or conflicts with the
env-driven model logic later; remove those two hard-coded ollama pull blocks and
instead invoke the existing ollama_check_or_pull $LLM_MODEL and
ollama_check_or_pull $EMBEDDING_MODEL in the conditional section around lines
186–196 so models are only pulled based on .env values; optionally add a
PULL_DEFAULT_OLLAMA_MODELS flag (default true) or wrap pulls in provider checks
to avoid unnecessary downloads; finally update SETUP.md, setup.ps1 and
SETUP.es.md to document the default LLM_MODEL (gemma3:4b), default
EMBEDDING_MODEL (dengcao/Qwen3-Embedding-0.6B:Q8_0), how to customize
EMBEDDING_MODEL in .env and how to control the new PULL_DEFAULT_OLLAMA_MODELS
flag.
Problem
- MissingDependencyException when processing .docx files due to the missing markitdown[docx] dependency
- Embedding model error caused by missing nomic-embed-text:137m-v1.5-fp16
Solution
- Added markitdown[docx] to requirements.txt
- Added nomic-embed-text:137m-v1.5-fp16 model setup to setup.sh
Testing
Summary by CodeRabbit