feat(embed): add Gemini Embedding 2 multimodal provider #365

Open
mbelinky wants to merge 7 commits into tobi:main from mbelinky:feat/gemini-embedding-2

@mbelinky mbelinky commented Mar 11, 2026

Summary

Adds Gemini Embedding 2 (gemini-embedding-2-preview) as a cloud embedding provider for QMD, including multimodal embedding support for text, images, and small PDFs.

What's new

  • New provider: google — use qmd embed --provider google
  • Multimodal embedding support for:
    • text documents
    • images (png, jpg, jpeg)
    • PDFs (small files; currently conservatively limited to <= 6 pages)
  • Configurable output dimensions for Gemini embeddings:
    • 3072 (default)
    • 1536
    • 768
  • Batch API support with retries and Retry-After handling
  • Task type hints:
    • RETRIEVAL_DOCUMENT for indexing
    • RETRIEVAL_QUERY for search/query embedding
  • Status/CLI wiring for provider selection and dimension display
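
For readers unfamiliar with Retry-After handling, here is a minimal sketch of how a retry delay could be derived from that header. The function name `retryDelayMs`, its parameters, and the default base and cap values are hypothetical and not taken from this PR; the actual logic lives in src/google-embed.ts and may differ.

```typescript
// Hypothetical helper, not the PR's code: pick how long to wait before retrying.
// Retry-After may carry either delay-seconds ("2") or an HTTP-date.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  nowMs: number = Date.now(),
  baseMs: number = 500, // assumed base delay
  maxMs: number = 30_000, // assumed upper bound
): number {
  if (retryAfter !== null) {
    const trimmed = retryAfter.trim();
    if (/^\d+$/.test(trimmed)) {
      // delay-seconds form
      return Math.min(Number(trimmed) * 1000, maxMs);
    }
    const dateMs = Date.parse(trimmed);
    if (!Number.isNaN(dateMs)) {
      // HTTP-date form: wait until that instant, never a negative time
      return Math.min(Math.max(dateMs - nowMs, 0), maxMs);
    }
  }
  // No usable header: capped exponential backoff (jitter omitted for brevity).
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

A caller would typically sleep for `retryDelayMs(attempt, response.headers.get("retry-after"))` after a 429 or 5xx response and then resend the batch.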

Important behavior notes

  • Switching providers requires re-embedding existing vectors. Local embeddings (embeddinggemma, 768d) are not compatible with Gemini's default 3072d vectors.
  • To re-embed everything with the Google provider, run:
    qmd embed --force --provider google
  • Default collection behavior remains markdown-only (**/*.md). This PR adds multimodal embedding support, but does not silently change all existing collections to index binaries by default.
  • Multimodal document embeddings now include text context + file content, so stored titles / path-derived context are preserved instead of sending file-only inputs.
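
To illustrate the "text context + file content" point, here is a rough sketch of how a multimodal document request could be assembled. The request shape (`content.parts`, `inlineData`, `taskType`, `outputDimensionality`) is an assumption modeled on the public Gemini REST conventions, and `buildDocumentRequest` and `MIME_BY_EXT` are hypothetical names; the real normalization code in src/google-embed.ts may look different.

```typescript
// Hypothetical sketch: pair a text part (title + path) with the file bytes.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface EmbedRequest {
  model: string;
  content: { parts: Part[] };
  taskType: "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";
  outputDimensionality: number;
}

// File types listed in the PR description.
const MIME_BY_EXT: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  pdf: "application/pdf",
};

function buildDocumentRequest(
  path: string,
  title: string,
  fileBytes: Uint8Array,
  dimensions: number = 3072,
): EmbedRequest {
  const ext = path.split(".").pop()?.toLowerCase() ?? "";
  const mimeType = MIME_BY_EXT[ext];
  if (!mimeType) throw new Error(`unsupported file type: ${path}`);
  return {
    model: "models/gemini-embedding-2-preview",
    content: {
      parts: [
        // Text context first, so the title and path are not lost.
        { text: `${title}\n${path}` },
        // File content as base64 inline data.
        { inlineData: { mimeType, data: Buffer.from(fileBytes).toString("base64") } },
      ],
    },
    taskType: "RETRIEVAL_DOCUMENT",
    outputDimensionality: dimensions,
  };
}
```

Sending only the second part would correspond to the file-only input that this PR moves away from.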

Main implementation areas

  • src/google-embed.ts
    • Gemini Embedding 2 client
    • multimodal input normalization
    • batching + retry logic
  • src/llm.ts
    • provider resolution / provider-specific embed behavior
  • src/store.ts
    • multimodal content detection
    • embedding pipeline integration
    • document-title/context preservation for multimodal inputs
    • PDF page estimation guard
  • src/cli/qmd.ts
    • --provider
    • --dimensions
    • provider-aware status output

Bug fixes included

  1. Schema migration ordering
    • fixes content_type index creation order so fresh/existing DB migration paths don't crash
  2. Dimension/provider mismatch handling
    • avoids using Google dimensions unless the provider is explicitly configured, preventing 3072d query vectors from being mixed against local 768d indexes
  3. Accidental PR contamination removed
    • removes the stray reference patch file that was accidentally included earlier in the branch
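
The migration-ordering fix comes down to adding the column before indexing it. A minimal sketch of that ordering follows, assuming the column lives on the documents table as the tests indicate; the function name, the index name, and the TEXT column type are hypothetical.

```typescript
// Hypothetical sketch: return migration statements in a safe order.
// existingColumns would come from something like PRAGMA table_info(documents).
function contentTypeMigration(existingColumns: string[]): string[] {
  const statements: string[] = [];
  if (!existingColumns.includes("content_type")) {
    // The column must exist before any index can reference it.
    statements.push("ALTER TABLE documents ADD COLUMN content_type TEXT");
  }
  // Safe for both fresh and already-migrated databases.
  statements.push(
    "CREATE INDEX IF NOT EXISTS idx_documents_content_type ON documents(content_type)",
  );
  return statements;
}
```

Running CREATE INDEX first on an older database fails because the indexed column does not exist yet, which matches the crash described above.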

Testing performed

Verified on this branch with focused tests covering the changed behavior:

  • test/google-embed.test.ts
  • test/generate-embeddings.multimodal.test.ts
  • targeted SDK default-pattern regression coverage

Specifically verified:

  • Gemini embedder returns embeddings for:
    • text
    • image + text
    • PDF + text
  • multimodal embedding inputs include both:
    • file content
    • useful text context/title metadata
  • default collection pattern remains **/*.md

Known limitations / follow-ups

  • PDF page counting is still heuristic-based and conservative.
  • On some Linux hosts, local llama/node-llama setup still emits Vulkan fallback noise even when the Google embedding path works correctly. That's environmental/runtime noise rather than a blocker for this embedding provider path.
  • Larger PDF chunking could be improved in a follow-up.
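
For context on why page counting is only a heuristic, one common approach is to count page objects in the raw bytes, sketched below. This is an illustration under assumptions, not the PR's implementation: `estimatePdfPages` and `isPdfSmallEnough` are hypothetical names, and only the 6-page limit comes from the description above.

```typescript
// Hypothetical sketch of a byte-level page estimate.
const MAX_PDF_PAGES = 6; // limit stated in the PR description

function estimatePdfPages(bytes: Uint8Array): number {
  // latin1 maps each byte to one character, so binary streams decode safely.
  const text = Buffer.from(bytes).toString("latin1");
  // Count "/Type /Page" entries but skip the "/Type /Pages" tree nodes.
  const matches = text.match(/\/Type\s*\/Page(?![A-Za-z])/g);
  return matches ? matches.length : 0;
}

function isPdfSmallEnough(bytes: Uint8Array): boolean {
  const pages = estimatePdfPages(bytes);
  // Conservative: a count of 0 means "unknown", so the file is rejected.
  return pages > 0 && pages <= MAX_PDF_PAGES;
}
```

PDFs that store page objects inside compressed object streams yield a count of 0 with this approach, so they are skipped rather than sent with an unknown page count.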

Razor added 7 commits March 10, 2026 23:04
- Add google-embed.ts with gemini-embedding-2-preview API client
- Support text, image, PDF, and interleaved multimodal embeddings
- Add GoogleHybridLLM: Gemini API for embeddings, local llama.cpp for reranking
- Extend collections to support image/PDF file patterns
- Add --provider and --dimensions CLI flags
- Add cross-modal search support in store
- Add google-embed tests

WIP: needs review and integration testing
…ider setting

Reverts the resolvedEmbedProvider fallback that caused dimension mismatch
when GEMINI_API_KEY was set in the environment but tests used local 768d vectors.

CREATE INDEX on content_type was running before ALTER TABLE added the column,
crashing on existing databases without the column.
@mbelinky
Author

Quick cleanup pass landed on this branch.

What changed since the initial draft:

  • removed the accidentally included reference patch file
  • restored the default collection pattern to **/*.md (no silent binary indexing by default)
  • improved multimodal embeddings so image/PDF inputs include both file content and useful text context
  • preserved stored document titles in multimodal embedding context
  • added focused regression coverage for multimodal input construction and default-pattern behavior

Focused verification run on the updated branch covered:

  • test/google-embed.test.ts
  • test/generate-embeddings.multimodal.test.ts
  • targeted SDK default-pattern regression coverage

There is still some node-llama/Vulkan fallback noise on this Linux host during local runtime setup, but the Google embedding path itself is working and the PR body has been updated to reflect the current branch accurately.

@davidhop11

Production-readiness review complete

Rebased onto main (ae3604c — v2.0.1 release, launcher fix, Qwen3 filename fix) and applied the following improvements.

Changes made

  • Rebase: Rebased 7 commits (WIP + 6 polish commits) onto current main with no conflicts
  • Provider-switching warning: When qmd embed is run and existing vectors were embedded with a different provider, a clear warning is printed: existing vectors were embedded with 'X' but the active provider is 'Y' — run 'qmd embed --force' to re-embed
  • Improved help text: --provider now explains auto-detection logic and the QMD_EMBED_PROVIDER env var; --dimensions explains Matryoshka truncation, valid values, and that dimensions must match between embed and search
  • Test schema fix: mcp.test.ts had a hand-rolled initTestDatabase that was missing the new content_type column on documents and provider column on content_vectors — added both so all 56 MCP tests pass

Test results

test/google-embed.test.ts                    6/6 passed
test/generate-embeddings.multimodal.test.ts  2/2 passed
test/mcp.test.ts                             56/56 passed
test/store.test.ts                           198/198 passed
test/llm.test.ts                             40/40 passed

The only failing test file (test/cli.test.ts) fails because tsx is not installed in the worktree environment — this is a pre-existing infrastructure issue unrelated to the Gemini changes.

Branch

Pushed to: davidhop11/qmd:feat/gemini-embedding-2


How to use Gemini Embedding 2

  1. Set your API key:

    export GEMINI_API_KEY=your_key_here
  2. Re-embed your index with the Google provider (required — dimensions differ from local):

    qmd embed --provider google --force
  3. Search normally — the provider is auto-detected from GEMINI_API_KEY when no GPU is available:

    qmd search "your query"
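
The auto-detection mentioned in step 3 could work roughly as sketched below. The precedence shown (the --provider flag, then QMD_EMBED_PROVIDER, then auto-detection) is an assumption, and `resolveProvider` is a hypothetical name; the actual resolution in src/llm.ts may order or validate things differently.

```typescript
// Hypothetical sketch of provider resolution; the precedence is assumed.
type Provider = "google" | "local";

function resolveProvider(opts: {
  flag?: string; // value of --provider, if given
  env: Record<string, string | undefined>;
  hasGpu: boolean;
}): Provider {
  // An explicit choice (flag first, then env var) always wins.
  const explicit = opts.flag ?? opts.env.QMD_EMBED_PROVIDER;
  if (explicit !== undefined) {
    if (explicit === "google" || explicit === "local") return explicit;
    throw new Error(`unknown provider: ${explicit}`);
  }
  // Auto-detect: use Google only when a key is set and no GPU is available.
  return opts.env.GEMINI_API_KEY && !opts.hasGpu ? "google" : "local";
}
```

Whatever the exact rules are, the provider used for search has to match the one used for qmd embed, because the stored vectors and the query vector must share the same dimensions.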

Dimensions

  • Default: 3072 (highest quality)
  • Available: 768, 1536, 3072 (Matryoshka truncation)
  • Must match between embed and search — use qmd embed --force --dimensions 768 to switch
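
As background on Matryoshka truncation, a shorter embedding is the leading slice of the full vector, re-normalized to unit length. The sketch below shows that idea on the client side; `truncateEmbedding` is a hypothetical helper, and this PR may instead rely on the API to return the requested size directly.

```typescript
// Hypothetical sketch: keep the first `dimensions` values, then L2-normalize.
function truncateEmbedding(vector: number[], dimensions: number): number[] {
  if (!Number.isInteger(dimensions) || dimensions <= 0 || dimensions > vector.length) {
    throw new RangeError(`dimensions must be an integer in 1..${vector.length}`);
  }
  const head = vector.slice(0, dimensions);
  // Accumulate the squared norm in a loop to avoid spreading large arrays.
  let sumSquares = 0;
  for (const x of head) sumSquares += x * x;
  const norm = Math.sqrt(sumSquares);
  // Leave an all-zero vector untouched to avoid dividing by zero.
  return norm === 0 ? head : head.map((x) => x / norm);
}
```

A 768d query vector cannot be compared against 3072d stored vectors, since cosine similarity is only defined for vectors of equal length. That is why changing --dimensions requires a forced re-embed.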

Switching back to local

qmd embed --provider local --force


2 participants