feat(embed): add Gemini Embedding 2 multimodal provider by mbelinky · Pull Request #365 · tobi/qmd

mbelinky · 2026-03-11T03:41:54Z

Summary

Adds Gemini Embedding 2 (gemini-embedding-2-preview) as a cloud embedding provider for QMD, including multimodal embedding support for text, images, and small PDFs.

What's new

New provider: google — use qmd embed --provider google
Multimodal embedding support for:
- text documents
- images (png, jpg, jpeg)
- PDFs (small files; currently conservatively limited to <= 6 pages)
Configurable output dimensions for Gemini embeddings:
- 3072 (default)
- 1536
- 768
Batch API support with retries and Retry-After handling
Task type hints:
- RETRIEVAL_DOCUMENT for indexing
- RETRIEVAL_QUERY for search/query embedding
Status/CLI wiring for provider selection and dimension display
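To make the dimension and task-type options above concrete, here is a minimal sketch of how a text embedding request body could be assembled. It is not the code in src/google-embed.ts: the function name `buildEmbedRequest`, the `purpose` parameter, and the request field names (`taskType`, `outputDimensionality`, `content.parts`) are assumptions based on the general shape of the public Gemini embedContent REST API, and this preview model may differ.

```typescript
// Hypothetical sketch: names and field shapes are assumptions, not taken from this PR.
type TaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

const ALLOWED_DIMENSIONS = [768, 1536, 3072] as const;
type Dimensions = (typeof ALLOWED_DIMENSIONS)[number];

interface EmbedRequest {
  model: string;
  content: { parts: Array<{ text: string }> };
  taskType: TaskType;
  outputDimensionality: Dimensions;
}

// Type guard: only the three sizes listed in this PR are accepted.
function isAllowedDimension(n: number): n is Dimensions {
  return (ALLOWED_DIMENSIONS as readonly number[]).indexOf(n) !== -1;
}

// Builds the JSON body for one text input. Indexing uses RETRIEVAL_DOCUMENT,
// search uses RETRIEVAL_QUERY, and the dimension defaults to 3072.
function buildEmbedRequest(
  text: string,
  purpose: "index" | "query",
  dimensions: number = 3072,
): EmbedRequest {
  if (!isAllowedDimension(dimensions)) {
    throw new RangeError(`Unsupported dimensions: ${dimensions}`);
  }
  return {
    model: "models/gemini-embedding-2-preview",
    content: { parts: [{ text }] },
    taskType: purpose === "index" ? "RETRIEVAL_DOCUMENT" : "RETRIEVAL_QUERY",
    outputDimensionality: dimensions,
  };
}
```

Rejecting unknown dimensions before any network call keeps a typo in --dimensions from producing vectors that no index can match.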

Important behavior notes

Switching providers requires re-embedding existing vectors. Local embeddings (embeddinggemma, 768d) are not compatible with Gemini's default 3072d vectors.
Use:

qmd embed --force --provider google

Default collection behavior remains markdown-only (**/*.md). This PR adds multimodal embedding support, but does not silently change all existing collections to index binaries by default.
Multimodal document embeddings now send text context alongside the file content, so stored titles and path-derived context are preserved rather than lost to file-only inputs.
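As a rough illustration of "text context + file content", the sketch below pairs a text part (title, path, optional context) with the file bytes. The names `buildMultimodalParts` and `MultimodalDoc`, the `Title:` / `Path:` line format, and the `inlineData` part shape are assumptions for illustration; the actual input normalization in this branch may look different.

```typescript
// Hypothetical sketch: part shapes and names are assumptions, not the PR's code.
interface TextPart { text: string }
interface InlineDataPart { inlineData: { mimeType: string; data: string } }

interface MultimodalDoc {
  path: string;     // path relative to the collection root
  title?: string;   // stored document title, if any
  context?: string; // extra path-derived or user-supplied context
  base64: string;   // file content, already base64-encoded
}

// File types listed as supported in this PR.
const MIME_BY_EXTENSION: Record<string, string | undefined> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  pdf: "application/pdf",
};

// Returns a text part followed by the file part, so the model sees the title
// and path alongside the binary content rather than the file alone.
function buildMultimodalParts(doc: MultimodalDoc): [TextPart, InlineDataPart] {
  const ext = doc.path.slice(doc.path.lastIndexOf(".") + 1).toLowerCase();
  const mimeType = MIME_BY_EXTENSION[ext];
  if (mimeType === undefined) {
    throw new Error(`Unsupported file type: ${doc.path}`);
  }
  const lines = [
    doc.title ? `Title: ${doc.title}` : "",
    `Path: ${doc.path}`,
    doc.context ?? "",
  ].filter((line) => line.length > 0);
  return [
    { text: lines.join("\n") },
    { inlineData: { mimeType, data: doc.base64 } },
  ];
}
```

Calling `buildMultimodalParts({ path: "notes/diagram.png", title: "Diagram", base64: "..." })` yields two parts: the text context first, then the image payload.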

Main implementation areas

src/google-embed.ts
- Gemini Embedding 2 client
- multimodal input normalization
- batching + retry logic
src/llm.ts
- provider resolution / provider-specific embed behavior
src/store.ts
- multimodal content detection
- embedding pipeline integration
- document-title/context preservation for multimodal inputs
- PDF page estimation guard
src/cli/qmd.ts
- --provider
- --dimensions
- provider-aware status output
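The retry logic mentioned for src/google-embed.ts is not shown in this description, so the following is only a sketch of one common way to honor Retry-After: use the header when present (either delta-seconds or an HTTP date), otherwise fall back to capped exponential backoff. The function name `retryDelayMs` and the 500 ms base / 30 s cap are assumptions.

```typescript
// Hypothetical sketch: name, base delay, and cap are assumptions.
// Computes how long to wait before retry number `attempt` (0-based).
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  nowMs: number = Date.now(),
  baseMs: number = 500,
  maxMs: number = 30000,
): number {
  if (retryAfter !== null) {
    const trimmed = retryAfter.trim();
    // Form 1: delta-seconds, e.g. "Retry-After: 2"
    if (/^\d+$/.test(trimmed)) {
      return Math.min(Number(trimmed) * 1000, maxMs);
    }
    // Form 2: HTTP date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
    const dateMs = Date.parse(trimmed);
    if (!isNaN(dateMs)) {
      return Math.min(Math.max(dateMs - nowMs, 0), maxMs);
    }
  }
  // No usable header: exponential backoff, capped at maxMs.
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}
```

Jitter is left out to keep the sketch deterministic; a real client would likely add some so that batched requests do not retry in lockstep.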

Bug fixes included

Schema migration ordering
- fixes content_type index creation order so fresh/existing DB migration paths don't crash
Dimension/provider mismatch handling
- avoids using Google dimensions unless the provider is explicitly configured, preventing 3072d query vectors from being mixed against local 768d indexes
Accidental PR contamination removed
- removes the stray reference patch file that was accidentally included earlier in the branch
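The dimension/provider mismatch fix boils down to one rule: only an explicit provider setting may select Google dimensions. A minimal sketch of that rule follows; `EmbedConfig` and `configuredDimensionsSketch` are hypothetical names, not the real getConfiguredEmbedDimensions.

```typescript
// Hypothetical sketch of the rule described above; not the PR's implementation.
type Provider = "local" | "google";

interface EmbedConfig {
  provider?: Provider; // set only when the user chose a provider explicitly
  dimensions?: number; // optional override, meaningful for google only
}

const LOCAL_DIMENSIONS = 768;           // embeddinggemma
const GOOGLE_DEFAULT_DIMENSIONS = 3072; // Gemini default

// Environment variables are deliberately not consulted here: an API key being
// present must not change which vector size queries are built with.
function configuredDimensionsSketch(config: EmbedConfig): number {
  if (config.provider === "google") {
    return config.dimensions ?? GOOGLE_DEFAULT_DIMENSIONS;
  }
  return LOCAL_DIMENSIONS;
}
```

With this rule, a machine that merely has GEMINI_API_KEY exported still queries a local index with 768d vectors.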

Testing performed

Verified on this branch with focused tests covering the changed behavior:

test/google-embed.test.ts
test/generate-embeddings.multimodal.test.ts
targeted SDK default-pattern regression coverage

Specifically verified:

Gemini embedder returns embeddings for:
- text
- image + text
- PDF + text
multimodal embedding inputs include both:
- file content
- useful text context/title metadata
default collection pattern remains **/*.md

Known limitations / follow-ups

PDF page counting is still heuristic-based and conservative.
On some Linux hosts, local llama/node-llama setup still emits Vulkan fallback noise even when the Google embedding path works correctly. That's environmental/runtime noise rather than a blocker for this embedding provider path.
Larger PDF chunking could be improved in a follow-up.
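For readers wondering what a heuristic page count can look like, here is one possible approach, not necessarily the one in src/store.ts: count `/Type /Page` object markers in the raw bytes and refuse anything that cannot be counted. The function names and the choice to reject a count of zero are assumptions; only the 6-page limit comes from this PR.

```typescript
// Hypothetical sketch: one possible heuristic, not the guard used in this PR.
const MAX_PDF_PAGES = 6;

// Decode bytes one-to-one into characters so the regex can scan raw PDF syntax.
function bytesToLatin1(bytes: Uint8Array): string {
  let out = "";
  bytes.forEach((b) => {
    out += String.fromCharCode(b);
  });
  return out;
}

// Counts page objects. The lookahead skips "/Type /Pages" (the page tree node).
// PDFs that keep their objects in compressed streams will report 0.
function estimatePdfPages(bytes: Uint8Array): number {
  const matches = bytesToLatin1(bytes).match(/\/Type\s*\/Page(?![A-Za-z])/g);
  return matches === null ? 0 : matches.length;
}

// Conservative guard: an unknown page count (0) is treated as "too large".
function isPdfSmallEnough(bytes: Uint8Array, maxPages: number = MAX_PDF_PAGES): boolean {
  const pages = estimatePdfPages(bytes);
  return pages > 0 && pages <= maxPages;
}
```

Treating an uncountable PDF as too large errs toward skipping a file rather than sending an oversized one, which matches the "conservative" framing above.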

- Add google-embed.ts with gemini-embedding-2-preview API client
- Support text, image, PDF, and interleaved multimodal embeddings
- Add GoogleHybridLLM: Gemini API for embeddings, local llama.cpp for reranking
- Extend collections to support image/PDF file patterns
- Add --provider and --dimensions CLI flags
- Add cross-modal search support in store
- Add google-embed tests

WIP: needs review and integration testing


mbelinky · 2026-03-11T08:23:15Z

Quick cleanup pass landed on this branch.

What changed since the initial draft:

removed the accidentally included reference patch file
restored the default collection pattern to **/*.md (no silent binary indexing by default)
improved multimodal embeddings so image/PDF inputs include both file content and useful text context
preserved stored document titles in multimodal embedding context
added focused regression coverage for multimodal input construction and default-pattern behavior

Focused verification run on the updated branch covered:

test/google-embed.test.ts
test/generate-embeddings.multimodal.test.ts
targeted SDK default-pattern regression coverage

There is still some node-llama/Vulkan fallback noise on this Linux host during local runtime setup, but the Google embedding path itself is working and the PR body has been updated to reflect the current branch accurately.

davidhop11 · 2026-03-12T07:50:03Z

Production-readiness review complete

Rebased onto main (ae3604c — v2.0.1 release, launcher fix, Qwen3 filename fix) and applied the following improvements.

Changes made

Rebase: Rebased 7 commits (WIP + 6 polish commits) onto current main with no conflicts
Provider-switching warning: When qmd embed is run and existing vectors were embedded with a different provider, a clear warning is printed: existing vectors were embedded with 'X' but the active provider is 'Y' — run 'qmd embed --force' to re-embed
Improved help text: --provider now explains auto-detection logic and the QMD_EMBED_PROVIDER env var; --dimensions explains Matryoshka truncation, valid values, and that dimensions must match between embed and search
Test schema fix: mcp.test.ts had a hand-rolled initTestDatabase that was missing the new content_type column on documents and provider column on content_vectors — added both so all 56 MCP tests pass
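The provider-switching warning can be pictured as a small pure check over the providers recorded with existing vectors. This is a sketch under assumptions: `providerSwitchWarning` is a hypothetical name, the list of stored providers is assumed to come from the provider column on content_vectors, and the message wording is adapted from the description above.

```typescript
// Hypothetical sketch: function name and input shape are assumptions.
// Returns a warning string when any stored vector was produced by a provider
// other than the active one, or null when everything matches.
function providerSwitchWarning(
  storedProviders: string[],
  activeProvider: string,
): string | null {
  for (const stored of storedProviders) {
    if (stored !== activeProvider) {
      return (
        `existing vectors were embedded with '${stored}' ` +
        `but the active provider is '${activeProvider}'; ` +
        `run 'qmd embed --force' to re-embed`
      );
    }
  }
  return null;
}
```

An empty index produces no warning, since there is nothing to re-embed.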

Test results

test/google-embed.test.ts                     6/6 passed
test/generate-embeddings.multimodal.test.ts   2/2 passed
test/mcp.test.ts                              56/56 passed
test/store.test.ts                            198/198 passed
test/llm.test.ts                              40/40 passed

The only failing test file (test/cli.test.ts) fails because tsx is not installed in the worktree environment — this is a pre-existing infrastructure issue unrelated to the Gemini changes.

Branch

Pushed to: davidhop11/qmd → feat/gemini-embedding-2

How to use Gemini Embedding 2

Set your API key:
```
export GEMINI_API_KEY=your_key_here
```
Re-embed your index with the Google provider (required — dimensions differ from local):
```
qmd embed --provider google --force
```
Search normally — the provider is auto-detected from GEMINI_API_KEY when no GPU is available:
```
qmd search "your query"
```

Dimensions

Default: 3072 (highest quality)
Available: 768, 1536, 3072 (Matryoshka truncation)
Must match between embed and search — use qmd embed --force --dimensions 768 to switch
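Since "Matryoshka truncation" may be unfamiliar: the idea is that the leading components of the full vector already form a usable lower-dimensional embedding, which is then re-normalized. The sketch below shows the concept client-side; presumably the API performs this itself when a smaller dimension is requested, and `truncateEmbedding` is a hypothetical helper, not part of qmd.

```typescript
// Hypothetical sketch of the concept only; not code from this PR.
// Keeps the first `dimensions` components and rescales to unit length so
// cosine and dot-product comparisons stay meaningful.
function truncateEmbedding(vector: number[], dimensions: number): number[] {
  if (!Number.isInteger(dimensions) || dimensions <= 0 || dimensions > vector.length) {
    throw new RangeError(`Cannot truncate ${vector.length}d vector to ${dimensions}d`);
  }
  const head = vector.slice(0, dimensions);
  const norm = Math.sqrt(head.reduce((sum, v) => sum + v * v, 0));
  // An all-zero prefix cannot be normalized; return it unchanged.
  return norm === 0 ? head : head.map((v) => v / norm);
}
```

This also shows why dimensions must match between embed and search: a 768d query cannot be compared against 3072d stored vectors without truncating one side the same way.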

Switching back to local

qmd embed --provider local --force

Razor added 7 commits March 10, 2026 23:04

WIP: gemini embedding 2 provider (partial) (ad363f1)
fix: polish Gemini Embedding 2 provider and tests (157b59a)
fix: restore getConfiguredEmbedDimensions to only check explicit provider setting (9e5c5c6)
- Reverts the resolvedEmbedProvider fallback that caused dimension mismatch when GEMINI_API_KEY was set in the environment but tests used local 768d vectors.
fix: move content_type index creation after migration (67674e0)
- CREATE INDEX on content_type was running before ALTER TABLE added the column, crashing on existing databases without the column.
fix(embed): tighten multimodal indexing and restore md defaults (d6618c3)
fix(embed): preserve document titles in multimodal context (c3bb69b)

mbelinky wants to merge 7 commits into tobi:main from mbelinky:feat/gemini-embedding-2
