
@StoreksFeed StoreksFeed commented Dec 30, 2025

Description

Modern embedding models (e.g., Gemini and FRIDA) support asymmetric embeddings: a task-specific prefix or task type makes the model produce different embeddings for queries than for documents, which can improve retrieval accuracy. This PR adds that capability to LightRAG.

Related Issues

N/A

Changes Made

Configuration & Documentation

  • Added EMBEDDING_DOCUMENT_PREFIX and EMBEDDING_QUERY_PREFIX environment variables to lightrag/api/config.py
  • Updated docs/DockerDeployment.md and env.example with new configuration options
  • Added example demonstrating prefix usage: examples/unofficial-sample/lightrag_embedding_prefixes.py
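
As a rough illustration of how the two variables could be used, the sketch below reads them from an environment mapping and prepends the matching prefix to each text. The helper name `apply_prefix`, the context values `"document"` and `"query"`, and the prefix strings are assumptions made for this example, not names taken from the PR.

```python
import os


def apply_prefix(texts, context, env=None):
    """Prepend the configured prefix for the given context, if one is set.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    env = os.environ if env is None else env
    var = {
        "document": "EMBEDDING_DOCUMENT_PREFIX",
        "query": "EMBEDDING_QUERY_PREFIX",
    }.get(context)
    prefix = env.get(var, "") if var else ""
    # With no prefix configured, the texts are returned unchanged.
    return [prefix + t for t in texts] if prefix else list(texts)


if __name__ == "__main__":
    # Illustrative prefix values; the right strings depend on the model.
    example_env = {
        "EMBEDDING_DOCUMENT_PREFIX": "search_document: ",
        "EMBEDDING_QUERY_PREFIX": "search_query: ",
    }
    print(apply_prefix(["LightRAG overview"], "document", example_env))
    print(apply_prefix(["what is LightRAG?"], "query", example_env))
    print(apply_prefix(["unchanged"], "query", {}))
```

Passing the environment as a parameter keeps the helper easy to test; in a real deployment the values would come from the process environment or `.env` file.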

Core Infrastructure

  • Enhanced EmbeddingFunc wrapper in lightrag/utils.py with supports_context parameter
  • Updated wrap_embedding_func_with_attrs decorator to support context-aware functions
  • Modified lightrag/operate.py to pass context to embedding calls
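
The idea behind a `supports_context` flag can be sketched as follows. This is a minimal stand-in written for this description, assuming the wrapper forwards a `context` keyword only to functions that opt in; the class `ContextAwareEmbedding` and both toy embedding functions are hypothetical and do not reproduce the actual `EmbeddingFunc` code.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ContextAwareEmbedding:
    """Hypothetical stand-in for an embedding wrapper; not LightRAG's class."""

    func: Callable[..., Awaitable[List[List[float]]]]
    supports_context: bool = False

    async def __call__(self, texts: List[str], context: Optional[str] = None):
        # Forward the context only to functions that declare support for it,
        # so older embedding functions keep their original call signature.
        if self.supports_context and context is not None:
            return await self.func(texts, context=context)
        return await self.func(texts)


async def legacy_embed(texts):
    # Toy embedding without context support: one number per text (its length).
    return [[float(len(t))] for t in texts]


async def aware_embed(texts, context=None):
    # Toy embedding that also encodes which context was used.
    flag = {"query": 1.0, "document": 2.0}.get(context, 0.0)
    return [[float(len(t)), flag] for t in texts]


if __name__ == "__main__":
    legacy = ContextAwareEmbedding(legacy_embed)
    aware = ContextAwareEmbedding(aware_embed, supports_context=True)
    print(asyncio.run(legacy(["abc"], context="query")))
    print(asyncio.run(aware(["abc"], context="query")))
```

Because the flag defaults to `False`, an existing function that accepts only `texts` is never called with an unexpected keyword argument.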

Vector Storage Backends

Updated all storage implementations to use context parameter:

  • lightrag/kg/faiss_impl.py
  • lightrag/kg/milvus_impl.py
  • lightrag/kg/mongo_impl.py
  • lightrag/kg/nano_vector_db_impl.py
  • lightrag/kg/postgres_impl.py
  • lightrag/kg/qdrant_impl.py
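
Conceptually, a storage backend would embed stored text with a document context and search text with a query context. The toy in-memory store below sketches that split; `TinyVectorStore`, `make_recording_embed`, and the context strings are hypothetical names for this example, and the real backends listed above are considerably more involved.

```python
import asyncio


class TinyVectorStore:
    """Hypothetical in-memory store used only to illustrate the idea."""

    def __init__(self, embedding_func):
        self.embedding_func = embedding_func
        self.rows = {}

    async def upsert(self, data):
        # Stored text is embedded as a document.
        ids = list(data)
        vectors = await self.embedding_func(
            [data[i] for i in ids], context="document"
        )
        for item_id, vec in zip(ids, vectors):
            self.rows[item_id] = vec

    async def query(self, text, top_k=1):
        # Search text is embedded as a query.
        (qvec,) = await self.embedding_func([text], context="query")

        def score(vec):
            # Plain dot product; real backends use their own metrics.
            return sum(a * b for a, b in zip(qvec, vec))

        ranked = sorted(self.rows, key=lambda i: score(self.rows[i]), reverse=True)
        return ranked[:top_k]


def make_recording_embed():
    """Return a toy embedding function plus the list of contexts it saw."""
    calls = []

    async def embed(texts, context=None):
        calls.append(context)
        return [[float(len(t)), 1.0] for t in texts]

    return embed, calls


if __name__ == "__main__":
    embed, calls = make_recording_embed()
    store = TinyVectorStore(embed)
    asyncio.run(store.upsert({"a": "short", "b": "a much longer text"}))
    print(asyncio.run(store.query("x")))
    print(calls)
```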

LLM Provider Bindings

Updated embedding functions with context support:

  • lightrag/llm/openai.py - Prefix support
  • lightrag/llm/ollama.py - Prefix support
  • lightrag/llm/gemini.py - Automatic task_type selection
  • lightrag/llm/jina.py - Automatic task selection
  • lightrag/llm/hf.py - Prefix support
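
For the providers with automatic selection, the mapping from context to provider task could look roughly like the sketch below. The task names are the retrieval values documented for the Gemini embedding API (`RETRIEVAL_QUERY`, `RETRIEVAL_DOCUMENT`) and for Jina embeddings v3 (`retrieval.query`, `retrieval.passage`); the function `select_task`, the table layout, and the rule that an explicit setting wins are assumptions for illustration, not the PR's code.

```python
# Context-to-task tables; the layout is an assumption made for this sketch.
TASKS = {
    "gemini": {"query": "RETRIEVAL_QUERY", "document": "RETRIEVAL_DOCUMENT"},
    "jina": {"query": "retrieval.query", "document": "retrieval.passage"},
}


def select_task(provider, context, explicit=None):
    """Pick a provider task for the given context.

    Hypothetical helper: an explicitly configured task takes precedence,
    and providers without a task table (prefix-based ones) get None.
    """
    if explicit is not None:
        return explicit
    return TASKS.get(provider, {}).get(context)


if __name__ == "__main__":
    print(select_task("gemini", "query"))
    print(select_task("jina", "document"))
    print(select_task("openai", "query"))
```

Returning `None` for prefix-based providers leaves the request parameters untouched, which matches the opt-in behaviour described under Backward Compatibility.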

Binding Options

  • Updated GeminiEmbeddingOptions to support automatic task_type selection

API Server

  • Integrated prefix configuration into lightrag/api/lightrag_server.py
  • Updated lightrag/api/utils_api.py splash screen to display prefix settings

Checklist

  • Changes tested locally
  • Code reviewed
  • Documentation updated (if necessary)
  • Unit tests added (if applicable)

Additional Notes

Backward Compatibility

  • Fully backward compatible - no task is injected unless explicitly requested
  • Existing deployments without prefix configuration should continue to work unchanged
  • Optional feature activated only when EMBEDDING_DOCUMENT_PREFIX or EMBEDDING_QUERY_PREFIX environment variables are set

@StoreksFeed StoreksFeed changed the title from "feat: Add context-aware embedding support with prefixes" to "feat: Add task-aware embedding support" on Jan 16, 2026