
Conversation

@laszukdawid commented Dec 5, 2025

Description

Adds explicit support for VoyageAI embeddings. The embedding functions already existed in the code under anthropic.py, but there was no explicit way to select them via EMBEDDING_MODEL.

Related Issues

None that I've seen.

Changes Made

  • Move the VoyageAI logic out of anthropic.py into a new voyageai.py
  • Add validation for voyageai as an args.embedding_model value

Checklist

  • Changes tested locally
  • Code reviewed
  • Documentation updated (if necessary)
  • Unit tests added (if applicable)

Additional Notes

N/A


Note

Adds first-class VoyageAI embeddings with a new provider module and integrates it into server binding/dispatch; removes VoyageAI code from Anthropic.

  • Embeddings:
    • New provider: Introduces lightrag/llm/voyageai.py with voyageai_embed (retry, AsyncClient) wrapped via wrap_embedding_func_with_attrs (embedding_dim=1024, max_token_size=16000) and helper get_available_embedding_models.
    • Server integration: Adds "voyageai" to supported embedding_binding values and dispatch in create_optimized_embedding_function to call lightrag.llm.voyageai.voyageai_embed.
  • Refactor/Cleanup:
    • Removes VoyageAI embedding code and numpy import from lightrag/llm/anthropic.py.
  • DX:
    • Improves unsupported embedding error message to include the invalid value.

Written by Cursor Bugbot for commit 9afc599.
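The DX improvement mentioned above (an error message that includes the invalid value) could look roughly like this minimal sketch; the set name and message wording are assumptions, not the exact code in lightrag_server.py:

```python
# Illustrative dispatch guard: report the offending binding value in the error.
# SUPPORTED_EMBEDDING_BINDINGS and the message text are assumptions for this sketch.
SUPPORTED_EMBEDDING_BINDINGS = {"openai", "jina", "gemini", "aws_bedrock", "voyageai"}

def resolve_embedding_binding(binding: str) -> str:
    if binding not in SUPPORTED_EMBEDDING_BINDINGS:
        # Including the invalid value makes misconfiguration much easier to debug.
        raise ValueError(
            f"Unsupported embedding binding: {binding!r}. "
            f"Supported values: {sorted(SUPPORTED_EMBEDDING_BINDINGS)}"
        )
    return binding
```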

@laszukdawid changed the title from "feat: voyageai embed support" to "Explicit voyageai embed support" on Dec 5, 2025
@LarFii (Collaborator) commented Dec 11, 2025

@cursor review

"aws_bedrock",
"jina",
"gemini",
"voyageai",

Bug: Missing voyageai in argparse embedding-binding choices

The PR adds voyageai to the validation list in lightrag_server.py but doesn't add it to the argparse choices list in config.py. The argparse configuration at config.py lines 242-250 restricts --embedding-binding to specific values that don't include voyageai. As a result, argparse will reject voyageai before reaching the server validation, making the new embedding binding unusable via command-line arguments or the EMBEDDING_BINDING environment variable.

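A minimal sketch of the fix, assuming an argparse setup along the lines of the one in config.py; the exact flag list shown here is illustrative and would need to match the repo's actual choices:

```python
import argparse

# "voyageai" must appear in the argparse choices list, otherwise argparse
# rejects the value before the server-side validation ever runs.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--embedding-binding",
    default="openai",
    # Illustrative list; the real one lives in config.py around lines 242-250.
    choices=["openai", "ollama", "azure_openai", "aws_bedrock", "jina", "gemini", "voyageai"],
)

args = parser.parse_args(["--embedding-binding", "voyageai"])
```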

model=model,
api_key=api_key,
embedding_dim=embedding_dim,
)

Bug: Model passed unconditionally may override default with None

The voyageai binding passes model=model unconditionally, unlike other bindings (jina, openai, gemini) which check if model: before including it in kwargs. When args.embedding_model is None (user didn't set EMBEDDING_MODEL), this passes model=None to voyageai_embed, which overrides the function's default value of "voyage-3" and subsequently passes None to the VoyageAI API. The API likely requires a valid model name, causing the call to fail.

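The conditional-kwargs pattern the bot describes (used by the jina/openai/gemini bindings) can be sketched as follows; voyageai_embed here is a local stand-in with the relevant signature, not the real provider function:

```python
import asyncio

async def voyageai_embed(texts, model="voyage-3", api_key=None, embedding_dim=1024):
    # Stand-in for the real provider call; returns the resolved model name
    # so the default-vs-override behavior is visible.
    return {"model": model, "n": len(texts)}

async def dispatch(texts, configured_model, api_key, embedding_dim):
    kwargs = {"api_key": api_key, "embedding_dim": embedding_dim}
    if configured_model:  # only pass model when the user actually set one
        kwargs["model"] = configured_model
    # Omitting the key entirely lets the provider's own default apply,
    # instead of clobbering it with None.
    return await voyageai_embed(texts, **kwargs)

result = asyncio.run(dispatch(["hello"], None, "key", 1024))
```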

@laszukdawid (Author):

@LarFii updated

@laszukdawid (Author):

@danielaskdd @LarFii ?

@danielaskdd (Collaborator):

Please refer to the openai_embed function and implement a max_token_size truncation mechanism for the VoyageAI and Anthropic embedding models.

@danielaskdd added the enhancement label on Dec 22, 2025
laszukdawid pushed a commit to laszukdawid/LightRAG that referenced this pull request Dec 28, 2025
This commit addresses all feedback from PR HKUDS#2484 review:

1. Fix model parameter override bug (Critical Bug #2)
   - Add conditional model parameter passing in lightrag_server.py
   - Only pass model parameter if provided by user
   - Prevents None from overriding default "voyage-3" model
   - Mirrors pattern used by jina and openai bindings

2. Enable max_token_size truncation by default
   - Set truncation=True as default in voyageai_embed()
   - Prevents API errors when texts exceed token limits
   - Improves documentation for truncation parameter
   - Adds embedding_dim parameter documentation

Note: Critical Bug #1 (argparse config) was already fixed in commit 2752b01

Addresses feedback from:
- Cursor Bot review (model parameter bug)
- @danielaskdd (truncation mechanism request)

Related: PR HKUDS#2484 on HKUDS/LightRAG
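The truncation mechanism requested above could be sketched as below. A real implementation would count tokens with the provider's tokenizer (as openai_embed does); the whitespace split here is only a stand-in, so the counts are purely illustrative:

```python
def truncate_texts(texts, max_token_size=16000):
    """Truncate each text to at most max_token_size tokens before embedding."""
    truncated = []
    for text in texts:
        tokens = text.split()  # stand-in for the provider's real tokenizer
        if len(tokens) > max_token_size:
            # Keep only the leading tokens so the API call stays within limits.
            text = " ".join(tokens[:max_token_size])
        truncated.append(text)
    return truncated
```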
@danielaskdd (Collaborator):

@codex review


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2752b01f12


Comment on lines +855 to +858
return await actual_func(
texts,
api_key=api_key,
embedding_dim=embedding_dim,


P2: Pass embedding model to VoyageAI provider

The VoyageAI embedding dispatch never forwards the configured model name. This means EMBEDDING_MODEL is ignored for the voyageai binding, so calls always use the default voyage-3. In practice, setting EMBEDDING_MODEL=voyage-3-lite (and EMBEDDING_DIM=512) will still send the default model and then fail dimension validation because the wrapper expects 512 but the API returns 1024. Please pass model when calling voyageai_embed here.


Comment on lines +81 to +85
if not api_key:
api_key = os.environ.get("VOYAGEAI_API_KEY")
if not api_key:
logger.error("VOYAGEAI_API_KEY environment variable not set")
raise ValueError(


P2: Accept legacy VOYAGE_API_KEY for VoyageAI embeddings

The new VoyageAI module only checks VOYAGEAI_API_KEY when no explicit api_key is provided. The previous implementation (removed from anthropic.py in this commit) used VOYAGE_API_KEY, so existing deployments that only set VOYAGE_API_KEY will now hit the ValueError here unless they also set EMBEDDING_BINDING_API_KEY or rename the env var. Consider falling back to VOYAGE_API_KEY to avoid a silent regression.

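The fallback the reviewer suggests could look like this sketch; the function name is illustrative, while the environment variable names come from the review comment itself:

```python
import os

def resolve_voyage_api_key(api_key=None):
    """Prefer an explicit key, then VOYAGEAI_API_KEY, then legacy VOYAGE_API_KEY."""
    key = (
        api_key
        or os.environ.get("VOYAGEAI_API_KEY")
        or os.environ.get("VOYAGE_API_KEY")  # backwards-compatible fallback
    )
    if not key:
        raise ValueError(
            "VoyageAI API key not set; provide api_key or set VOYAGEAI_API_KEY "
            "(or the legacy VOYAGE_API_KEY)"
        )
    return key
```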
