-
Notifications
You must be signed in to change notification settings - Fork 3.9k
Explicit voyageai embed support #2484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Explicit voyageai embed support #2484
Conversation
|
@cursor review |
| "aws_bedrock", | ||
| "jina", | ||
| "gemini", | ||
| "voyageai", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Missing voyageai in argparse embedding-binding choices
The PR adds voyageai to the validation list in lightrag_server.py but doesn't add it to the argparse choices list in config.py. The argparse configuration at config.py lines 242-250 restricts --embedding-binding to specific values that don't include voyageai. As a result, argparse will reject voyageai before reaching the server validation, making the new embedding binding unusable via command-line arguments or the EMBEDDING_BINDING environment variable.
| model=model, | ||
| api_key=api_key, | ||
| embedding_dim=embedding_dim, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Model passed unconditionally may override default with None
The voyageai binding passes model=model unconditionally, unlike other bindings (jina, openai, gemini) which check if model: before including it in kwargs. When args.embedding_model is None (user didn't set EMBEDDING_MODEL), this passes model=None to voyageai_embed, which overrides the function's default value of "voyage-3" and subsequently passes None to the VoyageAI API. The API likely requires a valid model name, causing the call to fail.
|
@LarFii updated |
|
Please refer to the openai_embed function and implement a max_token_size truncation mechanism for the VoyageAI and Anthropic embedding models. |
This commit addresses all feedback from PR HKUDS#2484 review: 1. Fix model parameter override bug (Critical Bug HKUDS#2) - Add conditional model parameter passing in lightrag_server.py - Only pass model parameter if provided by user - Prevents None from overriding default "voyage-3" model - Mirrors pattern used by jina and openai bindings 2. Enable max_token_size truncation by default - Set truncation=True as default in voyageai_embed() - Prevents API errors when texts exceed token limits - Improves documentation for truncation parameter - Adds embedding_dim parameter documentation Note: Critical Bug #1 (argparse config) was already fixed in commit 2752b01 Addresses feedback from: - Cursor Bot review (model parameter bug) - @danielaskdd (truncation mechanism request) Related: PR HKUDS#2484 on HKUDS/LightRAG
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2752b01f12
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return await actual_func( | ||
| texts, | ||
| api_key=api_key, | ||
| embedding_dim=embedding_dim, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass embedding model to VoyageAI provider
The VoyageAI embedding dispatch never forwards the configured model name. This means EMBEDDING_MODEL is ignored for the voyageai binding, so calls always use the default voyage-3. In practice, setting EMBEDDING_MODEL=voyage-3-lite (and EMBEDDING_DIM=512) will still send the default model and then fail dimension validation because the wrapper expects 512 but the API returns 1024. Please pass model when calling voyageai_embed here.
Useful? React with 👍 / 👎.
| if not api_key: | ||
| api_key = os.environ.get("VOYAGEAI_API_KEY") | ||
| if not api_key: | ||
| logger.error("VOYAGEAI_API_KEY environment variable not set") | ||
| raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accept legacy VOYAGE_API_KEY for VoyageAI embeddings
The new VoyageAI module only checks VOYAGEAI_API_KEY when no explicit api_key is provided. The previous implementation (removed from anthropic.py in this commit) used VOYAGE_API_KEY, so existing deployments that only set VOYAGE_API_KEY will now hit the ValueError here unless they also set EMBEDDING_BINDING_API_KEY or rename the env var. Consider falling back to VOYAGE_API_KEY to avoid a silent regression.
Useful? React with 👍 / 👎.
Description
Adds explicit support for VoyageAI embedding. These embedding functions were already in the code, under
anthropic.py, but there wasn't explicit support for using them viaEMBEDDING_MODEL.Related Issues
None that I've seen.
Changes Made
voyageailogic intovoyageai.pyfromanthropic.pyvoyageaiasargs.embedding_modelvalueChecklist
Additional Notes
N/A
Note
Adds first-class VoyageAI embeddings with a new provider module and integrates it into server binding/dispatch; removes VoyageAI code from Anthropic.
lightrag/llm/voyageai.pywithvoyageai_embed(retry, AsyncClient) wrapped viawrap_embedding_func_with_attrs(embedding_dim=1024,max_token_size=16000) and helperget_available_embedding_models."voyageai"to supportedembedding_bindingvalues and dispatch increate_optimized_embedding_functionto calllightrag.llm.voyageai.voyageai_embed.numpyimport fromlightrag/llm/anthropic.py.Written by Cursor Bugbot for commit 9afc599. This will update automatically on new commits. Configure here.