Conversation

@sriramsowmithri9807 (Contributor) commented Jun 13, 2025

Description (fixes #1367)

This PR adds support for using different endpoints for LLM inference and embedding models, which is particularly useful when running separate llama.cpp servers for each function.

Changes Made

  • Added LLM_ENDPOINT and EMBEDDING_ENDPOINT configuration options
  • Updated GenericLLMProvider to handle custom base URLs for different providers (see the sketch after this list)
  • Enhanced embedding initialization to support separate endpoints
  • Improved configuration handling for both LLM and embedding providers
  • Added proper environment variable support for endpoint configuration
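
In essence, each role resolves its own base URL instead of sharing one. As a rough illustration only (the resolution logic below is a simplified assumption, not the PR's actual diff, and it assumes the OpenAI-compatible path via langchain_openai):

```python
import os

from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Hypothetical sketch: the variable names mirror this PR's config options,
# but the wiring here is a simplified assumption, not the PR's code.
llm_endpoint = os.environ.get("LLM_ENDPOINT")               # e.g. http://localhost:8080/v1
embedding_endpoint = os.environ.get("EMBEDDING_ENDPOINT")   # e.g. http://localhost:8081/v1

# FAST_LLM / EMBEDDING use a "provider:model" format, e.g. "openai:llama3".
_, llm_model = os.environ.get("FAST_LLM", "openai:llama3").split(":", 1)
_, emb_model = os.environ.get("EMBEDDING", "openai:llama-embed").split(":", 1)

# Each role gets its own base_url, so chat completions and embeddings
# can target two different llama.cpp servers.
llm = ChatOpenAI(model=llm_model, base_url=llm_endpoint, api_key="sk-no-key-needed")
embeddings = OpenAIEmbeddings(model=emb_model, base_url=embedding_endpoint, api_key="sk-no-key-needed")
```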

How to Test

  1. Set up your environment variables:
    export LLM_ENDPOINT="http://localhost:8080/v1"
    export EMBEDDING_ENDPOINT="http://localhost:8081/v1"
    export FAST_LLM="openai:llama3"  # or your preferred model
    export EMBEDDING="openai:llama-embed"  # or your preferred embedding model
    

Or update your config file:

```json
{
    "LLM_ENDPOINT": "http://localhost:8080/v1",
    "EMBEDDING_ENDPOINT": "http://localhost:8081/v1",
    "FAST_LLM": "openai:llama3",
    "EMBEDDING": "openai:llama-embed"
}
```
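
Either way, you can sanity-check that both servers answer before running the app. This quick test is not part of the PR; it just points the standard openai Python client at each endpoint (the api_key value is a placeholder, since llama.cpp ignores it by default):

```python
from openai import OpenAI

# Point one client at each llama.cpp server (OpenAI-compatible API).
llm_client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-needed")
emb_client = OpenAI(base_url="http://localhost:8081/v1", api_key="sk-no-key-needed")

# Chat completion against the inference server.
chat = llm_client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(chat.choices[0].message.content)

# Embedding against the embedding server.
emb = emb_client.embeddings.create(model="llama-embed", input="hello world")
print(len(emb.data[0].embedding))
```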

sriramsowmithri9807 and others added 2 commits June 13, 2025 17:40
- Added LLM_ENDPOINT and EMBEDDING_ENDPOINT configuration options
- Updated GenericLLMProvider to handle custom base URLs
- Enhanced embedding initialization to use separate endpoints
- Improved configuration handling for both LLM and embedding providers