
feat: Add Deep Infra to known endpoints #4281

Open · wants to merge 1 commit into main

Conversation

erkserkserks

Summary

Add Deep Infra to known endpoints

Motivation: qwen2.5-72b-instruct is a strong model that is competitive with proprietary models in coding: https://livebench.ai/

Deep Infra is currently the cheapest way to run this model and many other open-weight models:
https://openrouter.ai/models/qwen/qwen-2.5-72b-instruct/providers
https://openrouter.ai/models/meta-llama/llama-3.1-405b-instruct/providers
https://openrouter.ai/models/meta-llama/llama-3.2-90b-vision-instruct

Sample custom endpoint in librechat.yaml:

endpoints:
  custom:
    # Deep Infra
    - name: 'DeepInfra'
      apiKey: '${DEEPINFRA_API_KEY}'
      baseURL: 'https://api.deepinfra.com/v1/openai/'
      models:
        default: ['Qwen/Qwen2.5-72B-Instruct']
        fetch: false
      titleConvo: true
      titleModel: 'meta-llama/Llama-3.2-3B-Instruct'
      summarize: false
      summaryModel: 'Qwen/Qwen2.5-72B-Instruct'
      forcePrompt: false
      modelDisplayLabel: 'Qwen'
[Screenshot: DeepInfra Example]
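
As a quick connectivity check outside of LibreChat, here is a minimal sketch (TypeScript, Node 18+ for the global fetch) that lists the models served by the baseURL above. It assumes DeepInfra's OpenAI-compatible layer exposes the standard GET /models route and that DEEPINFRA_API_KEY is exported in the shell; the file name is illustrative and the script is not part of LibreChat.

// check-models.ts (illustrative name, not part of LibreChat)
// Lists models from the DeepInfra OpenAI-compatible endpoint, assuming the
// standard GET /models route is available and DEEPINFRA_API_KEY is set.

const baseURL = 'https://api.deepinfra.com/v1/openai';

async function listModels(): Promise<void> {
  const res = await fetch(`${baseURL}/models`, {
    headers: { Authorization: `Bearer ${process.env.DEEPINFRA_API_KEY}` },
  });
  if (!res.ok) {
    throw new Error(`Model list request failed: ${res.status} ${res.statusText}`);
  }
  const body = (await res.json()) as { data?: Array<{ id: string }> };
  for (const model of body.data ?? []) {
    console.log(model.id); // e.g. Qwen/Qwen2.5-72B-Instruct
  }
}

listModels().catch((err) => {
  console.error(err);
  process.exit(1);
});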

Change Type

  • New feature (non-breaking change which adds functionality)

Testing

  1. Add the Deep Infra custom endpoint to librechat.yaml
  2. npm run frontend; npm run backend
  3. Visit LibreChat, add a Deep Infra API key, and test the chat feature (an optional standalone smoke test is sketched below)
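
For step 3, an optional standalone smoke test can help separate DeepInfra connectivity issues from LibreChat configuration issues. This is a sketch under the same assumptions as above (OpenAI-compatible /chat/completions route, DEEPINFRA_API_KEY in the environment); it is not part of the LibreChat test suite.

// smoke-test-chat.ts (illustrative name, not part of LibreChat)
// Sends one OpenAI-style chat completion request to the DeepInfra endpoint,
// assuming the /chat/completions route and DEEPINFRA_API_KEY in the env.

const baseURL = 'https://api.deepinfra.com/v1/openai';
const model = 'Qwen/Qwen2.5-72B-Instruct';

async function smokeTest(): Promise<void> {
  const res = await fetch(`${baseURL}/chat/completions`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.DEEPINFRA_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: 'Reply with a one-sentence greeting.' }],
    }),
  });
  if (!res.ok) {
    throw new Error(`Chat request failed: ${res.status} ${res.statusText}`);
  }
  const data = (await res.json()) as {
    choices?: Array<{ message?: { content?: string } }>;
  };
  console.log(data.choices?.[0]?.message?.content ?? '(no content returned)');
}

smokeTest().catch((err) => {
  console.error(err);
  process.exit(1);
});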

Checklist

  • My code adheres to this project's style guidelines
  • I have performed a self-review of my own code
  • My changes do not introduce new warnings

@erkserkserks (Author)

Hi @danny-avila, could you please review this?

I have been using OpenRouter for open-weight LLMs, and I have noticed that the majority of my requests are routed to DeepInfra. Since OpenRouter load-balances based on price, DeepInfra is probably one of the most popular providers on OpenRouter.

I believe DeepInfra meets the threshold for being a notable provider:

  1. A large share (possibly the majority) of open-weight model requests on OpenRouter are routed to DeepInfra
  2. TypingMind supports a limited number of providers, and DeepInfra is supported natively: https://docs.typingmind.com/chat-models-settings/use-with-deepinfra
  3. Peer providers like Groq consider DeepInfra to be a competitor. Here's an image from Groq's homepage:
    [Image: AA-Speed-Llama3_1-70B]

It would be great to add the ability to use DeepInfra directly, given its improved pricing, lower latency, and data privacy.
