
Conversation

@chaserhkj (Contributor) commented Aug 20, 2025

Pull Request Type

  • [x] ✨ feat
  • [ ] 🐛 fix
  • [ ] ♻️ refactor
  • [ ] 💄 style
  • [ ] 🔨 chore
  • [ ] 📝 docs

Relevant Issues

resolves #3570

What is in this change?

This PR adds a new configurable environment variable that introduces a small delay before dispatching each API request when using the generic OpenAI embedding engine.

This is meant to be used together with GENERIC_OPEN_AI_EMBEDDING_MAX_CONCURRENT_CHUNKS=1 and is mostly relevant to local LLM backends like llama.cpp, whose API server implementations are naive and simply reject API requests with a 429 error while the underlying worker is busy. A configuration sketch follows.
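Below is a minimal .env sketch of the intended setup. The delay variable's name is illustrative only, since this excerpt does not state the actual key introduced by the PR; GENERIC_OPEN_AI_EMBEDDING_MAX_CONCURRENT_CHUNKS is the existing setting named above:

```ini
# Serialize embedding work so only one chunk request is in flight at a time.
GENERIC_OPEN_AI_EMBEDDING_MAX_CONCURRENT_CHUNKS=1

# Hypothetical name for the new delay setting; the real variable is defined
# in the merged change, not in this excerpt. Value is in milliseconds.
GENERIC_OPEN_AI_EMBEDDING_REQUEST_DELAY_MS=500
```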

Developer Validations

  • [x] I ran yarn lint from the root of the repo & committed changes
  • [ ] Relevant documentation has been updated
  • [x] I have tested my code functionality
  • [x] Docker build succeeds locally

I am skipping documentation for now since there is no change to the front-end UI, but I could add it to this PR if requested.

@chaserhkj force-pushed the generic-openai-embedding-delay branch from 47ad24d to d19048c on September 12, 2025 at 22:58
@timothycarambat added the PR:needs review (Needs review by core team) label on Sep 16, 2025
@timothycarambat (Member) commented:

The previous change here would execute all the batches at once, so they would all just wait simultaneously instead of running each batch sequentially with a delay in between. The refactored code solves this and was tested; the sketch below illustrates the difference.
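A minimal TypeScript sketch of the behavior being described, using illustrative names (embedBatch, DELAY_MS) rather than the actual AnythingLLM code:

```ts
const DELAY_MS = 500; // stands in for the configured delay
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Placeholder for the real OpenAI-compatible /v1/embeddings call.
async function embedBatch(batch: string[]): Promise<number[][]> {
  return batch.map(() => []);
}

// Broken: .map() starts every batch immediately, so all of the sleeps
// elapse in parallel and the requests still hit the backend together.
async function embedAllAtOnce(batches: string[][]) {
  return Promise.all(
    batches.map(async (batch) => {
      await sleep(DELAY_MS);
      return embedBatch(batch);
    })
  );
}

// Fixed: await each batch before starting the next, with the delay between
// dispatches, so a single-worker backend only ever sees one request.
async function embedSequentially(batches: string[][]) {
  const results: number[][][] = [];
  for (const batch of batches) {
    results.push(await embedBatch(batch));
    await sleep(DELAY_MS);
  }
  return results;
}
```

The difference is that Promise.all over .map only wraps each batch in a concurrently running promise, while the for...of loop with await serializes both the requests and the delays.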

@timothycarambat merged commit 226802d into Mintplex-Labs:master on Sep 18, 2025
Development

Successfully merging this pull request may close these issues:

  • [FEAT]: Configurable Delay Between Embedding Requests (#3570)