
Centralizing Rate Limits #427

Closed · wants to merge 3 commits
Conversation

whitead (Collaborator) commented Sep 17, 2024

I created an OpenAI project set at tier 1 usage and tried to get LiteLLM to work with the lower rate limits. I centralized the Router so that it originates from the settings, but I am still running into a lot of issues.

I've built a Router with:

```python
from litellm import Router

config = {
    "model_list": [
        {
            "model_name": "gpt-4o-2024-08-06",
            "litellm_params": {"model": "gpt-4o-2024-08-06", "temperature": 0.0},
            "rpm": 500,
            "tpm": 30000,
        }
    ],
    "num_retries": 3,
    "retry_after": 10,
    "enable_pre_call_checks": True,
}
router = Router(**config)
```
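For context, the concurrency pattern is roughly the following (a minimal sketch; `ask()` and the synthetic prompts are illustrative, not the actual calling code):

```python
import asyncio

async def ask(router, prompt: str) -> str:
    # Router.acompletion is LiteLLM's async entry point for routed calls.
    response = await router.acompletion(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def main(router):
    prompts = [f"Question {i}" for i in range(100)]
    # Many requests in flight at once; the Router's rpm/tpm caps should
    # throttle these, but in practice 429s still come back from OpenAI.
    return await asyncio.gather(*(ask(router, p) for p in prompts))
```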

The issues, seen with async concurrency, are:

  1. The rate limits are not respected and we're getting 429s from OpenAI. LiteLLM also does not respect the wait time that the 429 message asks for (e.g., "try again in 3.5 seconds"); a manual workaround is sketched after this list.
  2. The logs are full of "Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new" regardless of my log settings.
  3. Even if it did work, we have a fundamental mismatch in usage. Because of how gpt-4o rate limits work, we need to share rate limits across multiple models, but litellm will only share them across deployments (which are keyed to one model or group of models). This is the opposite of what we need.
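For issue 1, a workaround sketch: catch the 429 and parse the suggested wait out of the error message ourselves, assuming the OpenAI message includes a "try again in Ns" hint. This is not LiteLLM behavior, just what respecting the hint could look like; the helper name and fallback wait are illustrative:

```python
import asyncio
import re

import litellm

async def completion_with_retry_hint(router, max_attempts: int = 3, **kwargs):
    """Retry on 429, sleeping for the wait time the error message suggests.

    Assumes the 429 message contains something like 'try again in 3.5s';
    falls back to a fixed 10 s wait when no hint is found.
    """
    for attempt in range(max_attempts):
        try:
            return await router.acompletion(**kwargs)
        except litellm.RateLimitError as err:
            if attempt == max_attempts - 1:
                raise
            match = re.search(r"try again in ([\d.]+)\s*s", str(err))
            wait = float(match.group(1)) if match else 10.0
            await asyncio.sleep(wait)
```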

I reproduced these errors in the minimal file router_test.py, which triggers 429s even though the rate limits are set to match my OpenAI rate limits.

Opening this PR to start a discussion on how to proceed.
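One possible shape for issue 3, to seed the discussion: a single client-side sliding-window limiter shared by every gpt-4o snapshot, rather than one bucket per deployment. This is a sketch under that assumption; `SharedRateLimiter` and the wiring are hypothetical, not an existing LiteLLM feature:

```python
import asyncio
import time

class SharedRateLimiter:
    """One requests-per-minute budget shared across many model names."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self._timestamps: list[float] = []
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        while True:
            async with self._lock:
                now = time.monotonic()
                # Drop request timestamps older than the 60 s window.
                self._timestamps = [t for t in self._timestamps if now - t < 60]
                if len(self._timestamps) < self.rpm:
                    self._timestamps.append(now)
                    return
                wait = 60 - (now - self._timestamps[0])
            await asyncio.sleep(wait)

# All gpt-4o snapshots draw from the same 500 rpm budget,
# matching how the account-level limit is actually accounted.
gpt4o_limiter = SharedRateLimiter(rpm=500)

async def limited_completion(router, **kwargs):
    await gpt4o_limiter.acquire()
    return await router.acompletion(**kwargs)
```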

whitead (Collaborator, Author) commented Sep 28, 2024

Closing in favor of a new plan.

whitead closed this on Sep 28, 2024