Sometimes I hit the max-token rate limit when running batch workloads.
I would like a feature that enables a large number of concurrent requests to the LLM.
Usage scenario: a single model deployment has its own rate limit. Could we support configuring a model pool with a load balancer, so that requests are spread across multiple deployments and high request concurrency becomes possible?
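Roughly what I have in mind, as a minimal sketch (the `Deployment`, `ModelPool`, and `call_llm` names and the endpoints below are placeholders, not an existing API): a round-robin pool where each deployment gets its own concurrency cap, so batch requests are distributed across models instead of piling onto one model's rate limit.

```python
import asyncio
import itertools
from dataclasses import dataclass


@dataclass
class Deployment:
    """One model endpoint with its own rate limit (hypothetical fields)."""
    name: str
    endpoint: str
    max_concurrency: int


class ModelPool:
    """Round-robin load balancer over several LLM deployments.

    Each deployment has its own semaphore, so a rate-limited model
    does not block requests that could be served by another one.
    """

    def __init__(self, deployments):
        self._rr = itertools.cycle(deployments)
        self._sems = {d.name: asyncio.Semaphore(d.max_concurrency) for d in deployments}

    async def complete(self, prompt: str) -> str:
        d = next(self._rr)                    # pick the next deployment in the pool
        async with self._sems[d.name]:        # respect that deployment's concurrency cap
            return await call_llm(d, prompt)  # hypothetical request function


async def call_llm(d: Deployment, prompt: str) -> str:
    # Placeholder for a real HTTP call to `d.endpoint`; swap in the
    # actual client for whichever LLM API is being used.
    await asyncio.sleep(0.1)
    return f"[{d.name}] response to: {prompt}"


async def main():
    pool = ModelPool([
        Deployment("model-a", "https://a.example/v1", max_concurrency=4),
        Deployment("model-b", "https://b.example/v1", max_concurrency=4),
    ])
    # Fire a batch of requests concurrently; the pool spreads them out.
    answers = await asyncio.gather(*(pool.complete(f"question {i}") for i in range(16)))
    print(len(answers), "responses received")


if __name__ == "__main__":
    asyncio.run(main())
```

This is just to illustrate the request pattern; in practice the balancing policy (round-robin, least-loaded, token-budget aware) and retry/backoff on rate-limit errors would be configurable.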