
Use Gunicorn + Uvicorn to manage workers in Llama Stack on Unix systems #3883

@iamemilio

Description


🚀 Describe the new functionality needed

Reference Blog: https://medium.com/@iklobato/mastering-gunicorn-and-uvicorn-the-right-way-to-deploy-fastapi-applications-aaa06849841e

In llamastack/llama_stack/cli/stack/run.py:

uvicorn.run("llama_stack.core.server.server:create_app", **uvicorn_config)

This section of code should be able to initialize the Llama Stack server with Gunicorn as the process manager in Unix operating system environments. A sketch of what that could look like follows below.
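
For illustration only, here is a minimal sketch of how run.py could embed Gunicorn instead of calling uvicorn.run directly. It assumes Gunicorn's documented custom-application pattern (gunicorn.app.base.BaseApplication), the uvicorn.workers.UvicornWorker worker class, and Gunicorn >= 20.1 for the "module:factory()" import syntax; the bind address, port, and worker count are placeholder assumptions, not Llama Stack's actual configuration:

import multiprocessing

from gunicorn import util
import gunicorn.app.base


class GunicornRunner(gunicorn.app.base.BaseApplication):
    """Embed Gunicorn so the CLI can launch it without shelling out."""

    def __init__(self, app_uri, options=None):
        self.app_uri = app_uri
        self.options = options or {}
        super().__init__()

    def load_config(self):
        # Copy only the settings Gunicorn actually recognizes into its config.
        for key, value in self.options.items():
            if key in self.cfg.settings and value is not None:
                self.cfg.set(key.lower(), value)

    def load(self):
        # Resolve "module:create_app()" to the ASGI app. With preload_app
        # left off (the default), this runs in each worker after fork.
        return util.import_app(self.app_uri)


if __name__ == "__main__":
    GunicornRunner(
        "llama_stack.core.server.server:create_app()",
        {
            "bind": "0.0.0.0:8321",  # placeholder bind address and port
            "workers": multiprocessing.cpu_count() * 2 + 1,
            # Newer Uvicorn releases move this class to the separate
            # uvicorn-worker package (uvicorn_worker.UvicornWorker).
            "worker_class": "uvicorn.workers.UvicornWorker",
        },
    ).run()

The equivalent deployment is reachable from the shell, e.g. gunicorn 'llama_stack.core.server.server:create_app()' --worker-class uvicorn.workers.UvicornWorker --workers 4 --bind 0.0.0.0:8321. This also shows why the proposal is Unix-only: Gunicorn's pre-fork master process does not run on Windows.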

💡 Why is this needed? What if we don't build it?

FastAPI can handle more concurrent workloads when Gunicorn is used as the process manager with Uvicorn workers. This will not change any functional behavior of Llama Stack, only its production performance. This deployment pattern is also known to integrate more smoothly with OpenTelemetry auto-instrumentation, although that is no longer a blocker for Llama Stack at the moment.

Other thoughts

None


Labels

enhancement (New feature or request)
