`dstack` is an open-source framework for orchestrating GPU workloads across multiple cloud GPU providers. It provides a simple, cloud-agnostic interface for developing and deploying generative AI models.
- [2023/09] Deploying LLMs using Python API (Example)
- [2023/09] Managed gateways (Release)
- [2023/08] Fine-tuning Llama 2 using QLoRA (Example)
- [2023/08] Deploying Stable Diffusion using FastAPI (Example)
- [2023/07] Deploying LLMs using TGI (Example)
- [2023/07] Deploying LLMs using vLLM (Example)
To use `dstack`, install it with `pip` and start the server:
```shell
pip install "dstack[all]" -U
dstack start
```
Upon startup, the server sets up the default project called `main`. Before using `dstack`, make sure to configure your clouds.
Once the server is up, you can orchestrate GPU workloads using either the CLI or Python API.
The CLI allows you to define what you want to run as a YAML file and run it via the `dstack run` CLI command.
Configurations can be of three types: `dev-environment`, `task`, and `service`.
A dev environment is a virtual machine with a pre-configured IDE.
```yaml
type: dev-environment
python: "3.11" # (Optional) If not specified, your local version is used
setup: # (Optional) Executed once at the first startup
  - pip install -r requirements.txt
ide: vscode
```
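For example, assuming the configuration above is saved as `.dstack.yml` in the repository root (the file name and GPU size here are only illustrations), the dev environment could be launched with the `dstack run` command described below:

```shell
# Launch the dev environment from the current repo, pointing -f at the
# configuration file and (optionally) requesting a GPU.
dstack run . -f .dstack.yml --gpu 24GB
```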
A task can be either a batch job, such as training or fine-tuning a model, or a web application.
```yaml
type: task
python: "3.11" # (Optional) If not specified, your local version is used
ports:
  - 7860
commands:
  - pip install -r requirements.txt
  - python app.py
```
While the task is running in the cloud, the CLI forwards its port traffic to `localhost` for convenient access.
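As a sketch of how this plays out (the file name `app.dstack.yml` and the GPU size are assumptions), you could submit the task and reach the forwarded port locally:

```shell
# Submit the task; while the run is attached, port 7860 is forwarded to localhost.
dstack run . -f app.dstack.yml --gpu 24GB

# In another terminal, the web app is then reachable locally:
curl http://127.0.0.1:7860
```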
A service is an application that is accessible through a public endpoint.
```yaml
type: service
port: 7860
commands:
  - pip install -r requirements.txt
  - python app.py
```
Once the service is up, `dstack` makes it accessible from the Internet through the gateway.
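Services are submitted the same way as tasks; the sketch below assumes a placeholder file name and gateway domain, so the exact values will differ in your setup:

```shell
# Submit the service; dstack provisions it and exposes it via the gateway.
dstack run . -f serve.dstack.yml --gpu 24GB

# Once up, the public endpoint follows the gateway's domain (placeholder shown):
curl https://<run-name>.<your-gateway-domain>/
```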
To run a configuration, use the `dstack run` command followed by the working directory and the path to the configuration file.
```shell
dstack run . -f text-generation-inference/serve.dstack.yml --gpu 80GB -y

 RUN            BACKEND  INSTANCE               SPOT  PRICE  STATUS     SUBMITTED
 tasty-zebra-1  lambda   200GB, 1xA100 (80GB)   no    $1.1   Submitted  now

Provisioning...

Serving on https://tasty-zebra-1.mydomain.com
```
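While the run is in progress, it can be tracked and stopped from the CLI as well. A minimal sketch, assuming the standard `dstack ps` and `dstack stop` subcommands and reusing the run name from the output above:

```shell
# List runs and their statuses
dstack ps

# Stop the run when it is no longer needed
dstack stop tasty-zebra-1
```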
As an alternative to the CLI, you can run tasks and services programmatically via the Python API.
```python
import sys

import dstack

task = dstack.Task(
    image="ghcr.io/huggingface/text-generation-inference:latest",
    env={"MODEL_ID": "TheBloke/Llama-2-13B-chat-GPTQ"},
    commands=[
        "text-generation-launcher --trust-remote-code --quantize gptq",
    ],
    ports=["8080:80"],
)
resources = dstack.Resources(gpu=dstack.GPU(memory="20GB"))

if __name__ == "__main__":
    print("Initializing the client...")
    client = dstack.Client.from_config(repo_dir="~/dstack-examples")

    print("Submitting the run...")
    run = client.runs.submit(configuration=task, resources=resources)
    print(f"Run {run.name}: " + run.status())

    print("Attaching to the run...")
    run.attach()

    # After the endpoint is up, http://127.0.0.1:8080/health will return 200 (OK).

    try:
        for log in run.logs():
            sys.stdout.buffer.write(log)
            sys.stdout.buffer.flush()
    except KeyboardInterrupt:
        print("Aborting the run...")
        run.stop(abort=True)
    finally:
        run.detach()
```
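Assuming the script above is saved as `serve.py` (the file name is only an illustration), you could run it and probe the forwarded endpoint once it is up, as noted in the comment in the script:

```shell
python serve.py

# In another terminal, after the endpoint is up:
curl http://127.0.0.1:8080/health
```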
For additional information and examples, see the following links: