This document shows how to serve a LitGPT model for deployment. This section illustrates how to set up a minimal, scalable inference server for a phi-2 LLM using `litgpt serve`:
```bash
# 1) Download a pretrained model (alternatively, use your own finetuned model)
litgpt download --repo_id microsoft/phi-2

# 2) Start the server
litgpt serve --checkpoint_dir checkpoints/microsoft/phi-2
```
Tip: Use `litgpt serve --help` to display additional options, including the port, devices, LLM temperature setting, and more.
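For instance, a minimal sketch of passing such options is shown below; the exact flag names (`--port`, `--devices`, `--temperature`) are assumed here, so verify them against the output of `litgpt serve --help` for your LitGPT version.

```bash
# Serve on a custom port with a lower sampling temperature
# (flag names assumed; confirm with `litgpt serve --help`)
litgpt serve \
  --checkpoint_dir checkpoints/microsoft/phi-2 \
  --port 8001 \
  --temperature 0.2
```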
You can now send requests to the inference server started in step 2. For example, from a new Python session:
```python
import requests

response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "Fix typos in the following sentence: Exampel input"}
)
print(response.json()["output"])
```
Executing the code above prints the following output:
```
Instruct: Fix typos in the following sentence: Exampel input
Output: Example input.
```
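Equivalently, you could send the same request from the command line. Below is a minimal sketch using curl, assuming the server from step 2 is still running on the default port 8000:

```bash
# Send the same JSON request to the /predict endpoint shown above
curl -X POST http://127.0.0.1:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Fix typos in the following sentence: Exampel input"}'
```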