Provide an offline engine API #1567

ByronHsu · 2024-10-04T08:05:47Z · edited by Ying1123

Motivation

This PR supports the roadmap item "Add APIs for using the inference engine in a single script without launching a separate server" from #1487. It is a simplified version of #1127 by @JianyuZhan that reuses most of the existing code.

Modifications

Context

The current SRT server consists of an HTTP server and the SRT engine.

HTTP server: A FastAPI server that routes requests to the engine.
SRT engine:
1. Tokenizer Manager: Tokenizes the requests and sends them to the controller.
2. Controller (subprocess): Receives requests from the Tokenizer Manager, schedules batches, forwards them, and sends the output tokens to the Detokenizer Manager.
3. Detokenizer Manager (subprocess): Detokenizes the output tokens and sends the result back to the Tokenizer Manager.

The HTTP server and the Tokenizer Manager both run in the main process, and currently there is no way to decouple them and instantiate only the Tokenizer Manager.
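To make the layering concrete, here is a toy, single-file sketch of the request flow described above. It is an illustration under stated assumptions, not SRT code: threads and `queue.Queue` stand in for the subprocesses and their inter-process channels, "tokenization" is a whitespace split, and the "model" just upper-cases tokens. `controller`, `detokenizer`, and this `TokenizerManager` class are hypothetical simplifications. Nothing here requires an HTTP layer; a caller can drive the tokenizer manager directly, which is the kind of decoupling this PR is after.

```python
import queue
import threading


def controller(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Toy controller: 'runs the model' on tokenized requests, forwards output tokens."""
    while True:
        item = in_q.get()
        if item is None:  # shutdown sentinel: propagate it downstream and stop
            out_q.put(None)
            return
        rid, tokens = item
        out_q.put((rid, [t.upper() for t in tokens]))


def detokenizer(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Toy detokenizer: turns output tokens back into text for the tokenizer side."""
    while True:
        item = in_q.get()
        if item is None:
            out_q.put(None)
            return
        rid, tokens = item
        out_q.put((rid, " ".join(tokens)))


class TokenizerManager:
    """Lives in the 'main process' and owns the queues to the two workers."""

    def __init__(self) -> None:
        self.to_controller: queue.Queue = queue.Queue()
        self.to_detokenizer: queue.Queue = queue.Queue()
        self.results: queue.Queue = queue.Queue()
        self.workers = [
            threading.Thread(target=controller, args=(self.to_controller, self.to_detokenizer), daemon=True),
            threading.Thread(target=detokenizer, args=(self.to_detokenizer, self.results), daemon=True),
        ]
        for w in self.workers:
            w.start()

    def generate(self, prompts):
        # "Tokenize" each prompt and send it to the controller with a request id.
        for rid, prompt in enumerate(prompts):
            self.to_controller.put((rid, prompt.split()))
        # Collect results (possibly out of order) and return them in prompt order.
        outputs = {}
        while len(outputs) < len(prompts):
            rid, text = self.results.get()
            outputs[rid] = text
        return [outputs[i] for i in range(len(prompts))]

    def shutdown(self) -> None:
        self.to_controller.put(None)
        for w in self.workers:
            w.join()


tm = TokenizerManager()
print(tm.generate(["hello world", "the capital of france"]))
# ['HELLO WORLD', 'THE CAPITAL OF FRANCE']
tm.shutdown()
```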

Decouple SRT engine and HTTP server

This PR introduces the SRT engine by splitting `launch_server` into `launch_server` and `launch_engine`.

`launch_server`: `launch_engine` plus HTTP server creation; used by the SRT Runtime and the standalone server.
`launch_engine`: SRT engine creation only; used by the new `Engine`.
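A minimal sketch of the split, assuming heavily simplified, hypothetical stand-ins for `ServerArgs` and the Tokenizer Manager, and a plain route table in place of the FastAPI app. It only illustrates the call structure: `launch_server` builds on `launch_engine`, while `Engine` uses `launch_engine` alone.

```python
from dataclasses import dataclass


@dataclass
class ServerArgs:
    # Hypothetical, heavily trimmed stand-in for the real ServerArgs.
    model_path: str
    host: str = "127.0.0.1"
    port: int = 30000


class ToyTokenizerManager:
    # Hypothetical placeholder for the real Tokenizer Manager.
    def __init__(self, server_args: ServerArgs):
        self.model_path = server_args.model_path

    def generate(self, prompt: str) -> str:
        return f"{prompt} <generated by {self.model_path}>"


def launch_engine(server_args: ServerArgs) -> ToyTokenizerManager:
    # Per the description above, the real engine also starts the controller
    # and detokenizer subprocesses; that part is omitted in this sketch.
    return ToyTokenizerManager(server_args)


def launch_server(server_args: ServerArgs) -> dict:
    tokenizer_manager = launch_engine(server_args)
    # Stand-in for building the HTTP app and serving it on host:port:
    # just a route table mapping a path to an engine call.
    return {"/generate": tokenizer_manager.generate}


class Engine:
    # Offline entry point: reuses launch_engine and never touches the HTTP layer.
    def __init__(self, **kwargs):
        self.tokenizer_manager = launch_engine(ServerArgs(**kwargs))

    def generate(self, prompt: str) -> str:
        return self.tokenizer_manager.generate(prompt)


engine = Engine(model_path="toy-model")
print(engine.generate("Hello"))  # Hello <generated by toy-model>

routes = launch_server(ServerArgs(model_path="toy-model"))
print(routes["/generate"]("Hello"))  # same result, reached through the "HTTP" layer
```

The point of the design is that both entry points share one engine-construction path, so the offline `Engine` and the server cannot drift apart in how they start the engine.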

New public API: `Engine`

Expose `Engine` at the top level so users can call it directly as `sgl.Engine`.
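One common way to expose a class at the package top level without paying for its heavy imports when the package itself is imported is a lazy factory function. The sketch below is a generic illustration, not necessarily how `sgl.Engine` is wired up; `lazy_factory` is a hypothetical helper, the `sglang.srt.server` module path in the comment is an assumption, and the stdlib `fractions.Fraction` stands in for the engine class so the example runs anywhere.

```python
import importlib


def lazy_factory(module_name: str, attr: str):
    """Return a callable that imports module_name.attr only when first called."""

    def factory(*args, **kwargs):
        # The import happens here, at call time, not at package import time.
        cls = getattr(importlib.import_module(module_name), attr)
        return cls(*args, **kwargs)

    factory.__name__ = attr
    return factory


# In a package's top-level module one might hypothetically write:
#   Engine = lazy_factory("sglang.srt.server", "Engine")
# Here a stdlib class stands in so the sketch is runnable.
Fraction = lazy_factory("fractions", "Fraction")
print(Fraction(3, 4) + Fraction(1, 4))  # prints 1
```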

Engine Usage Example

This example uses the same settings as vLLM's offline inference example, but runs on the SRT Engine.

```python
import sglang as sgl


def main():
    # Sample prompts.
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    # Create a sampling params object.
    sampling_params = {"temperature": 0.8, "top_p": 0.95}

    # Create an LLM.
    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

    outputs = llm.generate(prompts, sampling_params)
    # Print the outputs.
    for prompt, output in zip(prompts, outputs):
        print("===============================")
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")


# The engine starts subprocesses, so keep the entry point behind a main guard.
if __name__ == "__main__":
    main()
```

```
===============================
Prompt: Hello, my name is
Generated text:  Alistair. I am an independent game developer.

I am currently working on a virtual reality game that focuses on exploring the world and discovering new things. The game has a unique art style that combines elements of both photorealism and cartoonish art. The game will have no story, but rather focus on the exploration of the world and the various things that can be found within.

I am looking for a 3D artist who can help me bring my vision to life. The ideal candidate will have experience in creating environments, characters, and objects in Unity. They should also have a strong
===============================
Prompt: The president of the United States is
Generated text:  Donald Trump. The president of India is Narendra Modi. The president of China is Xi Jinping. The president of Russia is Vladimir Putin. The president of France is Emmanuel Macron. The president of the United Kingdom is Boris Johnson. The president of Canada is Justin Trudeau. The president of Australia is Scott Morrison. The president of Japan is Shinzo Abe. The president of South Korea is Moon Jae-in. The president of Brazil is Jair Bolsonaro. The president of Colombia is Ivan Duque. The president of Argentina is Mauricio Macri
===============================
Prompt: The capital of France is
Generated text:  Paris. Paris is one of the world’s leading cities and is known for its history, art, culture, architecture, and gastronomy. Paris is located in the north-central region of France and is surrounded by the Seine River, which flows through the city. The city is divided into twenty districts or arrondissements, each with its own unique character and charm. Some of the most famous landmarks in Paris include the Eiffel Tower, the Louvre Museum, Notre-Dame Cathedral, the Arc de Triomphe, and the Champs-Élysées. Paris is also known
===============================
Prompt: The future of AI is
Generated text:  bright, and its potential to revolutionize various industries is immense. Here are some ways in which AI can impact the future of healthcare:

1. Personalized Medicine: AI can help in developing personalized treatment plans for patients based on their genetic makeup, medical history, and lifestyle factors. This can lead to more effective treatments and better outcomes for patients.
2. Diagnosis and Treatment: AI-powered tools can help healthcare providers diagnose diseases more accurately and quickly. They can also help in selecting the most appropriate treatment options based on the patient's condition.
3. Drug Development: AI can help in acceler
```

Discussion

One caveat is that `Engine` still constructs a `ServerArgs`, whose HTTP-server-related arguments go unused. I think this is acceptable because `ServerArgs` is a superset of the engine arguments, so it covers everything the engine needs.
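The superset argument can be sketched as follows. The field names and defaults are illustrative assumptions (the real `ServerArgs` has many more fields), and `HTTP_ONLY_FIELDS` and `engine_args` are hypothetical names used only to show which fields an offline engine would ignore.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Illustrative field names only; these are assumed to matter to the HTTP layer alone.
HTTP_ONLY_FIELDS = {"host", "port", "api_key"}


@dataclass
class ServerArgs:
    model_path: str
    tp_size: int = 1
    host: str = "127.0.0.1"
    port: int = 30000
    api_key: Optional[str] = None


def engine_args(args: ServerArgs) -> dict:
    """Return only the fields an offline engine would actually consume."""
    return {k: v for k, v in asdict(args).items() if k not in HTTP_ONLY_FIELDS}


# Engine-style kwargs are always valid ServerArgs, because ServerArgs is the superset.
args = ServerArgs(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct", tp_size=2)
print(engine_args(args))
# {'model_path': 'meta-llama/Meta-Llama-3.1-8B-Instruct', 'tp_size': 2}
```

The trade-off is that an `Engine` object carries a few fields it never reads, in exchange for a single argument type shared by the server and the offline engine.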

Testing

Added `test_srt_engine.py`, which runs batch inference and asserts on the generated answers.
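A sketch of the shape such a test can take. `FakeEngine` is a hypothetical stub that mimics the output format used in the usage example above (a list of dicts with a `"text"` key) so the test runs without a GPU; a real test would construct `sgl.Engine(model_path=...)` instead and would likely prefer greedy sampling so the asserted answer is stable.

```python
import unittest


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine: same call shape, canned answers."""

    def generate(self, prompts, sampling_params):
        answers = {"The capital of France is": " Paris."}
        return [{"text": answers.get(p, "")} for p in prompts]


class TestEngineBatch(unittest.TestCase):
    def test_batch_generate(self):
        engine = FakeEngine()
        prompts = ["The capital of France is", "Hello, my name is"]
        outputs = engine.generate(prompts, {"temperature": 0})
        # One output per prompt, in the same order.
        self.assertEqual(len(outputs), len(prompts))
        # Check the answer for the prompt with a known completion.
        self.assertIn("Paris", outputs[0]["text"])
```

Run it with `python -m unittest` (or pytest) from the directory containing the file.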

TODO

Add async generate
Add encode

Checklist

Format your code according to the Contributor Guide.
Add unit tests as outlined in the Contributor Guide.
Update documentation as needed, including docstrings or example tutorials.

zhyncs · 2024-10-05T03:58:09Z

I think async generate is needed. Quick question: what does "add decode" in TODO item 2 refer to?

ByronHsu · 2024-10-05T04:02:10Z

@zhyncs Oh, I meant encode, like the one in Runtime. It was a typo.

ByronHsu · 2024-10-05T04:04:20Z

@zhyncs I can do async generate in the next PR.

ByronHsu · 2024-10-06T17:33:33Z

Please don't merge yet. The consistency test is failing on H100 in CI but passing on my A100.

imadoualid · 2024-10-13T11:46:51Z

Hey guys, I'm getting `AttributeError: module 'sglang' has no attribute 'Engine'` on sglang 0.3.2. Is it still not in a release?

ByronHsu · 2024-10-13T16:16:05Z

@imadoualid The changes should be in main HEAD, so you can use them if you build from source.

ByronHsu force-pushed the byhsu/decouple-engine-with-server branch from 57f9cd8 to 94346ea October 4, 2024 08:10

ByronHsu marked this pull request as ready for review October 4, 2024 08:13

Ying1123 reviewed Oct 5, 2024

test/lang/test_srt_engine.py (outdated, resolved)

Ying1123 reviewed Oct 5, 2024

test/lang/test_srt_engine.py (outdated, resolved)

ByronHsu force-pushed the byhsu/decouple-engine-with-server branch 2 times, most recently from 1bfd171 to c080de1 October 5, 2024 23:14

merrymercy mentioned this pull request in [RFC] Add an LLM engine #1127 (closed) on Oct 6, 2024

merrymercy reviewed Oct 6, 2024

python/sglang/api.py (outdated, resolved)

merrymercy requested changes Oct 6, 2024

python/sglang/srt/server.py (outdated, resolved)

examples/frontend_language/usage/srt_engine.py (outdated, resolved)

Ying1123 reviewed Oct 6, 2024

examples/frontend_language/usage/srt_engine.py (outdated, resolved)

merrymercy mentioned this pull request in Development Roadmap (2024 Q4) #1487 (open) on Oct 6, 2024

Ying1123 reviewed Oct 6, 2024

python/sglang/srt/server.py (outdated, resolved)

ByronHsu force-pushed the byhsu/decouple-engine-with-server branch 2 times, most recently from 3d92605 to 34a6c2e October 6, 2024 06:28

merrymercy reviewed Oct 6, 2024

python/sglang/srt/server.py (resolved)

python/sglang/srt/server.py (outdated, resolved)

test/lang/run_suite.py (outdated, resolved)

ByronHsu force-pushed the byhsu/decouple-engine-with-server branch from 005e5e0 to b18b447 October 6, 2024 07:19

merrymercy reviewed Oct 6, 2024

examples/runtime/srt_engine.py (resolved)

merrymercy approved these changes Oct 6, 2024

ByronHsu force-pushed the byhsu/decouple-engine-with-server branch from 66d5402 to 22c7e3e October 6, 2024 17:06

ByronHsu added 7 commits October 6, 2024 22:03

Decouple engine with server and provide an engine API (f1371c0)

update example code (b509312)

add srt engine to minimal suite (8d63da2)

tmp (9d479af)

tmp (1de3b80)

tmp (0b9725a)

address comments (e296803)

ByronHsu added 7 commits October 6, 2024 22:03

cleanup (b60a94f)

add consistency test (6862580)

lm comment (abe4fb8)

wrap main (c579494)

let engine test run sooner (f1f630e)

tmp (3169b76)

remove pytorch backend (61f17a2)

ByronHsu force-pushed the byhsu/decouple-engine-with-server branch from 1a49352 to 61f17a2 October 6, 2024 22:03

cleanup (3cb03b3)

merrymercy reviewed Oct 7, 2024

test/srt/test_srt_engine.py (outdated, resolved)

test/srt/test_srt_engine.py (outdated, resolved)

merrymercy added 2 commits October 6, 2024 19:05

Update test/srt/test_srt_engine.py (c796e89)

Update test/srt/test_srt_engine.py (255de44)

merrymercy reviewed Oct 7, 2024

test/srt/test_srt_engine.py (outdated, resolved)

Update test/srt/test_srt_engine.py (8d07818)

merrymercy reviewed Oct 7, 2024

python/sglang/srt/server.py (resolved)

merrymercy approved these changes Oct 7, 2024

merrymercy enabled auto-merge (squash) October 7, 2024 03:02

merrymercy disabled auto-merge October 7, 2024 03:02

merrymercy enabled auto-merge (squash) October 7, 2024 03:03

merrymercy changed the title from "Decouple engine with server and provide an engine API" to "Provide an offline engine API" on Oct 7, 2024

merrymercy disabled auto-merge October 7, 2024 03:26

merrymercy merged commit 551a3a9 into sgl-project:main Oct 7, 2024
11 checks passed

ByronHsu deleted the byhsu/decouple-engine-with-server branch October 13, 2024 16:16
