DALL-E compatible image generation endpoint #292

dougbtv · 2025-12-11T17:54:25Z

quick overview.

This introduces a /v1/images/generations OpenAI API endpoint, intended to be compatible with the DALL-E image generation API. This enables serving diffusion models through an OpenAI-compatible API.

This complements generating diffusion outputs through the completions API, and follows the methodology defined and merged in the diffusion online serving PR #259.

cc: @fake0fan (thanks for getting the work off to a great start in 259!)

Example client implementation @ https://github.com/dougbtv/comfyui-vllm-omni/
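To make the endpoint concrete, here is a minimal client sketch using only the Python standard library. It assumes a server started with `vllm serve <model> --omni` and reachable at `http://localhost:8000` (an assumption; adjust `BASE_URL`), and it requests `b64_json` because that is the only response format this PR accepts. The helper names are illustrative and not part of vllm-omni.

```python
import base64
import json
import urllib.request

# Assumption: the server listens on localhost:8000; change as needed.
BASE_URL = "http://localhost:8000"


def build_generation_request(model: str, prompt: str, size: str = "1024x1024", n: int = 1) -> dict:
    """Build a DALL-E style request body for /v1/images/generations."""
    return {
        "model": model,
        "prompt": prompt,
        "n": n,
        "size": size,
        "response_format": "b64_json",  # the only format this PR supports
    }


def decode_images(response_body: dict) -> list[bytes]:
    """Decode each base64 entry in the response's `data` array into raw image bytes."""
    return [base64.b64decode(item["b64_json"]) for item in response_body["data"]]


def generate(model: str, prompt: str, **kwargs) -> list[bytes]:
    """POST a generation request and return the decoded images."""
    payload = json.dumps(build_generation_request(model, prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/v1/images/generations",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return decode_images(json.load(resp))
```

Calling `generate("Qwen/Qwen-Image", "a lighthouse at dusk")` against a running server would return a list of raw image bytes to write to disk. Given the stated SDK compatibility, the OpenAI Python SDK's `client.images.generate(...)` with its base URL pointed at the server's `/v1` path should be an equivalent alternative.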

review tips.

When reviewing, I recommend going commit by commit; the changes are broken into:

[docs]
[testing]
[feature]

so you can isolate just the changes / tests / docs during your review.

design thoughts.

The idea here is to build on the async API work that fake0fan did with the OpenAI completions endpoint, and to add a dedicated diffusion endpoint alongside it.

The plan is to add the endpoint along with a per-model mapping, so that support for new models can be added to the endpoint and tuned per model.

API endpoints are easier to validate with Pydantic than parameters inlined in the completions string. While I think it's reasonable to expect image generation from a completions endpoint when serving multi-modal models, it would be nice to also have an API endpoint whose parameters can be validated.

...and I want to use it!

overview.

[Feature] Add OpenAI DALL-E compatible image generation API

Builds on @fake0fan's diffusion online serving implementation to provide
a production-ready, OpenAI-compatible image generation API. Implements
the DALL-E /v1/images/generations endpoint with full async support and
proper error handling.

This implementation focuses on generation-only (not editing) to keep
the initial PR manageable while maintaining full functionality and
extensibility.

OpenAI DALL-E API Compatibility:

/v1/images/generations - Text-to-image generation
Full compatibility with OpenAI Python SDK
Request/response formats match DALL-E specification

Unified Async Server:

Single vllm serve <model> --omni command for all diffusion models
AsyncOmniDiffusion engine with thread-pool execution
Exposes both /v1/images/generations and /v1/chat/completions
Automatic model type detection (diffusion vs chat)
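The thread-pool execution mentioned above can be sketched with `loop.run_in_executor`: the blocking pipeline call runs on a worker thread so the event loop keeps serving other requests. `blocking_generate` and `AsyncGeneratorSketch` are hypothetical stand-ins, not the actual `AsyncOmniDiffusion` code.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_generate(prompt: str, num_inference_steps: int) -> list[str]:
    """Hypothetical stand-in for a blocking diffusion pipeline call."""
    time.sleep(0.01 * num_inference_steps)  # simulate work that would otherwise block the event loop
    return [f"image for {prompt!r} after {num_inference_steps} steps"]


class AsyncGeneratorSketch:
    """Expose a blocking generate() as a coroutine by offloading it to a thread pool."""

    def __init__(self, max_workers: int = 1) -> None:
        # One worker serializes requests onto a single (assumed) pipeline instance.
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    async def generate(self, prompt: str, num_inference_steps: int = 9) -> list[str]:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            self._executor, blocking_generate, prompt, num_inference_steps
        )

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


async def main() -> list[list[str]]:
    gen = AsyncGeneratorSketch()
    try:
        # Both coroutines are awaited together; with max_workers=1 the pool runs them
        # one after another, but the event loop stays free in the meantime.
        return await asyncio.gather(gen.generate("a fox"), gen.generate("an owl", 2))
    finally:
        gen.shutdown()
```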

Model Support (via Model Profiles):

Qwen/Qwen-Image (text-to-image with true CFG, 50 steps default)
Tongyi-MAI/Z-Image-Turbo (fast generation, 9 steps default)
Model profiles encapsulate per-model defaults and constraints
Easy to add new models without changing API code
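A model profile could be as small as a frozen dataclass plus a lookup. The sketch below is hypothetical: the field names, the `cfg_scale` key, and `resolve_params` are illustrative, while the values come from this thread (Qwen-Image: 50 default steps and true CFG; Z-Image-Turbo: 9 default steps, CFG forced to 0, and breakage above 16 steps per the review discussion).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model defaults and constraints."""
    default_steps: int
    max_steps: Optional[int] = None           # None means no model-specific cap
    supports_true_cfg: bool = False           # whether true_cfg_scale is accepted
    forced_cfg_scale: Optional[float] = None  # if set, always applied regardless of the request


PROFILES = {
    "Qwen/Qwen-Image": ModelProfile(default_steps=50, supports_true_cfg=True),
    "Tongyi-MAI/Z-Image-Turbo": ModelProfile(default_steps=9, max_steps=16, forced_cfg_scale=0.0),
}


def resolve_params(model: str, num_inference_steps: Optional[int] = None,
                   true_cfg_scale: Optional[float] = None) -> dict:
    """Merge request values with the model's profile, rejecting unsupported combinations."""
    profile = PROFILES.get(model)
    if profile is None:
        raise ValueError(f"no image profile registered for model {model!r}")
    steps = profile.default_steps if num_inference_steps is None else num_inference_steps
    if profile.max_steps is not None and steps > profile.max_steps:
        raise ValueError(f"{model} supports at most {profile.max_steps} steps, got {steps}")
    params = {"num_inference_steps": steps}
    if profile.forced_cfg_scale is not None:
        params["cfg_scale"] = profile.forced_cfg_scale
    if true_cfg_scale is not None:
        if not profile.supports_true_cfg:
            raise ValueError(f"{model} does not support true_cfg_scale")
        params["true_cfg_scale"] = true_cfg_scale
    return params
```

Adding a model then means adding one `PROFILES` entry, which matches the stated goal of not touching API code for each new model.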

Features:

Pydantic validation for all request parameters
Comprehensive error handling with proper HTTP status codes
Model field validation and empty prompt validation
Response format validation (b64_json only)
Prompt logging at debug level (security-conscious)
Model-specific parameter enforcement (e.g., Z-Image forces CFG=0)
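The validations listed above map naturally onto a Pydantic v2 model. This is an illustrative sketch rather than the PR's `protocol/images.py`; the class name and the exact constraints (for example `n >= 1`) are assumptions.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError, field_validator


class ImageGenerationRequestSketch(BaseModel):
    """Hypothetical request model; field names follow the OpenAI images API."""

    model: Optional[str] = None
    prompt: str
    n: int = Field(default=1, ge=1)
    size: Optional[str] = None  # free-form "WIDTHxHEIGHT", parsed later
    response_format: Literal["b64_json"] = "b64_json"  # anything else is rejected

    @field_validator("prompt")
    @classmethod
    def prompt_not_blank(cls, v: str) -> str:
        # Reject empty or whitespace-only prompts before they reach the engine.
        if not v.strip():
            raise ValueError("prompt must not be empty")
        return v
```

A framework such as FastAPI would turn the resulting `ValidationError` into a 4xx response automatically, which is the main advantage over parsing parameters out of a completions string.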

Implementation Files:

image_api_utils.py - Shared helper functions (parse_size, encode_image, etc.)
image_model_profiles.py - Model-specific configurations and constraints
protocol/images.py - Pydantic models for requests/responses
api_server.py - DALL-E endpoint implementation (/v1/images/generations)
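The response side can be sketched as base64-encoding each generated image into the DALL-E style `{"created": ..., "data": [{"b64_json": ...}]}` shape. `encode_image_bytes` and `build_images_response` are hypothetical helpers, not the PR's `encode_image`; to stay dependency-free they start from already-serialized PNG bytes rather than an image object.

```python
import base64
import time
from typing import Iterable, Optional


def encode_image_bytes(png_bytes: bytes) -> str:
    """Base64-encode serialized image bytes for the `b64_json` field."""
    return base64.b64encode(png_bytes).decode("ascii")


def build_images_response(images: Iterable[bytes], created: Optional[int] = None) -> dict:
    """Assemble an OpenAI images-style response body."""
    return {
        # Unix timestamp in seconds, as in the OpenAI images response.
        "created": int(time.time()) if created is None else created,
        "data": [{"b64_json": encode_image_bytes(img)} for img in images],
    }
```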

Modified:

api_server.py - Integrated DALL-E endpoint with async support
async_diffusion.py - Import ordering fix

Built on @fake0fan's excellent diffusion online serving work. This PR
adds the DALL-E compatible API layer with full validation, error
handling, and production-ready features while keeping the scope focused
on generation to facilitate review.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


chatgpt-codex-connector · 2025-12-11T18:00:52Z

vllm_omni/entrypoints/async_diffusion.py

+        return OmniRequestOutput.from_diffusion(
+            request_id=request_id,
+            images=images,
+            prompt=prompt,
+            metrics={


Propagate diffusion errors instead of returning empty success

If DiffusionEngine.step() hits an exception (e.g., during preprocessing/postprocessing), it returns None, but AsyncOmniDiffusion.generate() doesn't treat that as a failure: it always falls through to build an OmniRequestOutput, and the HTTP handlers then return 200 with an empty data array. That masks generation failures and gives clients a successful response even when no images were produced. This path should detect a None/empty result and surface an error instead of returning success.
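One way to follow this suggestion is to check the engine result before building the response and map an empty result to an error status. This is a hedged sketch: `ImageGenerationError`, `FakeOutput` (a stand-in for `OmniRequestOutput`), and the 500 status with an OpenAI-style error body are assumptions, not the PR's actual fix.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


class ImageGenerationError(RuntimeError):
    """Hypothetical error the HTTP layer maps to a 5xx response."""


@dataclass
class FakeOutput:
    """Stand-in for the engine's output object; only `images` matters here."""
    images: list = field(default_factory=list)


def require_images(request_id: str, result: Optional[Any]) -> list:
    """Fail loudly when the engine returned None or produced no images."""
    if result is None:
        raise ImageGenerationError(f"request {request_id}: diffusion engine returned no result")
    images = getattr(result, "images", None)
    if not images:
        raise ImageGenerationError(f"request {request_id}: no images were generated")
    return images


def to_http_response(request_id: str, result: Optional[Any]) -> tuple:
    """Return (status, body) instead of a 200 with an empty data array."""
    try:
        images = require_images(request_id, result)
    except ImageGenerationError as exc:
        return 500, {"error": {"message": str(exc), "type": "server_error"}}
    # Assumes images are already base64 strings at this point.
    return 200, {"data": [{"b64_json": img} for img in images]}
```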


dougbtv · 2025-12-11T23:12:50Z

We decided in the maintainers' call, with helpful input from Roger Wang (thank you!), to start with a single endpoint, /v1/images/generations. I'll put that together as the next iteration.

dougbtv · 2025-12-12T21:42:22Z

Alright, I've gone ahead and refactored this PR to address comments from Thursday's maintainers' call.

The gist is that I reduced this down to just the /v1/images/generations endpoint and removed the image edit endpoint. There's still a fair amount of groundwork behind the single endpoint, plus a lot of testing and docs.

So I broke it out into three commits, with commit messages like [docs], [tests], [feature] so that it's a little easier to review.

appreciate the input!

hsliuustc0106

please also align with #274

hsliuustc0106 · 2025-12-13T23:28:34Z

@gcanlin I think this is related to #197, PTAL

dougbtv · 2025-12-15T19:48:54Z

I've got the branch rebased on main, and I've incorporated the style used in #274 for documentation in my docs update, thanks for letting me know!

gcanlin · 2025-12-17T12:16:02Z

#259 has been merged now. Could you please rebase this PR on the newest main? Thanks!

dougbtv · 2025-12-17T14:00:51Z

Thanks @gcanlin ! I've rebased this, and corrected the failing docs build.

I'm ready for the ready tag. I ran the tests externally with my own Docker build, and the (passing) results are posted in this gist: https://gist.github.com/dougbtv/9c28ad28cce839610c124330eb25bb4f

Appreciate any review, and thanks for keeping track of my PR and the updates.

ZJY0516 · 2025-12-17T14:04:38Z

@dougbtv You need to update the pipeline.yml to run your test in CI

dougbtv · 2025-12-17T14:13:13Z

Thank you! Updated .buildkite/pipeline.yml to include an Image Generation API Test.

ZJY0516 · 2025-12-17T14:16:22Z

vllm_omni/entrypoints/openai/image_model_profiles.py

@@ -0,0 +1,123 @@
+# SPDX-License-Identifier: Apache-2.0


Could you explain why we need this file?

My thinking here was to make an extensible way to add new models which might have differing requirements.

For example, Z-Image-Turbo only works with up to 16 steps before it breaks, and of these two models, true_cfg_scale only exists for Qwen-Image.

I'd expect that as more models are added, we might need more ways to specify what each model is capable of.

Otherwise, I would've had to inline those kind of rules into api_server.py.

ZJY0516 · 2025-12-17T14:17:05Z

vllm_omni/entrypoints/openai/image_api_utils.py

+            detail=f"Invalid image format: {file.content_type}. Supported: PNG, JPEG, WebP",
+        )
+
+    file.file.seek(0, 2)


what does this mean?

I was trying to get the file size without loading the whole file into memory, but... it's vague, let me see if I can improve it.

Turns out this was cruft from the image editing endpoint, removing!
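For reference, `seek(0, 2)` means "move to offset 0 relative to the end of the stream" (2 is `os.SEEK_END`), so a following `tell()` yields the size in bytes without reading the content. A small standalone sketch of the pattern, using a hypothetical helper `stream_size`:

```python
import io
import os
from typing import BinaryIO


def stream_size(f: BinaryIO) -> int:
    """Return the size of a seekable stream without reading it into memory."""
    start = f.tell()
    f.seek(0, os.SEEK_END)  # equivalent to f.seek(0, 2)
    size = f.tell()         # position at the end == total length in bytes
    f.seek(start)           # restore the position so later readers are unaffected
    return size
```

Restoring the original position matters: leaving the stream at EOF would make a later `read()` return nothing.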

ZJY0516 · 2025-12-17T14:18:16Z

vllm_omni/entrypoints/openai/protocol/images.py

+from pydantic import BaseModel, Field, field_validator
+
+
+class ImageSize(str, Enum):


Why do we only support these resolutions?

I think this is cruft from my initial PoC implementation! Checking... thanks.

Removed. I had built it with the recommended sizes for Qwen in my initial PoC, but I think those should just be documented guidelines rather than enforced / validated generally. Thanks!
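With the enum gone, sizes can be checked for form only. Below is a hypothetical `parse_size_sketch` (not the PR's `parse_size`) that accepts any positive `WIDTHxHEIGHT` string, with an assumed 1024x1024 default when no size is given.

```python
import re
from typing import Optional, Tuple

# "WIDTHxHEIGHT" with lowercase x, e.g. "1328x1328".
_SIZE_RE = re.compile(r"(\d+)x(\d+)")


def parse_size_sketch(size: Optional[str], default: Tuple[int, int] = (1024, 1024)) -> Tuple[int, int]:
    """Parse a DALL-E style size string into (width, height)."""
    if size is None:
        return default
    match = _SIZE_RE.fullmatch(size.strip())
    if match is None:
        raise ValueError(f"size must look like '1024x768', got {size!r}")
    width, height = int(match.group(1)), int(match.group(2))
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width, height
```

Model-specific recommendations (such as Qwen's preferred sizes) can then live in the docs or in a model profile rather than in the request schema.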

dougbtv · 2025-12-17T14:47:11Z

Pushed again to address a pre-commit issue.

dougbtv · 2025-12-17T15:58:37Z

docs/user_guide/examples/online_serving/image_generation_api.md

+
+The server automatically enables VAE slicing and tiling for memory optimization.
+
+### Invalid Size Format


This is obsolete

Removed the section after the above fixes.

Add comprehensive documentation for the OpenAI DALL-E compatible image generation API with inline examples and model profiles.
Signed-off-by: dougbtv <[email protected]>

Add 29 comprehensive tests covering generation endpoints, model profiles, request validation, and error handling.
Signed-off-by: dougbtv <[email protected]>

Implement /v1/images/generations endpoint with:
- AsyncOmniDiffusion integration for text-to-image generation
- Model profile system for per-model defaults and constraints
- Request/response protocol matching OpenAI DALL-E API
- Support for Qwen-Image and Z-Image-Turbo models
Signed-off-by: dougbtv <[email protected]>

dougbtv requested a review from hsliuustc0106 as a code owner December 11, 2025 17:54

dougbtv mentioned this pull request Dec 11, 2025

[Entrypoints] Support Online Serving for Diffusion-only Models #259

Merged

chatgpt-codex-connector bot reviewed Dec 11, 2025


dougbtv force-pushed the dalle-compat-image-api branch 3 times, most recently from 48fee5a to 65ab272 Compare December 11, 2025 22:41

gcanlin mentioned this pull request Dec 12, 2025

[Feature]: API - OpenAI API for image generation #197

Open

1 task

dougbtv marked this pull request as draft December 12, 2025 17:46

dougbtv force-pushed the dalle-compat-image-api branch 6 times, most recently from 3cf6521 to c981e03 Compare December 12, 2025 21:25

dougbtv changed the title ~~DALL-E compatible image generation (and editing) endpoints~~ DALL-E compatible image generation endpoint Dec 12, 2025

dougbtv force-pushed the dalle-compat-image-api branch from c981e03 to f540b9e Compare December 12, 2025 21:35

dougbtv marked this pull request as ready for review December 12, 2025 21:37

hsliuustc0106 reviewed Dec 13, 2025


dougbtv force-pushed the dalle-compat-image-api branch 2 times, most recently from ffe73eb to a434a65 Compare December 15, 2025 19:46

dougbtv force-pushed the dalle-compat-image-api branch 3 times, most recently from 925d687 to 02f3c8e Compare December 16, 2025 18:45

dougbtv force-pushed the dalle-compat-image-api branch 2 times, most recently from 1c5866b to 834957f Compare December 17, 2025 13:34

dougbtv force-pushed the dalle-compat-image-api branch from 834957f to eab5dfb Compare December 17, 2025 13:52

dougbtv force-pushed the dalle-compat-image-api branch from eab5dfb to cdab3c1 Compare December 17, 2025 14:09

ZJY0516 reviewed Dec 17, 2025


dougbtv force-pushed the dalle-compat-image-api branch 3 times, most recently from f6b0e49 to 5ca1cb5 Compare December 17, 2025 14:47

dougbtv commented Dec 17, 2025


dougbtv added 3 commits December 17, 2025 11:00

[Docs] Add image generation API documentation

741587b


[Tests] Add comprehensive test suite for image generation API

c4b48ea


dougbtv force-pushed the dalle-compat-image-api branch from 5ca1cb5 to 4f4caed Compare December 17, 2025 16:00


4 participants
