DALL-E compatible image generation endpoint #292
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
return OmniRequestOutput.from_diffusion(
    request_id=request_id,
    images=images,
    prompt=prompt,
    metrics={
Propagate diffusion errors instead of returning empty success
If DiffusionEngine.step() raises (e.g., during preprocessing/postprocessing), the result is None, but AsyncOmniDiffusion.generate() doesn't treat that as a failure—it always falls through to build an OmniRequestOutput, and the HTTP handlers will return 200 with an empty data array. That masks generation failures and gives clients a successful response even when no images were produced. This path should detect a None/empty result and surface an error instead of returning success.
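A minimal sketch of the suggested guard (the helper name and exception type here are illustrative, not the PR's actual code):

```python
class ImageGenerationError(RuntimeError):
    """Raised when the diffusion engine produced no images."""


def ensure_images(request_id, result):
    # `result` stands in for the value the diffusion step produced:
    # None when the step failed, or an object carrying an `images` list.
    # Returning a 200 with an empty `data` array would mask the failure,
    # so detect it here and raise; the HTTP layer can map this to a 5xx.
    if result is None or not getattr(result, "images", None):
        raise ImageGenerationError(
            f"no images produced for request {request_id}"
        )
    return result.images
```

The HTTP handler would catch the exception and translate it into an error response rather than falling through to a successful OmniRequestOutput.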
We decided in the maintainers' call, with helpful input from Roger Wang (thank you!), to first start with a single endpoint, /v1/images/generations -- I'll put that together as a next iteration.
alright -- I've gone ahead with a refactor on this PR to address comments from Thursday's maintainers' call. The gist is that I reduced this down to just the /v1/images/generations endpoint and removed the image edit endpoint. There's still a lot that forms the basis of the single endpoint, plus a lot of testing and docs, so I broke it out into three commits with messages prefixed [docs], [tests], and [feature] so that it's a little easier to review. Appreciate the input!
hsliuustc0106 left a comment
please also align with #274
I've got the branch rebased on main, and I've incorporated the style used in #274 for documentation in my docs update. Thanks for letting me know!
#259 has been merged now. Could you please rebase this PR on the newest main? Thanks!
Thanks @gcanlin! I've rebased this and corrected the failing docs build. Appreciate any review, and thanks for keeping track of my PR and the updates.
@dougbtv You need to update the pipeline.yml to run your test in CI |
Thank you! Updated to include the pipeline.yml change to run the tests in CI.
@@ -0,0 +1,123 @@
# SPDX-License-Identifier: Apache-2.0
Could you explain why we need this file?
My thinking here was to make an extensible way to add new models which might have differing requirements.
For example, Z-Image-Turbo only works with up to 16 steps before it breaks, and true_cfg_scale only exists for Qwen-Image between these two models.
I'd expect that as more models are added, we might need more ways to specify what each model is capable of.
Otherwise, I would've had to inline those kinds of rules into api_server.py.
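As an illustrative sketch of that profile idea (the field names, model keys, and limits below are assumptions drawn from the examples in this thread, not the PR's actual profile file):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    # per-model defaults/constraints consulted by the generations endpoint
    max_steps: int | None = None          # hard cap on inference steps, if any
    supports_true_cfg_scale: bool = False


# illustrative entries: Z-Image-Turbo breaks past 16 steps, and only
# Qwen-Image understands true_cfg_scale (per the discussion above)
PROFILES = {
    "z-image-turbo": ModelProfile(max_steps=16),
    "qwen-image": ModelProfile(supports_true_cfg_scale=True),
}


def clamp_steps(model: str, requested: int) -> int:
    # fall back to an unconstrained profile for unknown models
    profile = PROFILES.get(model, ModelProfile())
    if profile.max_steps is not None:
        return min(requested, profile.max_steps)
    return requested
```

Keeping these rules in a mapping means api_server.py stays model-agnostic and new models only need a new profile entry.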
detail=f"Invalid image format: {file.content_type}. Supported: PNG, JPEG, WebP",
)

file.file.seek(0, 2)
what does this mean?
I was trying to get the file size without loading the whole file into memory, but you're right that it's vague. Let me see if I can improve it...
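For reference, the idiom in question can be wrapped in a small helper whose name and comments make the intent explicit (a sketch, not the PR's code):

```python
import io


def stream_size(f) -> int:
    # seek(0, 2) moves the cursor to end-of-file (whence=2, i.e. SEEK_END);
    # tell() then reports that offset, which is the size in bytes, without
    # reading the contents into memory
    f.seek(0, 2)
    size = f.tell()
    f.seek(0)  # rewind so later readers still see the full stream
    return size


print(stream_size(io.BytesIO(b"hello world")))  # prints 11
```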
Turns out this was cruft from the image editing endpoint; removing!
from pydantic import BaseModel, Field, field_validator


class ImageSize(str, Enum):
Why do we only support these resolutions?
This I think is cruft from my initial PoC implementation! Checking... thanks.
Removed. I had built it with recommended sizes for Qwen in my initial PoC, but I do think those should just be documented guidelines and not enforced / validated generally. Thanks!
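If free-form sizes are accepted, one alternative to an enum is parsing OpenAI-style "WIDTHxHEIGHT" strings (a sketch; the accepted format here is an assumption):

```python
def parse_size(size: str) -> tuple[int, int]:
    # accept any positive "WIDTHxHEIGHT" pair; recommended per-model sizes
    # remain documented guidelines rather than enforced validation
    try:
        width_text, height_text = size.lower().split("x")
        width, height = int(width_text), int(height_text)
    except ValueError:
        raise ValueError(
            f"size must look like '1024x1024', got {size!r}"
        ) from None
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width, height
```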
Pushed again to address a pre-commit issue.
The server automatically enables VAE slicing and tiling for memory optimization.

### Invalid Size Format
This is obsolete
Removed this section after the above fixes.
Add comprehensive documentation for the OpenAI DALL-E compatible image generation API with inline examples and model profiles. Signed-off-by: dougbtv <[email protected]>
Add 29 comprehensive tests covering generation endpoints, model profiles, request validation, and error handling. Signed-off-by: dougbtv <[email protected]>
Implement /v1/images/generations endpoint with:
- AsyncOmniDiffusion integration for text-to-image generation
- Model profile system for per-model defaults and constraints
- Request/response protocol matching OpenAI DALL-E API
- Support for Qwen-Image and Z-Image-Turbo models
Signed-off-by: dougbtv <[email protected]>
quick overview.
This introduces a /v1/images/generations OpenAI API endpoint, intended to follow the DALL-E compatible endpoint. This enables serving diffusion models through an OpenAI-compatible API. It is in addition to generating diffusion outputs using the completions API, following the methodology defined and merged in the diffusion online serving PR #259.
cc: @fake0fan (thanks for getting the work off to a great start in 259!)
Example client implementation @ https://github.com/dougbtv/comfyui-vllm-omni/
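For a sense of the request shape, a client payload would follow the OpenAI Images API; exactly which optional fields this server honors is an assumption:

```python
import json

# request body for POST /v1/images/generations, mirroring the OpenAI
# Images API shape; optional-field support here is assumed, not confirmed
payload = {
    "prompt": "a watercolor fox in a snowy forest",
    "n": 1,
    "size": "1024x1024",
    "response_format": "b64_json",
}

# sending it requires a running server, e.g. `vllm serve <model> --omni`:
#   POST http://localhost:8000/v1/images/generations
#   Content-Type: application/json
print(json.dumps(payload, indent=2))
```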
review tips.
When reviewing, I recommend going by commit; the changes are broken into [docs], [testing], and [feature] commits so you can isolate just the docs / tests / feature changes during your review.
design thoughts.
The idea here is to build on the async API endpoint work that fake0fan did with using the openai completions endpoint, but, to add a diffusion endpoint.
The thought is to add the endpoint, but also a mapping for adding new model support for the endpoint, so that it can be tuned.
The API endpoint parameters are more easily validated, using Pydantic, than the inlined parameters in the completions string. While I believe it's reasonable to expect image generation from a completions endpoint while serving multi-modal models, I think it would be nice to have an API endpoint where the parameters can be validated.
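A minimal sketch of that validation advantage, assuming Pydantic request models (the field names and bounds are illustrative, not the PR's exact schema):

```python
from pydantic import BaseModel, Field, ValidationError


class ImageGenerationRequest(BaseModel):
    # malformed requests fail fast with structured errors, unlike free-form
    # parameters embedded inside a completions prompt string
    prompt: str = Field(min_length=1)
    n: int = Field(default=1, ge=1, le=10)
    size: str = "1024x1024"


try:
    ImageGenerationRequest(prompt="", n=99)
except ValidationError as exc:
    # both violations (empty prompt, n out of range) are reported together
    print(f"rejected with {len(exc.errors())} validation errors")
```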
...and I want to use it!
overview.
[Feature] Add OpenAI DALL-E compatible image generation API
Builds on @fake0fan's diffusion online serving implementation to provide
a production-ready, OpenAI-compatible image generation API. Implements
the DALL-E /v1/images/generations endpoint with full async support and
proper error handling.
This implementation focuses on generation-only (not editing) to keep
the initial PR manageable while maintaining full functionality and
extensibility.
OpenAI DALL-E API Compatibility:
Unified Async Server:
vllm serve <model> --omni command for all diffusion models
Model Support (via Model Profiles):
Features:
Implementation Files:
Modified:
Built on @fake0fan's excellent diffusion online serving work. This PR
adds the DALL-E compatible API layer with full validation, error
handling, and production-ready features while keeping the scope focused
on generation to facilitate review.