feat(responses)!: implement support for OpenAI compatible prompts in Responses API #3965
Rebasing from main
Please resolve conflicts and make sure mypy is clean on that file.
Yeah, I am doing it now :)
ee37cd5 to 32d6890
I accidentally amended a commit and picked up the latest commit from LLS while rebasing from main. I need to fix it.
Blocked for now. I am still resolving the issue.
32d6890 to 087c175
I reverted to the rebasing state and hope to redo it quickly.
087c175 to 2f80636
The rebase was successful, CI is green!
First set of comments
src/llama_stack/providers/inline/agents/meta_reference/responses/utils.py
if media_content_parts:
    self._prepend_media_into_first_user_message(messages, media_content_parts)

def _prepend_media_into_first_user_message(
Is the behavior implemented here documented in the OpenAI docs? If so, could you please add links? Otherwise, could you explain how we inferred it? (Ideally we should run an e2e test against the OpenAI servers and show the responses, etc.)
Media (images/files) presumably should go in user messages because OpenAI's API schema restricts system and developer messages to text-only content. This is enforced by the existing types:
- OpenAIUserMessageParam.content: OpenAIChatCompletionMessageContent (allows image and file parts)
- OpenAISystemMessageParam.content: OpenAIChatCompletionTextOnlyMessageContent (text only)
I see two different solutions:
- Prepend the media to either the first or the last user message
- For each media part found, create a new OpenAIUserMessageParam, put the media in it, and then insert that OpenAIUserMessageParam into the general messages list of OpenAIMessageParam
What do you think?
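To make the text-only restriction concrete, here is a minimal sketch of separating text parts from media parts before deciding which message they go into. The classes TextPart, ImagePart and FilePart and the helper split_parts are hypothetical stand-ins for illustration; they are not the llama_stack type definitions.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for content-part types (not the llama_stack classes).
@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    image_url: str


@dataclass
class FilePart:
    file_id: str


ContentPart = Union[TextPart, ImagePart, FilePart]


def split_parts(parts: list[ContentPart]) -> tuple[list[TextPart], list[ContentPart]]:
    """Separate text parts (usable in a system message) from media parts (user message only)."""
    text_parts = [p for p in parts if isinstance(p, TextPart)]
    media_parts = [p for p in parts if not isinstance(p, TextPart)]
    return text_parts, media_parts


parts = [TextPart("Analyze this product"), ImagePart("https://example.com/a.jpeg"), FilePart("file-123")]
text_parts, media_parts = split_parts(parts)
```

Under this assumption, only text_parts could feed a system message, while media_parts would need a user message.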
Yes, we should create 1 system message + 1 (optional) user message and prepend them. The user message can hold all the pieces that the system message could not take.
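A rough sketch of that suggestion, using plain dicts in the Chat Completions message shape rather than the llama_stack message classes. The function name prepend_prompt_messages and its parameters are hypothetical, chosen only for illustration.

```python
from typing import Any


def prepend_prompt_messages(
    messages: list[dict[str, Any]],
    resolved_prompt_text: str,
    media_content_parts: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return a new list with one system message and, if media exists, one user message in front."""
    prefix: list[dict[str, Any]] = [{"role": "system", "content": resolved_prompt_text}]
    if media_content_parts:
        # Media cannot live in the system message, so it gets its own user message.
        prefix.append({"role": "user", "content": list(media_content_parts)})
    return prefix + list(messages)


existing = [{"role": "user", "content": "What is in the picture?"}]
media = [{"type": "image_url", "image_url": {"url": "https://example.com/a.jpeg"}}]
result = prepend_prompt_messages(existing, "You are a product analysis expert.", media)
```

Here both prompt-derived messages end up before the caller's own messages, and the input list is left unmodified.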
In any case, if you have a script that makes a simple Responses call with various prompts against OpenAI, and you can share all the outputs in a gist, it would be useful for double-checking that we are doing the correct things here.
I have now decided to get rid of _prepend_media_into_first_user_message and simplify it like this:
    # Insert system message with resolved text
    messages.insert(0, OpenAISystemMessageParam(content=resolved_prompt_text))
    # If we have media, create a new user message, since user messages can carry images and files
    if media_content_parts:
        messages.append(OpenAIUserMessageParam(content=media_content_parts))
These are the last lines of code in _prepend_prompt. Would that be valid?
I haven't pushed the changes yet; I am still refactoring.
2f80636 to 0a642c3
I manually tested the API requests from my PR description again via curl; all the output is as expected.
We need one integration test for the |
What does this PR do?
This PR provides the actual implementation of OpenAI-compatible prompts in the Responses API. It follows up on #3942 with the actual implementation.
The need for this functionality was raised in #3514.
Closes #3321
Test Plan
Manual testing, CI workflow with added unit tests
Comprehensive manual testing with new implementation:
Test Prompts with Images with text on them in Responses API:
I used this image for testing purposes: iphone 17 image
{"object":"file","id":"file-d6d375f238e14f21952cc40246bc8504","bytes":556241,"created_at":1761750049,"expires_at":1793286049,"filename":"iphone.jpeg","purpose":"assistants"}
{"prompt":"You are a product analysis expert. Analyze the following product:\n\nProduct Name: {{product_name}}\nDescription: {{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed analysis including quality assessment, target audience, and pricing recommendations.","version":1,"prompt_id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":["product_name","description","product_photo"],"is_default":false}
{"created_at":1761750427,"error":null,"id":"resp_f897f914-e3b8-4783-8223-3ed0d32fcbc6","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"### Product Analysis: iPhone 17 Pro Max\n\n**Quality Assessment:**\n\n- **Display & Design:**\n - The 6.9-inch display is large, ideal for streaming and productivity.\n - Anti-reflective technology and 120Hz refresh rate enhance viewing experience, providing smoother visuals and reducing glare.\n - Titanium frame suggests a premium build, offering durability and a sleek appearance.\n\n- **Performance:**\n - The Apple A19 Pro chip promises significant performance improvements, likely leading to faster processing and efficient multitasking.\n - 12GB RAM is substantial for a smartphone, ensuring smooth operation for demanding apps and games.\n\n- **Camera System:**\n - The triple 48MP camera setup (wide, ultra-wide, telephoto) is designed for versatile photography needs, capturing high-resolution photos and videos.\n - The 24MP front camera will appeal to selfie enthusiasts and content creators needing quality front-facing shots.\n\n- **Connectivity:**\n - Wi-Fi 7 support indicates future-proof wireless capabilities, providing faster and more reliable internet connectivity.\n\n**Target Audience:**\n\n- **Tech Enthusiasts:** Individuals interested in cutting-edge technology and performance.\n- **Content Creators:** Users who need a robust camera system for photo and video production.\n- **Luxury Consumers:** Those who prefer premium materials and top-of-the-line specs.\n- **Professionals:** Users who require efficient multitasking and productivity features.\n\n**Pricing Recommendations:**\n\n- Given the premium specifications, a higher price point is expected. Consider pricing competitively within the high-end smartphone market while justifying cost through unique features like the titanium frame and advanced connectivity options.\n- Positioning around the $1,200 to $1,500 range would align with expectations for top-tier devices, catering to its target audience while ensuring profitability.\n\nOverall, the iPhone 17 Pro Max showcases a blend of innovative features and premium design, aimed at users seeking high performance and superior aesthetics.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_66f4d844-4d9e-4102-80fc-eb75b34b6dbd","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":{"product_name":{"text":"iPhone 17 Pro Max","type":"input_text"},"product_photo":{"detail":"high","type":"input_image","file_id":"file-d6d375f238e14f21952cc40246bc8504","image_url":null}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":830,"output_tokens":394,"total_tokens":1224,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}
Test Prompts with PDF files in Responses API:
I used this PDF file for testing purposes: invoicesample.pdf
{"object":"file","id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","bytes":149568,"created_at":1761750730,"expires_at":1793286730,"filename":"invoicesample.pdf","purpose":"assistants"}
{"prompt":"You are an accounting and financial analysis expert. Analyze the following invoice document:\n\nInvoice Document: {{invoice_doc}}\n\nProvide a comprehensive analysis","version":1,"prompt_id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":["invoice_doc"],"is_default":false}
{"created_at":1761750881,"error":null,"id":"resp_da866913-db06-4702-8000-174daed9dbbb","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"Here's a detailed analysis of the invoice provided:\n\n### Seller Information\n- **Business Name:** The invoice features a logo with \"Sunny Farm\" indicating the business identity.\n- **Address:** 123 Somewhere St, Melbourne VIC 3000\n- **Contact Information:** Phone number (03) 1234 5678\n\n### Buyer Information\n- **Name:** Denny Gunawan\n- **Address:** 221 Queen St, Melbourne VIC 3000\n\n### Transaction Details\n- **Invoice Number:** #20130304\n- **Date of Transaction:** Not explicitly mentioned, likely inferred from the invoice number or needs clarification.\n\n### Items Purchased\n1. **Apple**\n - Price: $5.00/kg\n - Quantity: 1 kg\n - Subtotal: $5.00\n\n2. **Orange**\n - Price: $1.99/kg\n - Quantity: 2 kg\n - Subtotal: $3.98\n\n3. **Watermelon**\n - Price: $1.69/kg\n - Quantity: 3 kg\n - Subtotal: $5.07\n\n4. **Mango**\n - Price: $9.56/kg\n - Quantity: 2 kg\n - Subtotal: $19.12\n\n5. **Peach**\n - Price: $2.99/kg\n - Quantity: 1 kg\n - Subtotal: $2.99\n\n### Financial Summary\n- **Subtotal for Items:** $36.00\n- **GST (Goods and Services Tax):** 10% of $36.00, which amounts to $3.60\n- **Total Amount Due:** $39.60\n\n### Notes\n- The invoice includes a placeholder text: \"Lorem ipsum dolor sit amet...\" which is typically used as filler text. This might indicate a section intended for terms, conditions, or additional notes that haven’t been completed.\n\n### Visual and Design Elements\n- The invoice uses a simple and clear layout, featuring the business logo prominently and stating essential information such as contact and transaction details in a structured manner.\n- There is a \"Thank You\" note at the bottom, which adds a professional and courteous touch.\n\n### Considerations\n- Ensure the date of the transaction is clear if there are any future references needed.\n- Replace filler text with relevant terms and conditions or any special instructions pertaining to the transaction.\n\nThis invoice appears standard, representing a small business transaction with clearly itemized products and applicable taxes.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_39f3b39e-4684-4444-8e4d-e7395f88c9dc","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":{"invoice_doc":{"type":"input_file","file_data":null,"file_id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","file_url":null,"filename":"invoicesample.pdf"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":529,"output_tokens":513,"total_tokens":1042,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}
Test simple text Prompt in Responses API:
{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.","version":1,"prompt_id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":["name","company","role","tone"],"is_default":false}
{"created_at":1761751097,"error":null,"id":"resp_1b037b95-d9ae-4ad0-8e76-d953897ecaef","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"The capital of Ireland is Dublin.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_8e7c72b6-2aa2-4da6-8e57-da4e12fa3ce2","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":{"name":{"text":"Alice","type":"input_text"},"company":{"text":"Dummy Company","type":"input_text"},"role":{"text":"Geography expert","type":"input_text"},"tone":{"text":"professional and helpful","type":"input_text"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":47,"output_tokens":7,"total_tokens":54,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}
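For reference, a minimal sketch of how the {{variable}} placeholders in the simple text prompt above could be resolved with the input_text values from the request. The helper render_prompt and its behavior for unknown placeholders (left untouched) are assumptions for illustration, not the code from this PR.

```python
import re

# Matches {{name}} style placeholders, allowing optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Substitute each {{name}} with its text value; unknown names are left as-is (assumption)."""

    def substitute(match: re.Match) -> str:
        return variables.get(match.group(1), match.group(0))

    return PLACEHOLDER.sub(substitute, template)


template = (
    "Hello {{name}}! You are working at {{company}}. "
    "Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}."
)
variables = {
    "name": "Alice",
    "company": "Dummy Company",
    "role": "Geography expert",
    "tone": "professional and helpful",
}
rendered = render_prompt(template, variables)
print(rendered)
# Hello Alice! You are working at Dummy Company. Your role is Geography expert
# at Dummy Company. Remember, Alice, to be professional and helpful.
```

The rendered text is what would be placed into the system message; image and file variables are not covered by this sketch.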