
Feature: Instrument OpenAI's new Responses API #2782 #2787

Open. Wants to merge 3 commits into base: main.
Conversation

@tongyu0924 commented Mar 18, 2025

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Add responses_retrieve_wrapper for OpenAI response retrieval and enhance tracing in assistant_wrappers.py.

  • Behavior:
    • Add responses_retrieve_wrapper to handle Responses.retrieve in __init__.py and assistant_wrappers.py.
    • Update messages_list_wrapper to include LLM_SYSTEM and LLM_USAGE_PROMPT_TOKENS attributes.
    • Modify runs_create_wrapper to store run_id in runs dictionary.
  • Instrumentation:
    • Add span attributes for response ID and content in responses_retrieve_wrapper.
    • Update messages_list_wrapper to handle message roles and content more precisely.
  • Misc:
    • Rename index variables in messages_list_wrapper for clarity.
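The description above can be sketched in miniature. This is a hypothetical illustration of the wrapper pattern, not the PR's actual code: the attribute names, the wrapper signature, and `FakeSpan` (a stand-in for an OpenTelemetry span) are all assumptions for the sake of a self-contained example.

```python
class FakeSpan:
    """Minimal stand-in for an OpenTelemetry span."""
    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


def responses_retrieve_wrapper(span, wrapped, args, kwargs):
    """Call the wrapped Responses.retrieve and record basic span attributes."""
    response = wrapped(*args, **kwargs)
    span.set_attribute("gen_ai.response.id", response.get("id"))
    # Guard against an empty output list before reading the first item.
    output = response.get("output") or []
    if output:
        span.set_attribute("gen_ai.completion.0.content", output[0].get("text", ""))
    return response


# Usage with a stubbed retrieve call:
def fake_retrieve(response_id):
    return {"id": response_id, "output": [{"text": "hello"}]}


span = FakeSpan()
result = responses_retrieve_wrapper(span, fake_retrieve, ("resp_123",), {})
```

The real wrapper would start a span via the instrumentation's tracer and use the package's semantic-convention constants rather than string literals.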

This description was created by Ellipsis for c8858fa. It will automatically update as commits are pushed.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@tongyu0924 tongyu0924 changed the title #2782 Feature: Instrument OpenAI's new Responses API #2782 Mar 18, 2025
@ellipsis-dev (bot, Contributor) left a comment


👍 Looks good to me! Reviewed everything up to c8858fa in 2 minutes and 47 seconds

More details
  • Looked at 206 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 14 drafted comments based on config settings.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py:247
  • Draft comment:
    New instrumentation for Responses.retrieve has been added. Ensure this new span is consistent with other spans (naming, attributes) and doesn’t conflict with existing ones.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:27
  • Draft comment:
    Avoid using 'assert' to enforce response type in production; consider handling unexpected types gracefully (e.g. with error logging) because asserts may be disabled.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The @dont_throw decorator implies this code is already designed to handle errors gracefully. The assert here appears to be used for type checking during development. The suggestion might make the code more defensive, but could also hide bugs that should be caught early. The assert helps catch integration issues with the OpenAI API.
    The comment makes a valid point about asserts being disabled in production. However, it may be missing the context of the @dont_throw decorator and the value of failing fast during development.
    Given the @dont_throw decorator and that this is an API integration, the assert serves a useful purpose in development while production errors would still be caught by the decorator.
    The comment should be deleted. The current error handling approach with @dont_throw and assert for type checking is a reasonable pattern for this API integration code.
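For illustration, the defensive pattern the draft comment proposes would look roughly like this. The function name and the use of `dict` as the expected type are assumptions; the point is the mechanism: an explicit `isinstance` check with logging, since `assert` statements are stripped when Python runs with `-O`.

```python
import logging

logger = logging.getLogger(__name__)


def parse_response(response):
    """Defensive alternative to `assert isinstance(...)`: log and bail out
    rather than raise, so instrumentation never breaks the host app."""
    if not isinstance(response, dict):
        logger.warning("Unexpected response type %s; skipping tracing", type(response))
        return None
    return response
```

Whether this is preferable to the existing `assert` plus `@dont_throw` combination is the judgment call discussed above.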
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:124
  • Draft comment:
    Avoid using 'id' as a variable name as it shadows the built-in function; rename to something like 'thread_id'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:185
  • Draft comment:
    Accessing content[0] without a length check may cause an IndexError if the content list is empty; consider adding a defensive check.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    This is tricky because we don't have the OpenAI API documentation to confirm the structure of message content. However, since this is production instrumentation code for OpenAI's official API, and the code assumes a specific structure (content[0].text.value), it's likely that this structure is guaranteed by the API. The code is also using .get() for other accesses, showing awareness of null safety, so the direct array access was likely intentional.
    I could be wrong about the API guarantees - perhaps there are edge cases where a message could have empty content. The code would be more defensive with the check.
    However, if empty content is actually possible, it would likely indicate a bug in the OpenAI API itself, as messages are supposed to have content. Adding a null check would hide an API violation that should be caught.
    The direct array access appears intentional and matches expected API guarantees. Adding null checks would mask potential API bugs rather than improve safety.
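If a defensive check were wanted here, it might look like the sketch below. The field names mirror the `content[0].text.value` access path mentioned in the draft comment but are otherwise illustrative.

```python
def first_message_text(message):
    """Guarded version of `content[0].text.value`: returns None for empty
    or missing content instead of raising IndexError."""
    content = message.get("content") or []
    if not content:
        return None
    return content[0].get("text", {}).get("value")
```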
5. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:98
  • Draft comment:
    Consider using isinstance() for type-checking (for LegacyAPIResponse) to account for possible subclass instances, rather than a direct type comparison.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:229
  • Draft comment:
    Variable 'i' is used for prompt indexing here while messages_list_wrapper uses 'prompt_index'; for consistency and clarity, consider renaming it.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py:247
  • Draft comment:
    New wrapper for Responses.retrieve added. Ensure the 'openai.resources.responses' module exists across supported versions and add tests for it.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The try/except block already handles the case where the module doesn't exist. The comment is essentially asking for verification and testing, which violates our rules about not asking for verification or making speculative comments. The code structure already handles potential module absence gracefully.
    Maybe there's value in explicitly testing this new functionality? The module existence check could be important for compatibility.
    The code already handles missing modules gracefully, and asking for tests is a process request rather than a code quality issue. We should trust the author to follow testing practices.
    Delete the comment as it primarily asks for verification and testing, which violates our rules about not making speculative comments or asking for process confirmations.
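The try/except guard the reasoning refers to follows a common shape, sketched below. The helper name is hypothetical; the pattern is simply to import the target module and skip instrumentation when it is absent in the installed openai version.

```python
import importlib


def wrap_if_available(module_name, apply_wrapper):
    """Instrument a module only if it is importable; return whether it was."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module absent in this openai version; skip instrumentation silently.
        return False
    apply_wrapper(module)
    return True
```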
8. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:32
  • Draft comment:
    Avoid using assert for type-checking in production code. Consider using an explicit isinstance check with graceful error handling or logging to prevent issues when assertions are disabled.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The code is in a function decorated with @dont_throw, which likely means exceptions are already being handled at a higher level. The assert is being used as a development-time check rather than a runtime validation. The suggestion would change the behavior by forcing an exception even with assertions disabled, which may not be the intended behavior. The current assert usage seems intentional for development-time validation.
    The comment raises a valid point about assert statements being disabled in optimized Python. However, I may be missing security implications of relying on assert for type checking.
    Given the @dont_throw decorator and the apparent intention for development-time validation only, the current assert usage appears to be an acceptable pattern here. The suggested change would fundamentally alter the runtime behavior.
    The comment should be deleted as the current assert usage appears intentional for development-time validation, and changing it would alter the intended behavior of the code.
9. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:104
  • Draft comment:
    Ensure that 'parsed_response.thread_id' is always present and consistent with the key used in runs_create_wrapper. Consider adding explicit checks or handling for missing thread_id.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The comment asks to "ensure thread_id is present" but the code already handles this safely by checking if thread_id exists in runs before using it. The Run type from OpenAI's SDK likely guarantees thread_id is present. The code is defensive by checking the dictionary access. This seems like an unnecessary speculative concern.
    I could be wrong about the Run type guarantees. There could be edge cases where thread_id is None or empty string.
    Even if thread_id had edge cases, the existing dictionary check safely handles them by skipping the update if the key isn't found. No data corruption or crashes would occur.
    Delete this comment. The code already safely handles thread_id access through dictionary checking, and the comment is speculative about potential issues without clear evidence of a problem.
10. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:124
  • Draft comment:
    Avoid using 'id' as a variable name since it shadows the built-in. Rename it (e.g., to 'thread_id') for clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
11. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:142
  • Draft comment:
    The use of separate counters 'prompt_index' and 'completion_index' in messages_list_wrapper improves clarity. Verify that treating messages with roles 'user' or 'system' as prompts (and others as completions) is the intended behavior.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is asking the PR author to verify their intention, which violates the rule against asking for confirmation of intended behavior. It does not provide a specific code suggestion or point out a specific issue that needs addressing.
12. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:229
  • Draft comment:
    In runs_create_and_stream_wrapper, consider using a more descriptive variable name (e.g., 'prompt_index' instead of 'i') for consistency with other wrappers.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
13. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:88
  • Draft comment:
    In runs_create_wrapper, 'run_id' from response_dict is stored. Ensure this attribute is used consistently in downstream instrumentation or logging for effective traceability.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the author to ensure consistent usage of 'run_id' in downstream processes. This falls under the category of asking the author to ensure something is done, which violates the rules.
14. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py:123
  • Draft comment:
    Typographical error: 'he size of returned vector' should be corrected to 'the size of returned vector'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_M6Zh7uFkU8LmBBUX


@nirga (Member) left a comment


Thanks @tongyu0924, can you add tests for this?

3 participants