openai responses instrumentation #6583
base: master
Conversation
Overall package size

Self size: 12.71 MB

Dependency sizes:

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB |
| @datadog/native-appsec | 10.2.1 | 20.64 MB | 20.65 MB |
| @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB |
| @datadog/pprof | 5.11.1 | 9.96 MB | 10.34 MB |
| @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB |
| protobufjs | 7.5.4 | 2.95 MB | 5.73 MB |
| @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB |
| @opentelemetry/resources | 1.9.1 | 306.54 kB | 1.74 MB |
| @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB |
| @opentelemetry/api-logs | 0.205.0 | 201.51 kB | 1.42 MB |
| @opentelemetry/api | 1.9.0 | 1.22 MB | 1.22 MB |
| jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB |
| import-in-the-middle | 1.14.4 | 123.18 kB | 851.76 kB |
| lru-cache | 10.4.3 | 804.3 kB | 804.3 kB |
| @datadog/openfeature-node-server | 0.1.0-preview.10 | 95.11 kB | 401.46 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| source-map | 0.7.6 | 185.63 kB | 185.63 kB |
| pprof-format | 2.2.1 | 163.06 kB | 163.06 kB |
| @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| ignore | 7.0.5 | 63.38 kB | 63.38 kB |
| istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB |
| rfdc | 1.4.1 | 27.15 kB | 27.15 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |
| @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| shell-quote | 1.8.3 | 23.74 kB | 23.74 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| semifies | 1.0.0 | 15.84 kB | 15.84 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| ttl-set | 1.0.0 | 4.61 kB | 9.69 kB |
| mutexify | 1.4.0 | 5.71 kB | 8.74 kB |
| path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB |
| module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #6583      +/-   ##
==========================================
+ Coverage   81.97%   83.80%    +1.82%
==========================================
  Files         479      503       +24
  Lines       20424    21094      +670
==========================================
+ Hits        16742    17677      +935
+ Misses       3682     3417      -265
```

☔ View full report in Codecov by Sentry.
Benchmarks

Benchmark execution time: 2025-10-15 21:45:08. Comparing candidate commit 65ae2e0 in PR branch. Found 0 performance improvements and 0 performance regressions! Performance is the same for 1596 metrics, 74 unstable metrics.
```js
{
  file: 'resources/responses',
  targetClass: 'Responses',
  baseResource: 'responses',
  methods: ['create'],
  streamedResponse: false
},
```
```js
baseResource: 'responses',
methods: ['create'],
streamedResponse: false,
versions: ['>=4.85.0']
```
Suggested change:

```diff
- versions: ['>=4.85.0']
+ versions: ['>=4.87.0']
```
```js
targetClass: 'Responses',
baseResource: 'responses',
methods: ['create'],
streamedResponse: false,
```
Suggested change:

```diff
- streamedResponse: false,
+ streamedResponse: true,
```
```js
function wrapCreate (create) {
  return function (request) {
    if (!ch.start.hasSubscribers) {
      // no subscribers, so call the original function directly
      return create.apply(this, arguments)
    }

    const ctx = {
      request,
      instance: this,
      resource: [this.constructor.name, create.name].join('.')
    }
    // am I using the right channel? tracingChannel vs diagnostics channel
    return ch.tracePromise(create, ctx, this, ...arguments)
  }
}
```
```js
// register patching hooks via addHook
addHook({ name: 'openai', file: 'resources/responses.js', versions: ['>=4.87.0'] }, exports => {
  const Responses = exports.Responses
  // wrap functions on module exports with shimmer.wrap
  shimmer.wrap(Responses.prototype, 'create', wrapCreate)
  return exports
})
```
really cool stuff, and i did see all the tests are passing!
i also realized that the tests are now a little out of date with what our python integration does, as it's been updated since writing the tests and the tests did not get those updates 😭 that's on me, so i'll update those tests and we might be able to simplify some logic here.
left a couple of other comments for things we might be able to clean up in the meantime, but i'll get those tests fixed up first 🫡
```js
  return finalResponse
}

// If no final response found, fall back to accumulating from deltas and items
```
is there a specific model/response format that doesn't include the aggregated streamed response in the last chunk? i think otherwise we can safely only look at the last chunk and not worry about doing manual aggregating
```js
// Extract input information
if (payload.input) {
  openaiStore.input = payload.input
  tags['openai.request.input_length'] = payload.input.length
}

// Extract reasoning configuration
if (payload.reasoning) {
  if (payload.reasoning.effort) {
    tags['openai.request.reasoning.effort'] = payload.reasoning.effort
  }
  openaiStore.reasoning = payload.reasoning
}

// Extract background flag
if (payload.background !== undefined) {
  tags['openai.request.background'] = payload.background
}
```
i think we can remove these blocks and just do model tagging for apm spans
```diff
@@ -191,6 +195,188 @@ class OpenAiLLMObsPlugin extends LLMObsPlugin {
     this._tagger.tagMetadata(span, metadata)
   }

   _tagResponse (span, inputs, response, error) {
```
Suggested change:

```diff
- _tagResponse (span, inputs, response, error) {
+ #tagResponse (span, inputs, response, error) {
```
just as a general JS practice this is better for private functions (i know the other ones aren't like this, i need to get around to cleaning up the rest of this code 😅) and then calling the function above would just be `this.#tagResponse(...)`
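For reference, a small self-contained sketch of the `#` private-method pattern the comment describes (the class and tag names here are illustrative, not the plugin's real ones):

```javascript
'use strict'

class Tagger {
  // `#tagResponse` is truly private: it can only be called from inside the class body
  #tagResponse (span, response) {
    span.tags['openai.response.id'] = response.id
    return span
  }

  tagSpan (span, response) {
    // private methods are invoked as this.#name(...)
    return this.#tagResponse(span, response)
  }
}

const span = new Tagger().tagSpan({ tags: {} }, { id: 'resp_123' })
```

Unlike the `_`-prefix convention, `#` members are enforced by the language: accessing `#tagResponse` from outside the class is a syntax error.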
```js
toolId: item.call_id,
name: item.name,
arguments: args,
type: 'function'
```
Suggested change:

```diff
- type: 'function'
+ type: item.type
```
```js
if (finalResponse) {
  // For simple text responses, if output is empty or an empty array, accumulate from deltas
  const outputIsEmpty = !finalResponse.output ||
    finalResponse.output === '' ||
    (Array.isArray(finalResponse.output) && finalResponse.output.length === 0)

  if (outputIsEmpty) {
    const outputText = chunks
      .filter(chunk => chunk.type === 'response.output_text.delta')
      .map(chunk => chunk.delta)
      .join('')

    if (outputText) {
      return {
        ...finalResponse,
        output: outputText
      }
    }
  }
  return finalResponse
}
```
Suggested change:

```diff
- if (finalResponse) {
-   // For simple text responses, if output is empty or an empty array, accumulate from deltas
-   const outputIsEmpty = !finalResponse.output ||
-     finalResponse.output === '' ||
-     (Array.isArray(finalResponse.output) && finalResponse.output.length === 0)
-
-   if (outputIsEmpty) {
-     const outputText = chunks
-       .filter(chunk => chunk.type === 'response.output_text.delta')
-       .map(chunk => chunk.delta)
-       .join('')
-
-     if (outputText) {
-       return {
-         ...finalResponse,
-         output: outputText
-       }
-     }
-   }
-   return finalResponse
- }
+ return finalResponse
```
i think this might be good here - were there any scenarios you found where the last chunk did not have aggregations done properly?
```js
// - response.done/response.incomplete/response.completed: final response with output array and usage

// Find the last chunk with a complete response object (status: done, incomplete, or completed)
let finalResponse = null
```
Suggested change:

```diff
- let finalResponse = null
+ let finalResponse
```
```js
// Only include content if it's not empty OR if there are no tool calls
// (For responses API, tool call messages should not have content field)
if (content !== '' || !messageObj.tool_calls) {
  messageObj.content = content
}

// For role, always include it (even if empty string) when there are tool calls
// Otherwise use conditional tagging which skips empty values
let condition
if (messageObj.tool_calls && messageObj.tool_calls.length > 0) {
  // For tool call messages, always include role even if empty
  messageObj.role = role || ''
  condition = true
} else {
  condition = this.#tagConditionalString(role, 'Message role', messageObj, 'role')
}
```
i've realized the shared tests are now a bit out of date with what we do in python - i will fix those and get back to you, i thinkkkkk this stuff here could be removed once i fix up the tests 🙏 i will let you know, but i did confirm that these changes work with the current tests, so thank you!
```js
// Tag metadata
const metadata = Object.entries(parameters).reduce((obj, [key, value]) => {
  if (!['tools', 'functions', 'instructions'].includes(key)) {
    obj[key] = value
  }
  return obj
}, {})

// Add fields from response
if (response.temperature !== undefined) metadata.temperature = Number(response.temperature)
if (response.top_p !== undefined) metadata.top_p = Number(response.top_p)
if (response.tools !== undefined) {
  metadata.tools = Array.isArray(response.tools) ? [...response.tools] : response.tools
}
if (response.tool_choice !== undefined) metadata.tool_choice = response.tool_choice
if (response.truncation !== undefined) metadata.truncation = response.truncation
if (response.text !== undefined) metadata.text = response.text
if (response.usage?.output_tokens_details?.reasoning_tokens !== undefined) {
  metadata.reasoning_tokens = response.usage.output_tokens_details.reasoning_tokens
}

// Add reasoning metadata from input parameters
if (reasoning) {
  metadata.reasoning = reasoning
}
if (background !== undefined) {
  metadata.background = background
}
```
i think we can keep metadata tagging to the request parameters and not from the response, and then maybe do an allowlist instead of denylist for what we want to include, ie

```js
const metadata = Object.entries(parameters).reduce((obj, [key, value]) => {
  if (['temperature', 'top_p', 'truncation', ...].includes(key)) {
    obj[key] = value
  }
  return obj
}, {})
```

lmk if that doesn't make sense!
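A runnable version of that allowlist sketch, with the elision filled in by a hypothetical key list (the real set would come from the integration, this is just to show the shape):

```javascript
'use strict'

// hypothetical allowlist of request parameters to keep as metadata
const ALLOWED_METADATA_KEYS = ['temperature', 'top_p', 'truncation', 'max_output_tokens']

function extractMetadata (parameters) {
  return Object.entries(parameters).reduce((obj, [key, value]) => {
    if (ALLOWED_METADATA_KEYS.includes(key)) {
      obj[key] = value
    }
    return obj
  }, {})
}

const metadata = extractMetadata({
  temperature: 0.2,
  top_p: 1,
  tools: [{ type: 'function' }], // dropped: not on the allowlist
  instructions: 'be brief' // dropped: not on the allowlist
})
```

An allowlist fails closed: newly added request parameters are excluded by default instead of leaking into metadata, which is usually the safer default for telemetry.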
```js
if (chunks.length === 0) return {}

// The responses API streams events with different types:
// - response.output_text.delta: incremental text deltas
// - response.output_text.done: complete text for a content part
// - response.output_item.done: complete output item with role
// - response.done/response.incomplete/response.completed: final response with output array and usage

// Find the last chunk with a complete response object (status: done, incomplete, or completed)
let finalResponse
for (let i = chunks.length - 1; i >= 0; i--) {
  const chunk = chunks[i]
  if (chunk.response && ['done', 'incomplete', 'completed'].includes(chunk.response.status)) {
    finalResponse = chunk.response
    return finalResponse
  }
}

return finalResponse
```
Suggested change:

```diff
- if (chunks.length === 0) return {}
-
  // The responses API streams events with different types:
  // - response.output_text.delta: incremental text deltas
  // - response.output_text.done: complete text for a content part
  // - response.output_item.done: complete output item with role
  // - response.done/response.incomplete/response.completed: final response with output array and usage

  // Find the last chunk with a complete response object (status: done, incomplete, or completed)
- let finalResponse
  for (let i = chunks.length - 1; i >= 0; i--) {
    const chunk = chunks[i]
    if (chunk.response && ['done', 'incomplete', 'completed'].includes(chunk.response.status)) {
-     finalResponse = chunk.response
-     return finalResponse
+     return chunk.response
    }
  }
-
- return finalResponse
```
now that we simplified the logic a bit i think we can just return the proper chunk directly, and if there isn't one the function will return `undefined` by default. we could also move the `['done', 'incomplete', 'completed']` array to be a const at the top of the file
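Putting both suggestions together, the simplified extraction could look like this (a sketch, with the const name invented for illustration):

```javascript
'use strict'

// hoisted to module scope per the suggestion above; the name is illustrative
const FINISHED_STATUSES = ['done', 'incomplete', 'completed']

function extractFinalResponse (chunks) {
  // walk backwards: the finished response is normally the last chunk
  for (let i = chunks.length - 1; i >= 0; i--) {
    const chunk = chunks[i]
    if (chunk.response && FINISHED_STATUSES.includes(chunk.response.status)) {
      return chunk.response
    }
  }
  // falls through to an implicit `undefined` when no finished chunk exists
}

const final = extractFinalResponse([
  { type: 'response.output_text.delta', delta: 'Hel' },
  { type: 'response.completed', response: { status: 'completed', output: 'Hello' } }
])
```

The empty-array guard becomes unnecessary too: the loop body never runs for zero chunks, so the function returns `undefined` in that case as well.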
```js
} else if (tokenUsage.prompt_tokens_details) {
  // Chat/Completions API - only include if > 0
  const cacheReadTokens = tokenUsage.prompt_tokens_details.cached_tokens
  if (cacheReadTokens !== undefined && cacheReadTokens > 0) {
```
Suggested change:

```diff
- if (cacheReadTokens !== undefined && cacheReadTokens > 0) {
+ if (cacheReadTokens) {
```
i think this should be the same, since if `cacheReadTokens` is `0` then it'll still evaluate as false-y.
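A tiny demo of why the plain truthiness check is equivalent here — both `undefined` and `0` are falsy, so neither produces a metric (the function and tag name are hypothetical, just framing the check):

```javascript
'use strict'

function cacheReadMetric (promptTokensDetails) {
  const cacheReadTokens = promptTokensDetails.cached_tokens
  // covers both the undefined case and the zero case in one check
  if (cacheReadTokens) {
    return { cache_read_tokens: cacheReadTokens }
  }
  return {}
}
```

The one behavioral caveat with truthiness is `NaN` or other falsy values being silently dropped too, which is fine for a token count but worth remembering when the field can legitimately be `0` and should still be reported.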
```js
if (typeof parsedArgs === 'string') {
  try {
    parsedArgs = JSON.parse(parsedArgs)
  } catch (e) {
```
Suggested change:

```diff
- } catch (e) {
+ } catch {
```
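The optional catch binding (ES2019) lets `catch` omit the parameter entirely when the error object is unused. A self-contained sketch of the parse-or-passthrough pattern (the function name is illustrative):

```javascript
'use strict'

function parseArguments (args) {
  if (typeof args === 'string') {
    try {
      args = JSON.parse(args)
    } catch {
      // no binding needed: leave args as the original string when it isn't valid JSON
    }
  }
  return args
}
```

Besides being terser, this avoids lint warnings about the unused `e` variable.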
```js
if (item.type === 'reasoning') {
  // Extract reasoning text from summary
  let reasoningText = ''
  if (item.summary && Array.isArray(item.summary) && item.summary.length > 0) {
```
Suggested change:

```diff
- if (item.summary && Array.isArray(item.summary) && item.summary.length > 0) {
+ if (Array.isArray(item.summary) && item.summary.length > 0) {
```
`Array.isArray` will also check if non-null/non-undefined!
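A quick check confirming the point: `Array.isArray(null)` and `Array.isArray(undefined)` are both `false`, so the separate truthiness guard is redundant. Sketch below, with the summary-part shape (`{ text }`) assumed for illustration:

```javascript
'use strict'

function reasoningText (item) {
  // Array.isArray already rejects null/undefined, so no extra `item.summary &&` guard
  if (Array.isArray(item.summary) && item.summary.length > 0) {
    return item.summary.map(part => part.text ?? '').join('')
  }
  return ''
}
```

The guard never throws even when `summary` is missing entirely, because `Array.isArray` accepts any value.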
```js
if (typeof args === 'string') {
  try {
    args = JSON.parse(args)
  } catch (e) {
```
Suggested change:

```diff
- } catch (e) {
+ } catch {
```
```js
}

// Extract tool calls if present in message.tool_calls
if (item.tool_calls && Array.isArray(item.tool_calls)) {
```
Suggested change:

```diff
- if (item.tool_calls && Array.isArray(item.tool_calls)) {
+ if (Array.isArray(item.tool_calls)) {
```
```js
this._tagger.tagLLMIO(span, inputMessages, outputMessages)

// Tag metadata - use allowlist approach for request parameters
const allowedParamKeys = [
```
we can make this a const at the top of the file
```js
}

// Include content if not empty, no tool calls/results, or explicitly provided
if (content !== '' || (!messageObj.tool_calls && !messageObj.tool_results) || ('content' in message)) {
```
i think we should be able to get rid of this changed logic here (although I know it'll fail the shared tests 😭). we actually get around this for the newly added anthropic integration by setting content to an empty string:

```js
inputMessages.push({ content: '', role, toolResults: [toolResult] })
```

but i know the shared tests expect no content at all. let's revert this section, and let me see if i can modify the shared tests to account for either empty content/content not included, and we can undo this change, since this touches a path that users could run into as well.
```js
const condition1 = this.#tagConditionalString(result, 'Tool result', toolResultObj, 'result')
const condition2 = this.#tagConditionalString(toolId, 'Tool ID', toolResultObj, 'tool_id')
// name can be empty string, so always include it
toolResultObj.name = name
```
Suggested change:

```diff
- toolResultObj.name = name
+ if (typeof name === 'string') {
+   toolResultObj.name = name
+ } else {
+   this.#handleFailure(`[LLMObs] Expected tool result name to be a string, instead got "${typeof name}"`)
+ }
```
since we can't use `tagConditionalString` here (which i should change in the future to only check on null/undefined and not falsy), we should add a small guard that the name is a string (which is what our backend expects)
What does this PR do?
Adds instrumentation for OpenAI's Responses API, including APM span generation and LLM Obs span generation.
Motivation
ML-Obs product priority
Plugin Checklist
Additional Notes