
Conversation

@jordan-wong commented Oct 2, 2025

What does this PR do?

Adds instrumentation for OpenAI's Responses API, including both APM span generation and LLM Observability (LLMObs) span generation.

Motivation

ML-Obs product priority

Plugin Checklist

Additional Notes


github-actions bot commented Oct 2, 2025

Overall package size

Self size: 12.71 MB
Deduped: 115.32 MB
No deduping: 117.53 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB |
| @datadog/native-appsec | 10.2.1 | 20.64 MB | 20.65 MB |
| @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB |
| @datadog/pprof | 5.11.1 | 9.96 MB | 10.34 MB |
| @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB |
| protobufjs | 7.5.4 | 2.95 MB | 5.73 MB |
| @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB |
| @opentelemetry/resources | 1.9.1 | 306.54 kB | 1.74 MB |
| @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB |
| @opentelemetry/api-logs | 0.205.0 | 201.51 kB | 1.42 MB |
| @opentelemetry/api | 1.9.0 | 1.22 MB | 1.22 MB |
| jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB |
| import-in-the-middle | 1.14.4 | 123.18 kB | 851.76 kB |
| lru-cache | 10.4.3 | 804.3 kB | 804.3 kB |
| @datadog/openfeature-node-server | 0.1.0-preview.10 | 95.11 kB | 401.46 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| source-map | 0.7.6 | 185.63 kB | 185.63 kB |
| pprof-format | 2.2.1 | 163.06 kB | 163.06 kB |
| @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| ignore | 7.0.5 | 63.38 kB | 63.38 kB |
| istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB |
| rfdc | 1.4.1 | 27.15 kB | 27.15 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |
| @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| shell-quote | 1.8.3 | 23.74 kB | 23.74 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| semifies | 1.0.0 | 15.84 kB | 15.84 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| ttl-set | 1.0.0 | 4.61 kB | 9.69 kB |
| mutexify | 1.4.0 | 5.71 kB | 8.74 kB |
| path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB |
| module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe


codecov bot commented Oct 2, 2025

Codecov Report

❌ Patch coverage is 20.38835% with 82 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.80%. Comparing base (f82ae45) to head (65ae2e0).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| packages/dd-trace/src/llmobs/plugins/openai.js | 13.04% | 80 Missing ⚠️ |
| packages/dd-trace/src/llmobs/tagger.js | 77.77% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6583      +/-   ##
==========================================
+ Coverage   81.97%   83.80%   +1.82%     
==========================================
  Files         479      503      +24     
  Lines       20424    21094     +670     
==========================================
+ Hits        16742    17677     +935     
+ Misses       3682     3417     -265     

☔ View full report in Codecov by Sentry.


pr-commenter bot commented Oct 2, 2025

Benchmarks

Benchmark execution time: 2025-10-15 21:45:08

Comparing candidate commit 65ae2e0 in PR branch openai-responses-instrumentation with baseline commit f82ae45 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1596 metrics, 74 unstable metrics.

Comment on lines 25 to 31
{
file: 'resources/responses',
targetClass: 'Responses',
baseResource: 'responses',
methods: ['create'],
streamedResponse: false
},
Collaborator

Suggested change
{
file: 'resources/responses',
targetClass: 'Responses',
baseResource: 'responses',
methods: ['create'],
streamedResponse: false
},

baseResource: 'responses',
methods: ['create'],
streamedResponse: false,
versions: ['>=4.85.0']
Collaborator

Suggested change
versions: ['>=4.85.0']
versions: ['>=4.87.0']

targetClass: 'Responses',
baseResource: 'responses',
methods: ['create'],
streamedResponse: false,
Collaborator

Suggested change
streamedResponse: false,
streamedResponse: true,

Comment on lines 156 to 172
function wrapCreate (create) {
return function (request) {
if (!vertexaiTracingChannel.start.hasSubscribers) {
// calls the original function
return create.apply(this, arguments)
}

const ctx = {
request,
instance: this,
resource: [this.constructor.name, create.name].join('.')
}
// am I using the right channel? tracingChannel vs diagnostics channel
return ch.tracePromise(create, ctx, this, ...arguments)
}
}

Collaborator

Suggested change
function wrapCreate (create) {
return function (request) {
if (!vertexaiTracingChannel.start.hasSubscribers) {
// calls the original function
return create.apply(this, arguments)
}
const ctx = {
request,
instance: this,
resource: [this.constructor.name, create.name].join('.')
}
// am I using the right channel? tracingChannel vs diagnostics channel
return ch.tracePromise(create, ctx, this, ...arguments)
}
}

Comment on lines 196 to 205
//register patching hooks via addHook
addHook({ name: 'openai', file: 'resources/responses.js', versions: ['>=4.87.0'] }, exports => {
const Responses = exports.OpenAIApi.responses
// wrap functions on module exports with shimmer.wrap
shimmer.wrap(responses.prototype, 'responses.createResponse', wrapCreate)
return exports
})



Collaborator

Suggested change
//register patching hooks via addHook
addHook({ name: 'openai', file: 'resources/responses.js', versions: ['>=4.87.0'] }, exports => {
const Responses = exports.OpenAIApi.responses
// wrap functions on module exports with shimmer.wrap
shimmer.wrap(responses.prototype, 'responses.createResponse', wrapCreate)
return exports
})

@sabrenner left a comment (Collaborator)

really cool stuff, and i did see all the tests are passing!

i also realized that the tests are now a little out of date with what our python integration does, as it's been updated since writing the tests and the tests did not get those updates 😭 that's on me, so i'll update those tests and we might be able to simplify some logic here.

left a couple of other comments for things we might be able to clean up in the meantime, but i'll get those tests fixed up first 🫡

return finalResponse
}

// If no final response found, fall back to accumulating from deltas and items
Collaborator

is there a specific model/response format that doesn't include the aggregated streamed response in the last chunk? i think otherwise we can safely only look at the last chunk and not worry about doing manual aggregating

Comment on lines 398 to 415
// Extract input information
if (payload.input) {
openaiStore.input = payload.input
tags['openai.request.input_length'] = payload.input.length
}

// Extract reasoning configuration
if (payload.reasoning) {
if (payload.reasoning.effort) {
tags['openai.request.reasoning.effort'] = payload.reasoning.effort
}
openaiStore.reasoning = payload.reasoning
}

// Extract background flag
if (payload.background !== undefined) {
tags['openai.request.background'] = payload.background
}
Collaborator

i think we can remove these blocks and just do model tagging for apm spans
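For illustration, a minimal sketch of what model-only APM tagging could look like here, following the openai.request.* naming convention used in the blocks above (the exact tag key is an assumption, not necessarily the plugin's):

// hypothetical sketch: tag only the requested model on the APM span
if (payload.model) {
  tags['openai.request.model'] = payload.model
}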

@@ -191,6 +195,188 @@ class OpenAiLLMObsPlugin extends LLMObsPlugin {

this._tagger.tagMetadata(span, metadata)
}

_tagResponse (span, inputs, response, error) {
Collaborator

Suggested change
_tagResponse (span, inputs, response, error) {
#tagResponse (span, inputs, response, error) {

just as a general JS practice this is better for private functions (i know the other ones aren't like this, i need to get around to cleaning up the rest of this code 😅)

and then calling the function above would just be

this.#tagResponse(...)
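For reference, a minimal self-contained sketch of the ES2022 private-method syntax the suggestion relies on (the class shape and method bodies are illustrative, not the plugin's actual code):

class ExamplePlugin {
  // '#' makes the method truly private: it is only callable from inside the class
  #tagResponse (span, response) {
    span.tags = { ...span.tags, output: response }
  }

  setLLMObsTags (span, response) {
    // private methods are invoked with the same '#' prefix
    this.#tagResponse(span, response)
  }
}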

toolId: item.call_id,
name: item.name,
arguments: args,
type: 'function'
Collaborator

Suggested change
type: 'function'
type: item.type

Comment on lines 137 to 157
if (finalResponse) {
// For simple text responses, if output is empty or an empty array, accumulate from deltas
const outputIsEmpty = !finalResponse.output ||
finalResponse.output === '' ||
(Array.isArray(finalResponse.output) && finalResponse.output.length === 0)

if (outputIsEmpty) {
const outputText = chunks
.filter(chunk => chunk.type === 'response.output_text.delta')
.map(chunk => chunk.delta)
.join('')

if (outputText) {
return {
...finalResponse,
output: outputText
}
}
}
return finalResponse
}
Collaborator

Suggested change
if (finalResponse) {
// For simple text responses, if output is empty or an empty array, accumulate from deltas
const outputIsEmpty = !finalResponse.output ||
finalResponse.output === '' ||
(Array.isArray(finalResponse.output) && finalResponse.output.length === 0)
if (outputIsEmpty) {
const outputText = chunks
.filter(chunk => chunk.type === 'response.output_text.delta')
.map(chunk => chunk.delta)
.join('')
if (outputText) {
return {
...finalResponse,
output: outputText
}
}
}
return finalResponse
}
return finalResponse

i think this might be good here - were there any scenarios you found where the last chunk did not have aggregations done properly?

// - response.done/response.incomplete/response.completed: final response with output array and usage

// Find the last chunk with a complete response object (status: done, incomplete, or completed)
let finalResponse = null
Collaborator

Suggested change
let finalResponse = null
let finalResponse

Comment on lines 300 to 316
// Only include content if it's not empty OR if there are no tool calls
// (For responses API, tool call messages should not have content field)
if (content !== '' || !messageObj.tool_calls) {
messageObj.content = content
}

// For role, always include it (even if empty string) when there are tool calls
// Otherwise use conditional tagging which skips empty values
let condition
if (messageObj.tool_calls && messageObj.tool_calls.length > 0) {
// For tool call messages, always include role even if empty
messageObj.role = role || ''
condition = true
} else {
condition = this.#tagConditionalString(role, 'Message role', messageObj, 'role')
}

Collaborator

i've realized the shared tests are now a bit out of date with what we do in python - i will fix those and get back to you, i thinkkkkk this stuff here could be removed once i fix up the tests 🙏 i will let you know, but i did confirm that these changes work with the current tests, so thank you!

Comment on lines 349 to 376
// Tag metadata
const metadata = Object.entries(parameters).reduce((obj, [key, value]) => {
if (!['tools', 'functions', 'instructions'].includes(key)) {
obj[key] = value
}
return obj
}, {})

// Add fields from response
if (response.temperature !== undefined) metadata.temperature = Number(response.temperature)
if (response.top_p !== undefined) metadata.top_p = Number(response.top_p)
if (response.tools !== undefined) {
metadata.tools = Array.isArray(response.tools) ? [...response.tools] : response.tools
}
if (response.tool_choice !== undefined) metadata.tool_choice = response.tool_choice
if (response.truncation !== undefined) metadata.truncation = response.truncation
if (response.text !== undefined) metadata.text = response.text
if (response.usage?.output_tokens_details?.reasoning_tokens !== undefined) {
metadata.reasoning_tokens = response.usage.output_tokens_details.reasoning_tokens
}

// Add reasoning metadata from input parameters
if (reasoning) {
metadata.reasoning = reasoning
}
if (background !== undefined) {
metadata.background = background
}
Collaborator

i think we can keep metadata tagging to the request parameters and not from the response, and then maybe do a allowlist instead of denylist for what we want to include, ie

const metadata = Object.entries(parameters).reduce((obj, [key, value]) => {
      if (['temperature', 'top_p', 'truncation', ...].includes(key)) {
        obj[key] = value
      }
      return obj
    }, {})

lmk if that doesn't make sense!

Comment on lines +118 to +136
if (chunks.length === 0) return {}

// The responses API streams events with different types:
// - response.output_text.delta: incremental text deltas
// - response.output_text.done: complete text for a content part
// - response.output_item.done: complete output item with role
// - response.done/response.incomplete/response.completed: final response with output array and usage

// Find the last chunk with a complete response object (status: done, incomplete, or completed)
let finalResponse
for (let i = chunks.length - 1; i >= 0; i--) {
const chunk = chunks[i]
if (chunk.response && ['done', 'incomplete', 'completed'].includes(chunk.response.status)) {
finalResponse = chunk.response
return finalResponse
}
}

return finalResponse
Collaborator

Suggested change
if (chunks.length === 0) return {}
// The responses API streams events with different types:
// - response.output_text.delta: incremental text deltas
// - response.output_text.done: complete text for a content part
// - response.output_item.done: complete output item with role
// - response.done/response.incomplete/response.completed: final response with output array and usage
// Find the last chunk with a complete response object (status: done, incomplete, or completed)
let finalResponse
for (let i = chunks.length - 1; i >= 0; i--) {
const chunk = chunks[i]
if (chunk.response && ['done', 'incomplete', 'completed'].includes(chunk.response.status)) {
finalResponse = chunk.response
return finalResponse
}
}
return finalResponse
// The responses API streams events with different types:
// - response.output_text.delta: incremental text deltas
// - response.output_text.done: complete text for a content part
// - response.output_item.done: complete output item with role
// - response.done/response.incomplete/response.completed: final response with output array and usage
// Find the last chunk with a complete response object (status: done, incomplete, or completed)
for (let i = chunks.length - 1; i >= 0; i--) {
const chunk = chunks[i]
if (chunk.response && ['done', 'incomplete', 'completed'].includes(chunk.response.status)) {
return chunk.response
}
}

now that we simplified the logic a bit i think we can just return the proper chunk directly, and if there isn't one the function will return undefined by default

we could also move the ['done', 'incomplete', 'completed'] array to be a const at the top of the file
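Along those lines, a sketch of what the hoisted constant could look like (names are illustrative):

// hypothetical module-level constant for terminal response statuses
const FINAL_RESPONSE_STATUSES = ['done', 'incomplete', 'completed']

function findFinalResponse (chunks) {
  // walk backwards and return the first complete response object found
  for (let i = chunks.length - 1; i >= 0; i--) {
    const { response } = chunks[i]
    if (response && FINAL_RESPONSE_STATUSES.includes(response.status)) {
      return response
    }
  }
  // implicitly returns undefined when no final chunk exists
}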

} else if (tokenUsage.prompt_tokens_details) {
// Chat/Completions API - only include if > 0
const cacheReadTokens = tokenUsage.prompt_tokens_details.cached_tokens
if (cacheReadTokens !== undefined && cacheReadTokens > 0) {
Collaborator

Suggested change
if (cacheReadTokens !== undefined && cacheReadTokens > 0) {
if (cacheReadTokens) {

i think this should be the same, since if cacheReadTokens is 0 it'll still evaluate as falsy.
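A quick demonstration of the equivalence (values chosen for illustration):

for (const cacheReadTokens of [undefined, 0, 128]) {
  // undefined and 0 are both falsy, so one truthiness check covers both cases
  if (cacheReadTokens) console.log('tagging cache read tokens:', cacheReadTokens)
}
// only logs for 128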

if (typeof parsedArgs === 'string') {
try {
parsedArgs = JSON.parse(parsedArgs)
} catch (e) {
Collaborator

Suggested change
} catch (e) {
} catch {

if (item.type === 'reasoning') {
// Extract reasoning text from summary
let reasoningText = ''
if (item.summary && Array.isArray(item.summary) && item.summary.length > 0) {
Collaborator

Suggested change
if (item.summary && Array.isArray(item.summary) && item.summary.length > 0) {
if (Array.isArray(item.summary) && item.summary.length > 0) {

Array.isArray will also check if non-null/non-undefined!
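A quick demonstration:

Array.isArray(undefined) // false
Array.isArray(null)      // false
Array.isArray([])        // true, even when empty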

if (typeof args === 'string') {
try {
args = JSON.parse(args)
} catch (e) {
Collaborator

Suggested change
} catch (e) {
} catch {

}

// Extract tool calls if present in message.tool_calls
if (item.tool_calls && Array.isArray(item.tool_calls)) {
Collaborator

Suggested change
if (item.tool_calls && Array.isArray(item.tool_calls)) {
if (Array.isArray(item.tool_calls)) {

this._tagger.tagLLMIO(span, inputMessages, outputMessages)

// Tag metadata - use allowlist approach for request parameters
const allowedParamKeys = [
Collaborator

we can make this a const at the top of the file

}

// Include content if not empty, no tool calls/results, or explicitly provided
if (content !== '' || (!messageObj.tool_calls && !messageObj.tool_results) || ('content' in message)) {
Collaborator

i think we should be able to get rid of this changed logic here (although I know it'll fail the shared tests 😭). we actually get around this for the newly added anthropic integration by setting content to an empty string:

inputMessages.push({ content: '', role, toolResults: [toolResult] })

but i know the shared tests expect no content at all. let's revert this section, and let me see if i can modify the shared tests to account for either empty content/content not included, and we can undo this change, since this touches a path that users could run into as well.

const condition1 = this.#tagConditionalString(result, 'Tool result', toolResultObj, 'result')
const condition2 = this.#tagConditionalString(toolId, 'Tool ID', toolResultObj, 'tool_id')
// name can be empty string, so always include it
toolResultObj.name = name
Collaborator

Suggested change
toolResultObj.name = name
if (typeof name === 'string') {
toolResultObj.name = name
} else {
this.#handleFailure(`[LLMObs] Expected tool result name to be a string, instead got "${typeof name}"`)
}

since we can't use tagConditionalString here (which i should change in the future to only check on null/undefined and not falsy), we should add a small guard that the name is a string (which is what our backend expects)
