Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconsider structure of context.db.statement #2019

Closed
gingerwizard opened this issue Mar 29, 2021 · 6 comments · Fixed by #2873
Closed

Reconsider structure of context.db.statement #2019

gingerwizard opened this issue Mar 29, 2021 · 6 comments · Fixed by #2873
Assignees
Labels
agent-nodejs Make available for APM Agents project planning. discuss enhancement

Comments

@gingerwizard
Copy link

We utilise the Node JS agent to capture Elasticsearch queries from Kibana. For this we need the full content body which is currently captured in the field context.db.statement. This value, however, contains the request body and URL (with parameters) delimited by two new lines. In order for us to extract the ES body + index name + any parameters, we currently parse this payload client side.

This obviously works it makes our code susceptible to changes in the agent. Whilst fields are governed by ECS and a change control process, are changes in the structure of field values also considered potentially breaking changes? If so, could such changes be made in a minor or could we rely on their consistency across majors?
https://www.elastic.co/guide/en/apm/agent/nodejs/current/release-notes-3.x.html#release-notes-3.10.0 suggests that changes can be made in a minor which is concerning.

This is further complicated by the fact we use the agent version distributed with Kibana, so we would need to track the agent version this uses.

One proposal is that these values be stored in separate fields i.e. only the request body be stored in context.db.statement and the params and index names be stored in separate fields. The agent would in effect be responsible for parsing these values. Whilst we would prefer this solution, we appreciate that is a decision beyond a single agent.

@trentm

@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Mar 29, 2021
@trentm
Copy link
Member

trentm commented Mar 29, 2021

Some background for others. Dale and I discussed this a little bit before this ticket.

Currently the structure of context.db.statement for Elasticsearch spans is defined by this code:

// Set the span's `context.db` from the Elasticsearch request querystring and
// body, if the request path looks like it is a query API.
//
// Some ES endpoints, e.g. '_search', support both query params and a body.
// We encode both into 'span.context.db.statement', separated by '\n\n'
// if both are present. E.g. for a possible msearch:
//
// search_type=query_then_fetch&typed_keys=false
//
// {}
// {"query":{"query_string":{"query":"pants"}}}
exports.setElasticsearchDbContext = function (span, path, query, body) {
if (pathIsAQuery.test(path)) {
// From @elastic/elasticsearch: A read of Transport.js suggests query and
// body will always be serialized strings, however the documented
// `TransportRequestParams` allows for non-strings, so we will be defensive.
//
// From legacy elasticsearch: query will be an object and body will be an
// object, or an array of objects, e.g. for bulk endpoints.
const parts = []
if (query) {
if (typeof (query) === 'string') {
parts.push(query)
} else if (typeof (query) === 'object') {
const encodedQuery = querystring.encode(query)
if (encodedQuery) {
parts.push(encodedQuery)
}
}
}
if (body) {
if (typeof (body) === 'string') {
parts.push(body)
} else if (Array.isArray(body)) {
parts.push(body.map(JSON.stringify).join('\n')) // ndjson
} else if (typeof (body) === 'object') {
parts.push(JSON.stringify(body))
}
}

That structure is mention in the changelog (Dale linked to it above), but not elsewhere in docs.

Here is the section of the current apm-server spec that governs what fields are allowed in context.db: https://github.com/elastic/apm-server/blob/141ee87f27b41f1dc6b03f278fb1afa88ead747e/docs/spec/v2/span.json#L31-L83

That Elasticsearch allows material fields for the query to be in either the query params and/or the body is what led to changing the form in v3.10.0 of the agent. Earlier versions of the Node.js APM agent would only include the query params if the body was empty. The content/structure of context.db.statement is not in the specs. My understanding of its intent is only that it shows relevant query data when viewed in the APM UI in Kibana. The structure being using in the Node.js APM agent now matches what the Python APM agent uses. For contrast, the Ruby APM agent uses a serialized JSON object with "params" (the query params) and "body" fields.

One proposal is that these values be stored in separate fields i.e. only the request body be stored in context.db.statement and the params and index names be stored in separate fields. ...

I think it is more than it being about multiple agent languages (which is one issue), but also that context.db is part of the APM server spec about capturing context for all DB-y technologies. Does it make sense to have a separate general field there? What would it be called?

current structure

Another question is whether the current structure provides the data you need. For example, I have a play app that makes a search using the ES client like so:

  client.search({
    index: 'kibana_sample_data_logs',
    body: {
      size: 1,
      query: {
        match_all: {}
      }
    }
  }
...

from which the ES client provides this data on the request:

{ params:
   { method: 'POST',
     path: '/kibana_sample_data_logs/_search',
     body: '{"size":1,"query":{"match_all":{}}}',
     querystring: '',
     headers:
      { 'user-agent':
         'elasticsearch-js/7.11.0 (darwin 20.3.0-x64; Node.js v10.23.0)',
        'x-elastic-client-meta': 'es=7.11.0,js=10.23.0,t=7.11.0,hc=10.23.0',
        'content-type': 'application/json',
        'content-length': '35' },
     timeout: 30000 },
  options: { },
  id: 1 }

and for which the current Node.js APM agent includes this data:

"context":{"db":{"type":"elasticsearch","statement":"{\"size\":1,\"query\":{\"match_all\":{}}}"},"destination":{"service":{"name":"elasticsearch","resource":"elasticsearch","type":"db"},"address":"localhost","port":9200}}

Note that the kibana_sample_data_logs index isn't in that data.

compatibility guarantees

As a fallback, theoretically your es-rally processor of this data should have the agent version available as a mechanism for dealing with structure changes:

...
"agent":{"name":"nodejs","version":"3.12.1"}
...

However, obviously that isn't great for your code maint.

I think, at the least, we document in the Node.js APM agent code that the current structure is being relied upon by your es rally code, so the APM developer is aware if considering making changes. I don't have a solid feel yet for if we want to lock it down as a major breaking change. I suspect in this case it wouldn't cause undue burden for APM agent maint to make this structure a promised interface -- perhaps after discussion here on whether we want to change the structure a last time first.

@felixbarny
Copy link
Member

I'd propose to set context.http.* to store the URL and the query parameters and just store the actual query from the request body in db.statement.
I don't think this would count as a backward-incompatible change that requires a new major version. Rally could have a condition on whether the db.statement starts with {.

Would that work for you, @gingerwizard?

@gingerwizard
Copy link
Author

That makes a lot of sense to me if its compatible with the current usage of those fields.

@trentm
Copy link
Member

trentm commented May 3, 2021

Sounds good to me. I can take this.

@trentm trentm self-assigned this May 3, 2021
@felixbarny
Copy link
Member

That general issue is related to this one: elastic/apm#439

@trentm
Copy link
Member

trentm commented Nov 11, 2021

Note elastic/kibana#107751

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning. discuss enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants