HTTP header and body extraction #396

@NimrodAvni78

We wanted to open this issue to discuss the design and technical implementation of header and body extraction for http

Requirements:

  • attach http request and/or response headers to spans as attributes
    • needs to be opt-in
    • some glob pattern to specify which headers to add to spans
    • needs to follow otel semantic conventions
      • http.(request|response).header.<key>
    • for some reason the semantic conventions say the value of the attribute should be a list of values instead of a single one, to cover a header that appears multiple times; don't think that's a real case?
  • attach http request and/or response body to a span
    • customer should limit the size of the payload to extract
    • i’m pretty sure there are no otel semantic conventions for this
      • we can invent our own and try to push them to otel, http.(request|response).body.payload
    • we can also populate the http.(request|response).body.size attributes
    • needs to be opt-in
    • maybe we should consider adding sampling to this: we may not want every span to carry the payload since it might add significant cost. or is normal span sampling good enough?
    • obfuscation: we must allow users to remove sensitive parts of their request / response payloads and replace them with some filler (***); we might want multiple strategies for obfuscating payloads
      • json obfuscation: we can detect that the payload is json by relying on the content-type header.
        for example, let's look at this json:

        {"root":{"username": "myusername", "age": 51, "password": "mypassword"}}

        we can have multiple strategies for obfuscating json:

        • json_values: obfuscate all json values

          {"root":{"username": "***", "age": "***", "password": "***"}}
        • json_path: obfuscate all json values matching a specific jsonPath

          • jsonPath: $.root.password
          {"root":{"username": "myusername", "age": 51, "password": "***"}}
        • json_value_pattern: obfuscate all json values matching some glob pattern

          • pattern: my*
          {"root":{"username": "***", "age": 51, "password": "***"}}
        • note, this might be too cpu intensive to be worth implementing
      • plain text obfuscation: for plain text payloads we can just replace the entire payload with *** (the plaintext strategy)
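
The strategies above could be sketched roughly as follows. This is only an illustration in Python, not a proposed implementation: the function names, the rule dictionaries (shaped after the config keys used in this issue), the decision to return *** when json parsing fails, and the simplified path handling (plain dotted keys only, no wildcards or array indexes) are all assumptions.

```python
import copy
import fnmatch
import json

MASK = "***"


def mask_all_values(node):
    """json_values: replace every leaf value, keeping keys and structure."""
    if isinstance(node, dict):
        return {key: mask_all_values(value) for key, value in node.items()}
    if isinstance(node, list):
        return [mask_all_values(item) for item in node]
    return MASK


def mask_path(node, json_path):
    """json_path: mask the value at a dotted path such as $.root.password.

    Simplification: only plain object keys are handled, not full JSONPath.
    """
    keys = [key for key in json_path.lstrip("$").split(".") if key]
    result = copy.deepcopy(node)
    parent = result
    for key in keys[:-1]:
        if not isinstance(parent, dict) or key not in parent:
            return result  # path not present, nothing to mask
        parent = parent[key]
    if keys and isinstance(parent, dict) and keys[-1] in parent:
        parent[keys[-1]] = MASK
    return result


def mask_matching_values(node, pattern):
    """json_value_pattern: mask string values that match a glob pattern."""
    if isinstance(node, dict):
        return {key: mask_matching_values(value, pattern) for key, value in node.items()}
    if isinstance(node, list):
        return [mask_matching_values(item, pattern) for item in node]
    if isinstance(node, str) and fnmatch.fnmatchcase(node, pattern):
        return MASK
    return node


def obfuscate_body(body, content_type, rules):
    """Apply the configured rules in order and return the obfuscated text."""
    if "json" not in content_type.lower():
        # Non-json payloads: only the plaintext strategy applies.
        return MASK if any(rule["type"] == "plaintext" for rule in rules) else body
    try:
        doc = json.loads(body)
    except ValueError:
        return MASK  # assumption: fail closed when the json cannot be parsed
    for rule in rules:
        if rule["type"] == "json_values":
            doc = mask_all_values(doc)
        elif rule["type"] == "json_path":
            doc = mask_path(doc, rule["json_path"])
        elif rule["type"] == "json_value_pattern":
            doc = mask_matching_values(doc, rule["json_value_pattern"])
    return json.dumps(doc)
```

Note that json_value_pattern has to visit every string value in the document, which is where the cpu concern comes from; json_path only walks the keys named in the path.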

this is an example of how a client could configure payload extraction:

payload_extraction:
  http:
    max_payload_size: 8192 # maximum size of http payload to send
    truncate: true # if the http payload is bigger than max_payload_size, should we truncate it to fit or skip payload extraction for this request
    mode: # maybe a better name
      # add validation for repeating values, should we warn or error?
      - "request_headers"
      - "response_headers"
      - "request_body"
      - "response_body"
      - "headers"
      - "body"
      - "all"
    headers:
      pattern: '!(*Authorization*)'
      request:
        # do we want this to be override or merged?
        pattern: 'x-my-custom-req-header'
      response:
        # do we want this to be override or merged?
        pattern: 'x-my-custom-res-header'
      # obfuscation only runs on body
    body:
      obfuscation:
        - type: "json_path"
          json_path: "$.root.password"
        - type: "json_value_pattern"
          json_value_pattern: "PK-*"
        - type: "json_values"
        - type: "plaintext"
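
To make the headers part of this config more concrete, here is a minimal Python sketch of how a pattern could select headers and turn them into span attributes. It is an assumption-laden illustration: header_matches and header_attributes are hypothetical names, the '!(...)' negation is handled by hand because Python's fnmatch has no extglob support, and header names are lowercased in the attribute key, which is how I read the otel convention. Values are collected into a list, so a header that does arrive on several lines (Set-Cookie is a common one) keeps all of its values.

```python
import fnmatch


def header_matches(name, pattern):
    """Case-insensitive glob match; a pattern wrapped in !( ... ) is negated.

    Assumption: only a single whole-pattern negation is supported here.
    """
    name = name.lower()
    pattern = pattern.lower()
    if pattern.startswith("!(") and pattern.endswith(")"):
        return not fnmatch.fnmatchcase(name, pattern[2:-1])
    return fnmatch.fnmatchcase(name, pattern)


def header_attributes(headers, direction, pattern):
    """Build http.(request|response).header.<key> attributes.

    headers is a list of (name, value) pairs in wire order;
    direction is "request" or "response".
    """
    attributes = {}
    for name, value in headers:
        if header_matches(name, pattern):
            key = f"http.{direction}.header.{name.lower()}"
            attributes.setdefault(key, []).append(value)
    return attributes


request_headers = [
    ("Accept", "text/html"),
    ("Accept", "application/json"),
    ("Authorization", "Bearer secret"),
    ("X-My-Custom-Req-Header", "1"),
]
selected = header_attributes(request_headers, "request", "!(*Authorization*)")
```

With the sample headers above, selected holds two keys, http.request.header.accept with both Accept values and http.request.header.x-my-custom-req-header, while Authorization is dropped. Whether the request / response patterns override or merge with the top-level one is left open here, as in the config comments.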

implementation:

  • this can build on the large_buffers feature that was added: send the full http payload from kernel space, and do the header and value parsing in userspace
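
As a rough picture of the userspace side, the sketch below splits a captured HTTP/1.x buffer into start line, headers, and body, and then applies max_payload_size / truncate. It is written in Python only for illustration; split_http_message and limit_body are hypothetical names, and the sketch assumes the buffer starts at the beginning of a message and ignores chunked transfer encoding, compression, and HTTP/2.

```python
def split_http_message(raw):
    """Split a raw HTTP/1.x message into (start_line, headers, body).

    headers is a list of (name, value) pairs, so repeated names are preserved.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    start_line = lines[0].decode("latin-1")
    headers = []
    for line in lines[1:]:
        name, colon, value = line.partition(b":")
        if not colon:
            continue  # skip malformed header lines
        headers.append((name.decode("latin-1").strip(), value.decode("latin-1").strip()))
    return start_line, headers, body


def limit_body(body, max_payload_size, truncate):
    """Apply the size limit: truncate to fit, or return None to skip extraction."""
    if len(body) <= max_payload_size:
        return body
    return body[:max_payload_size] if truncate else None


raw = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"a":1}'
)
start_line, headers, body = split_http_message(raw)
```

A truncated json body will usually no longer parse, so truncation and the json obfuscation strategies interact: either obfuscate before truncating or treat a truncated body as plain text.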

open questions:

  • what is the difference between obfuscating headers and just removing them, do we need both?

what else it gives us:

  • besides giving additional data on spans, it lets us infer a lot of http-based protocols in the future, such as graphql, elastic and so on
