Conversation

@seemk
Contributor

@seemk seemk commented Nov 11, 2025

The following is a proposed addition to OTel's file-based configuration. I've listed out our common configuration fields and how they should be placed in OTel's configuration file.

Please discuss the wording and add any fields I've missed. Language maintainers: if you have any distro-specific configuration values, please add them here as well.
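
For orientation, here's a rough sketch of the kind of mapping I mean (the exact node and key names are what we need to settle here, so treat these as illustrative only):

realm: "us0"                          # SPLUNK_REALM
accessToken: "${SPLUNK_ACCESS_TOKEN}" # SPLUNK_ACCESS_TOKEN
profiling:
  enabled: true                       # SPLUNK_PROFILER_ENABLED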

@seemk seemk requested review from a team as code owners November 11, 2025 14:41
@seemk seemk marked this pull request as draft November 11, 2025 14:41
runtime:
  nodejs:
    metrics:
      instrumentationMetricsEnabled: true # SPLUNK_INSTRUMENTATION_METRICS_ENABLED

Why shouldn't this kind of property go under .instrumentation.nodejs?

Contributor

I agree that things related to instrumentations can be placed under the .instrumentation.nodejs node,
but the profiler call graphs and SPLUNK_TRACE_RESPONSE_HEADER_ENABLED should go into a separate node, .splunk or .vendors.splunk.

Contributor

I think that SPLUNK_TRACE_RESPONSE_HEADER_ENABLED is also a kind of instrumentation 😉 At least in Go, we even have a dedicated instrumentation library which enables it 😉

@ysolomchenko
Contributor

ysolomchenko commented Nov 14, 2025

Here are all the env vars we have per language:

.NET:

Core

SPLUNK_ACCESS_TOKEN
SPLUNK_REALM

Tracing

SPLUNK_TRACE_RESPONSE_HEADER_ENABLED

Profiler

SPLUNK_PROFILER_ENABLED
SPLUNK_PROFILER_MEMORY_ENABLED
SPLUNK_PROFILER_LOGS_ENDPOINT
SPLUNK_PROFILER_CALL_STACK_INTERVAL
SPLUNK_PROFILER_EXPORT_INTERVAL
SPLUNK_PROFILER_EXPORT_TIMEOUT
SPLUNK_PROFILER_MAX_MEMORY_SAMPLES

Snapshot

SPLUNK_SNAPSHOT_PROFILER_ENABLED
SPLUNK_SNAPSHOT_SAMPLING_INTERVAL
SPLUNK_SNAPSHOT_SELECTION_RATE

Java

Core

SPLUNK_ACCESS_TOKEN
SPLUNK_REALM

Tracing

SPLUNK_TRACE_RESPONSE_HEADER_ENABLED

Metrics

SPLUNK_METRICS_FORCE_FULL_COMMANDLINE

Profiler

SPLUNK_PROFILER_ENABLED
SPLUNK_PROFILER_DIRECTORY
SPLUNK_PROFILER_RECORDING_DURATION
SPLUNK_PROFILER_KEEP_FILES
SPLUNK_PROFILER_LOGS_ENDPOINT
SPLUNK_PROFILER_CALL_STACK_INTERVAL
SPLUNK_PROFILER_MEMORY_ENABLED
SPLUNK_PROFILER_MEMORY_EVENT_RATE
SPLUNK_PROFILER_INCLUDE_INTERNAL_STACKS
SPLUNK_PROFILER_TRACING_STACKS_ONLY
SPLUNK_PROFILER_OTLP_PROTOCOL

Snapshot

SPLUNK_SNAPSHOT_PROFILER_ENABLED
SPLUNK_SNAPSHOT_PROFILER_SAMPLING_INTERVAL
SPLUNK_SNAPSHOT_SELECTION_PROBABILITY
SPLUNK_SNAPSHOT_PROFILER_MAX_STACK_DEPTH
SPLUNK_SNAPSHOT_PROFILER_EXPORT_INTERVAL
SPLUNK_SNAPSHOT_PROFILER_STAGING_CAPACITY

Node.js

Core

SPLUNK_ACCESS_TOKEN
SPLUNK_REALM

Tracing

SPLUNK_TRACE_RESPONSE_HEADER_ENABLED
SPLUNK_TRACING_ENABLED

Metrics

SPLUNK_METRICS_ENABLED
SPLUNK_RUNTIME_METRICS_ENABLED
SPLUNK_RUNTIME_METRICS_COLLECTION_INTERVAL
SPLUNK_DEBUG_METRICS_ENABLED

Profiler

SPLUNK_PROFILER_ENABLED
SPLUNK_PROFILER_LOGS_ENDPOINT
SPLUNK_PROFILER_MEMORY_ENABLED
SPLUNK_PROFILER_CALL_STACK_INTERVAL

Snapshot

SPLUNK_SNAPSHOT_PROFILER_ENABLED
SPLUNK_SNAPSHOT_PROFILER_SAMPLING_INTERVAL
SPLUNK_SNAPSHOT_SELECTION_RATE
SPLUNK_CPU_PROFILER_COLLECTION_INTERVAL

Instrumentation

SPLUNK_INSTRUMENTATION_METRICS_ENABLED
SPLUNK_REDIS_INCLUDE_COMMAND_ARGS
SPLUNK_GRAPHQL_RESOLVE_SPANS_ENABLED

Go

Core

SPLUNK_ACCESS_TOKEN
SPLUNK_REALM

Tracing

SPLUNK_TRACE_RESPONSE_HEADER_ENABLED

Python

Core

SPLUNK_ACCESS_TOKEN
SPLUNK_REALM

Tracing

SPLUNK_TRACE_RESPONSE_HEADER_ENABLED

Profiler

SPLUNK_PROFILER_ENABLED
SPLUNK_PROFILER_LOGS_ENDPOINT
SPLUNK_PROFILER_CALL_STACK_INTERVAL
SPLUNK_PROFILER_INCLUDE_INTERNAL_STACKS

IMO

  • We don’t want to configure the following core environment variables via a file:
SPLUNK_ACCESS_TOKEN
SPLUNK_REALM
  • The profiler/snapshots/callgraphs should be placed under the node .splunk or .vendors.splunk.
  • The instrumentations should be under the node .instrumentation/development.
  • For tracing and metrics, I have mixed thoughts about where to put them, since they could belong either under .splunk or .instrumentation.

realm: "eu0"
accessToken: "${SPLUNK_ACCESS_TOKEN}"
tracing:
  enabled: true

Contributor

Shouldn't we use disabled instead of enabled, as this is the pattern used in declarative config?

Contributor

I think @Kielek's comment is more correct regarding how we should handle enabling/disabling:
#355 (comment)

disabled in the declarative config is used to disable the SDK, but we don't have, for example, a way to disable the tracer/meter/logger providers, propagators, or resource detectors.
So I'm pretty sure the rule should be: if the node is present, it's enabled. To disable it, you remove the entire node.

Contributor Author

Changed it to disabled. The configuration is already pretty hostile to users (with the exporter header and endpoint duplication), so I'm not really keen on removing a simple field for toggling the whole signal.

If we go the route of removing the disabled / enabled field, then:

  • tracing / metrics / logs would be toggled via the existence of tracer_provider / meter_provider / logger_provider top level nodes
  • profiling would be toggled via the existence of distribution.splunk.profiling node
  • callgraphs would be toggled via the existence of distribution.splunk.callgraphs node

However, as a compromise, the disabled field under signals could be optional, where an unspecified value means the signal is on; you'd have to explicitly write disabled: true to disable a signal without removing the whole config block (i.e. when you're debugging something).
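
A minimal sketch of what that compromise could look like (node and key names here are illustrative, not final):

distribution:
  splunk:
    profiling:
      endpoint: "http://collector:4318/v1/logs" # stays configured
      disabled: true                            # written explicitly while debugging
    callgraphs:
      selection_probability: 0.01               # no disabled key, so the signal is on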

Contributor

without removing the whole config block (i.e. when you're debugging something).

One could simply comment out the whole block. I don't feel that we need this configuration option. I prefer to minimize the scope where possible, similar to what we agreed on for SPLUNK_REALM and SPLUNK_ACCESS_TOKEN.

Comment on lines 267 to 274
profiling:
  disabled: false # SPLUNK_PROFILER_ENABLED
  endpoint: "" # SPLUNK_PROFILER_LOGS_ENDPOINT
  memory_profiling_enabled: true # SPLUNK_PROFILER_MEMORY_ENABLED
callgraphs:
  disabled: false # SPLUNK_SNAPSHOT_PROFILER_ENABLED
  sampling_interval: 10 # SPLUNK_SNAPSHOT_SAMPLING_INTERVAL
  selection_probability: 0.01 # SPLUNK_SNAPSHOT_SELECTION_PROBABILITY

Contributor

@pellared pellared Nov 26, 2025

I think this should be something like

Suggested change
profiling:
  disabled: false # SPLUNK_PROFILER_ENABLED
  endpoint: "" # SPLUNK_PROFILER_LOGS_ENDPOINT
  memory_profiling_enabled: true # SPLUNK_PROFILER_MEMORY_ENABLED
callgraphs:
  disabled: false # SPLUNK_SNAPSHOT_PROFILER_ENABLED
  sampling_interval: 10 # SPLUNK_SNAPSHOT_SAMPLING_INTERVAL
  selection_probability: 0.01 # SPLUNK_SNAPSHOT_SELECTION_PROBABILITY
profiling:
  processors:
    - batch:
        schedule_delay: 5000 # SPLUNK_PROFILER_EXPORT_INTERVAL
        export_timeout: 30000 # SPLUNK_PROFILER_EXPORT_TIMEOUT
        exporter:
          otlp_log_http:
            endpoint: http://localhost:4318/v1/logs # SPLUNK_PROFILER_LOGS_ENDPOINT
  memory_allocation_sampling: # SPLUNK_PROFILER_MEMORY_ENABLED
    sampling_interval / sampling_rate: # SPLUNK_PROFILER_MEMORY_EVENT_RATE / SPLUNK_PROFILER_MAX_MEMORY_SAMPLES ?
  callstack_sampling: # SPLUNK_PROFILER_ENABLED
    sampling_interval: 10 # SPLUNK_PROFILER_CALL_STACK_INTERVAL
  snapshotting: # SPLUNK_SNAPSHOT_PROFILER_ENABLED
    callstack_sampling_interval: 10 # SPLUNK_SNAPSHOT_SAMPLING_INTERVAL
    trace_selection_probability: 0.01 # SPLUNK_SNAPSHOT_SELECTION_PROBABILITY

Discussed with @lachmatt

Contributor Author

This (memory_allocation_sampling, callstack_sampling, snapshotting) is completely different wording from what we have in our documentation. Why would you not keep parity here?

Contributor

@pellared pellared Nov 26, 2025

Together with @lachmatt we find the current wording (from the documentation) confusing and misleading. We propose changing the names to better represent the behavior of the profilers.

Contributor

Note that the proposal misses some of the configuration supported by Java.

Contributor Author

Giving it some thought again after our discussion, I think it doesn't make sense, for two reasons:

  • Distros might not use the OpenTelemetry pipeline for profiling data exports. Node.js doesn't: until recently we had a custom exporter (not using the OTel log processors), and while we have now switched to an OTel logs exporter, we are not using the log processing pipeline and don't think we ever will.
  • While the naming might be more correct, diverging from our current customer terminology and documentation is bound to sow confusion.

Unrelated to your comment, but a related issue: while going over the configuration, I noticed that you can't actually set the endpoint for callgraphs explicitly with file-based config. Our documentation refers to OTEL_EXPORTER_OTLP_ENDPOINT, but this doesn't make much sense with file-based config, where the pipelines sit in their own sections.

We either have to create a new configuration option for a callgraphs endpoint, or reuse the one from the profiler.
Currently the CPU profiler and callgraphs can be active independently of each other. If (as previously proposed) we don't have an enabled/disabled flag under profiling, we can end up in a case where we can't set the endpoint for callgraphs, e.g.:

distribution:
  splunk:
    #profiling:
    #  endpoint: "collector:4318/v1/logs"
    callgraphs:
      trace_selection_probability: 0.05
      # Since profiling is commented out, we have no URL to use

#logger_provider:
#  processors:
#    - batch:
#        exporter:
#           otlp_http:
#              endpoint: ""  

Some of the options we have:

distribution:
  splunk:
    profiling: # Signal is not enabled/disabled by the existence of the node
      disabled: true
      endpoint: "collector:4318/v1/logs"
    callgraphs:
      trace_selection_probability: 0.05

or

distribution:
  splunk:
    # profiling: # Profiling is disabled
    #  endpoint: "collector:4318/v1/logs"
    callgraphs:
      endpoint: "collector:4318/v1/logs"
      trace_selection_probability: 0.05

Contributor

And my question about otlp_log_http.
I think we should use the term from the spec, otlp_http/otlp_grpc.

Contributor Author

And my question about otlp_log_http. I think we should use the term from the spec, otlp_http/otlp_grpc.

Agree

Contributor

@pellared pellared Nov 27, 2025

I think we should use the term from the spec, otlp_http/otlp_grpc.

The problem is that we are using a "log OTLP exporter", not a "profiling OTLP exporter", which we may want in the future. This is why I added _log_ 😉

Hey, is there any significant reason for changing the naming of the profiling-related attributes?
I would prefer to keep it fairly simple and use terms that are more common in the industry (CPU/memory profiler, profiling, etc.).
Also, the term "snapshotting" sounds a bit ambiguous, since a snapshot can be almost any kind of time-bounded data, whereas callgraphs or stack traces call it what it is.
WDYT?

Contributor

@akubik-splunk note that we are already using the term "snapshots" in, e.g., the env variables in the spec.

Comment on lines 276 to 287
runtime:
  nodejs:
    metrics:
      runtime_metrics:
        enabled: true # SPLUNK_RUNTIME_METRICS_ENABLED
        collection_interval: 30000 # SPLUNK_RUNTIME_METRICS_COLLECTION_INTERVAL
      instrumentation_metrics_enabled: true # SPLUNK_INSTRUMENTATION_METRICS_ENABLED
    autoinstrument_packages:
      - "MyApiService"
      - "MyOtherService"
  java:
    ...

Contributor

I suggest moving this under the instrumentation node.

@robsunday

robsunday commented Nov 27, 2025

I think we need to add a bit of structure to our discussion. We all post snippets with names based on the current names in our distros. To be on the same page, let's agree on the top-level node names, because right now we don't even have consistency on the profiler-type node names. My proposals for the top-level nodes are:
a)

distribution:
  splunk:
    profiler:
      exporter: # Common endpoint configuration used for both types of profiler
      always_on:  # If present then it means it is enabled. Memory profiling will be nested node
      call_graph: # If present then it means it is enabled. 

b)

distribution:
  splunk:
    profiling:
      exporter: # Common endpoint configuration used for both types of profiler
      profiler:
        always_on:  # If present then it means it is enabled. Memory profiling will be nested node
        call_graph: # If present then it means it is enabled. 

Once we agree on these names let's use them consistently in subsequent comments.
What do you think? Any other ideas?

Comment on lines 316 to 317
"@opentelemetry/instrumentation-pg":
  disabled: true

Contributor

Suggested change
"@opentelemetry/instrumentation-pg":
  disabled: true
"@opentelemetry/instrumentation-pg":
  disabled: true
dotnet:
  traces:
    response_header_enabled: true # SPLUNK_TRACE_RESPONSE_HEADER_ENABLED

memory_profiler: # SPLUNK_PROFILER_MEMORY_ENABLED
callgraphs: # SPLUNK_SNAPSHOT_PROFILER_ENABLED
  sampling_interval: 10 # SPLUNK_SNAPSHOT_SAMPLING_INTERVAL
  selection_probability: 0.01 # SPLUNK_SNAPSHOT_SELECTION_PROBABILITY

I'd like to suggest one change to the profiling settings.
It is related to the fact that CPU profiling and memory profiling are described as part of AlwaysOn Profiling. Also, in the Java distribution both types of profiling are based on the same mechanism: JFR.
There is a set of 6 additional JFR settings that are specific to Java and shared between CPU and memory profiling. Below is an example of the profiling configuration with all the additional settings:

distribution:
  splunk:
    profiling:
      exporter:
        otlp_http:                  # SPLUNK_PROFILER_OTLP_PROTOCOL
          endpoint: ""              # SPLUNK_PROFILER_LOGS_ENDPOINT

      always_on:
        include_agent_internals: true # SPLUNK_PROFILER_INCLUDE_AGENT_INTERNALS (Java)
        include_jvm_internals: true   # SPLUNK_PROFILER_INCLUDE_JVM_INTERNALS   (Java)
        stack_depth_limit: 256        # SPLUNK_PROFILER_MAX_STACK_DEPTH         (Java)
        keep_recording_files: true    # SPLUNK_PROFILER_KEEP_FILES              (Java)
        recording_directory: "/tmp/profiler" # SPLUNK_PROFILER_DIRECTORY        (Java)
        recording_duration: 20000     # SPLUNK_PROFILER_RECORDING_DURATION      (Java)

        cpu_profiler:               # SPLUNK_PROFILER_ENABLED
          sampling_interval: 10     # SPLUNK_PROFILER_CALL_STACK_INTERVAL
        memory_profiler:            # SPLUNK_PROFILER_MEMORY_ENABLED
          event_rate:               # SPLUNK_PROFILER_MEMORY_EVENT_RATE         (Java)
          event_rate_limit:         # SPLUNK_PROFILER_MEMORY_EVENT_RATE_LIMIT   (Java)
          native_sampling:          # SPLUNK_PROFILER_MEMORY_NATIVE_SAMPLING    (Java)

      call_graphs:                  # SPLUNK_SNAPSHOT_PROFILER_ENABLED
        sampling_interval: 10       # SPLUNK_SNAPSHOT_SAMPLING_INTERVAL
        export_interval: 5000       # SPLUNK_PROFILER_EXPORT_INTERVAL           (Java)
        selection_probability: 0.01 # SPLUNK_SNAPSHOT_SELECTION_PROBABILITY
        staging_capacity: 1000      # SPLUNK_SNAPSHOT_PROFILER_STAGING_CAPACITY (Java)
        stack_depth_limit: 512      # SPLUNK_SNAPSHOT_PROFILER_MAX_STACK_DEPTH  (Java)

I grouped the cpu_profiler and memory_profiler nodes under a common parent: always_on. All common JFR properties specific to Java are also under this node. All of them are optional and should be ignored by non-Java distros.
What do you think about this idea?

Contributor

@ysolomchenko ysolomchenko Dec 5, 2025

I'm fine with this structure, but we need to add the exporter interval and timeout, remove them from callgraphs, and maybe adjust the names to:

  • cpu_profiler -> callstack_sampling
  • memory_profiler -> memory_allocation_sampling
  • call_graphs -> snapshotting

We also need to choose the correct property name for the memory_allocation_sampling configuration:
sampling_interval / sampling_rate / event_rate (SPLUNK_PROFILER_MEMORY_EVENT_RATE, SPLUNK_PROFILER_MAX_MEMORY_SAMPLES).

distribution:
  splunk:
    profiling:
      exporter:
        schedule_delay: 5000        # SPLUNK_PROFILER_EXPORT_INTERVAL (.NET)
        export_timeout: 30000       # SPLUNK_PROFILER_EXPORT_TIMEOUT (.NET)
        otlp_http:                  # SPLUNK_PROFILER_OTLP_PROTOCOL
          endpoint: ""              # SPLUNK_PROFILER_LOGS_ENDPOINT

      always_on:
        include_agent_internals: true # SPLUNK_PROFILER_INCLUDE_AGENT_INTERNALS (Java)
        include_jvm_internals: true   # SPLUNK_PROFILER_INCLUDE_JVM_INTERNALS   (Java)
        stack_depth_limit: 256        # SPLUNK_PROFILER_MAX_STACK_DEPTH         (Java)
        keep_recording_files: true    # SPLUNK_PROFILER_KEEP_FILES              (Java)
        recording_directory: "/tmp/profiler" # SPLUNK_PROFILER_DIRECTORY        (Java)
        recording_duration: 20000     # SPLUNK_PROFILER_RECORDING_DURATION      (Java)

        callstack_sampling:                                # SPLUNK_PROFILER_ENABLED
          sampling_interval: 10                            # SPLUNK_PROFILER_CALL_STACK_INTERVAL

        memory_allocation_sampling:                        # SPLUNK_PROFILER_MEMORY_ENABLED
          sampling_interval: # SPLUNK_PROFILER_MEMORY_EVENT_RATE /  SPLUNK_PROFILER_MAX_MEMORY_SAMPLES 
          event_rate_limit:                                # SPLUNK_PROFILER_MEMORY_EVENT_RATE_LIMIT   (Java)
          native_sampling:                                 # SPLUNK_PROFILER_MEMORY_NATIVE_SAMPLING    (Java)

      snapshotting:                 # SPLUNK_SNAPSHOT_PROFILER_ENABLED
        sampling_interval: 10       # SPLUNK_SNAPSHOT_SAMPLING_INTERVAL
        selection_probability: 0.01 # SPLUNK_SNAPSHOT_SELECTION_PROBABILITY
        staging_capacity: 1000      # SPLUNK_SNAPSHOT_PROFILER_STAGING_CAPACITY (Java)
        stack_depth_limit: 512      # SPLUNK_SNAPSHOT_PROFILER_MAX_STACK_DEPTH  (Java)

@seemk seemk marked this pull request as ready for review December 8, 2025 12:11
@seemk seemk changed the title [DRAFT] Splunk specific file based configuration fields Splunk specific file based configuration fields Dec 8, 2025
@seemk
Contributor Author

seemk commented Dec 8, 2025

Updated in accordance with our latest discussion and moved it to review.

cpu_profiler: # SPLUNK_PROFILER_ENABLED
  sampling_interval: 10 # SPLUNK_PROFILER_CALL_STACK_INTERVAL
memory_profiler: # SPLUNK_PROFILER_MEMORY_ENABLED
callgraphs: # SPLUNK_SNAPSHOT_PROFILER_ENABLED

callgraphs should be nested one level less; it should be a sibling of always_on.

Contributor Author

thanks, fixed

@seemk
Contributor Author

seemk commented Dec 8, 2025

While the configuration for profiling seems to be mostly finalized, the distro-specific customizations seem to fall into two categories: having Splunk-specific configuration options under instrumentation/development, or (as originally proposed in the PR) having the options under a distribution.splunk node.

Here are examples of both variants (based on a simplified Node.js config), for a quick comparison:

Generalized configuration under distribution.splunk

distribution:
  splunk:
    general:
      runtime_metrics:
        collection_interval: 30000
      trace_response_header_enabled: true
      use_bundled_instrumentations: true
      graphql_resolve_spans_enabled: true
      nextjs_cardinality_reduction_enabled: true
      package_name_filter:
        - "MyApiService"
        - "MyOtherService"

Splunk specific variables under instrumentation/development

instrumentation/development:
  js:
    splunk:
      use_bundled_instrumentations: true
      package_name_filter:
        - "MyApiService"
        - "MyOtherService"
    metrics:
      splunk_runtime_metrics:
        collection_interval: 30000
    traces: # Note traces, metrics and the instrumentation naming would be Splunk specific. Afaik there is no agreement yet in upstream
      http:
        splunk_trace_response_header_enabled: true
      graphql:
        splunk_resolve_spans_enabled: true
      nextjs:
        splunk_cardinality_reduction_enabled: true

I'd rather have all Splunk-specific configuration under the distribution.splunk node, with the reasoning:

  • Keeps Splunk-specific configuration localized instead of scattered all over the config file, meaning everything outside of distribution.splunk would be standard OTel config.
  • The traces and metrics keys under "instrumentation/development".js are not agreed upon upstream for Node.js.
  • splunk_* keys might break upstream configuration loading (i.e. an upstream implementation encounters an unsupported key that only our distros support).

@ysolomchenko
Contributor

ysolomchenko commented Dec 9, 2025

IMO it should be under instrumentation/development,
because it is logically correct for instrumentation-related configuration to be placed there.
It is also more appropriate from the specification's point of view, since the instrumentation/development node has an extension point.
And if we want to stay aligned with the configuration specification, we should put these options under instrumentation/development.

@robsunday

I also think that this stuff should go under .instrumentation/development.{lang}. It is related to the instrumentation, as @ysolomchenko pointed out.
Additionally, there is only a very small number of options common to all languages.
.instrumentation/development is already designed with multiple languages in mind, while .distribution is not, and we would have to mimic the structure of .instrumentation/development to avoid a mess.

Keeps Splunk-specific configuration localized instead of scattered all over the config file, meaning everything outside of distribution.splunk would be standard OTel config.

Some parts of the Splunk-specific configuration are already spread across the config file, because every custom ComponentProvider can only be referenced from a dedicated YAML location.
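
For example (hypothetical component name, purely to illustrate the point): a Splunk sampler registered through a custom ComponentProvider can only be referenced from the tracer provider's sampler slot, so that piece of Splunk-specific configuration already lives outside any single Splunk node:

tracer_provider:
  sampler:
    splunk_sampler:     # hypothetical name registered by a custom ComponentProvider
      some_option: true # hypothetical option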

@seemk
Contributor Author

seemk commented Dec 9, 2025

I've moved the configuration under instrumentation/development.{lang}.

So it'll be like:

instrumentation/development:
  js:
    splunk:
      use_bundled_instrumentations: true
      package_name_filter:
        - "MyApiService"
        - "MyOtherService"
    metrics:
      splunk_runtime_metrics:
        collection_interval: 30000
    traces: # Note traces, metrics and the instrumentation naming would be Splunk specific. Afaik there is no agreement yet in upstream
      http:
        splunk_trace_response_header_enabled: true
      graphql:
        splunk_resolve_spans_enabled: true
      nextjs:
        splunk_cardinality_reduction_enabled: true

But at this point, would it make sense to move distribution.splunk.profiling to instrumentation/development.{lang}.splunk.profiling?

@seemk
Contributor Author

seemk commented Dec 10, 2025

Anyway, I'm fine with going with the now-proposed structure.

Comment on lines 305 to 324
instrumentation/development:
  js:
    # Language specific distro configuration.
    # splunk keyword is for general distro config.
    splunk:
      use_bundled_instrumentations: false
      package_name_filter:
        - "MyApiGw"
    metrics:
      splunk_runtime_metrics:
        collection_interval: 30000
    traces:
      http:
        splunk_trace_response_header_enabled: true # SPLUNK_TRACE_RESPONSE_HEADER_ENABLED
        splunk_capture_uri_parameters:
          - "userId"
      redis:
        splunk_include_command_args: true # SPLUNK_REDIS_INCLUDE_COMMAND_ARGS
      pg:
        disabled: true

Contributor

As we already have splunk: as a semi-node here, we can remove the splunk_ prefix from the properties.

Contributor

For .NET, it might be an issue to create this splunk: node under instrumentation/development.
For example, we have the SPLUNK_TRACE_RESPONSE_HEADER_ENABLED env var.
In YAML it would look like this:

instrumentation/development:
  dotnet:
    traces: 
      aspnet:
        splunk_response_header_enabled: true
      aspnetcore:
        splunk_response_header_enabled: true

The aspnet and aspnetcore instrumentations come from upstream, not from our distro. In this case, IMO using the prefix is better than adding a separate node under instrumentation/development.{language}.

If we add this node, it will look like this:

instrumentation/development:
  dotnet:
    traces: 
      aspnet:
      aspnetcore:
    splunk: 
      traces:
        response_header_enabled: true

As you can see, this structure looks a bit ugly and inconsistent compared to the first proposal.

@robsunday robsunday Dec 10, 2025

In my opinion, under the instrumentation/development.{language} node we do not have to be fully consistent; however, it would be good to have consistency for common settings.
SPLUNK_TRACE_RESPONSE_HEADER_ENABLED seems to be the only property besides profiling that is common across all the languages.

Actually I thought this would be good enough:

instrumentation/development:
  java:
    splunk: 
      trace_response_header_enabled: true
  dotnet:
    splunk: 
      trace_response_header_enabled: true
  nodejs:
    splunk: 
      trace_response_header_enabled: true
  etc...

trace_response_header_enabled is a valid name in my opinion, because according to the docs this setting:

Enables adding server trace information to HTTP response headers

Contributor Author

So why not have it under distribution.splunk then? The spec now has a top-level node for distro-specific features.

Contributor

@seemk So in my opinion it should be:

instrumentation/development:
  dotnet:
    traces: 
      aspnet:
        splunk_response_header_enabled: true
      aspnetcore:
        splunk_response_header_enabled: true
  js:
    traces:
      http:
        splunk_response_header_enabled: true
  java:
    ...

Contributor Author

The two issues I have with this:

  1. The configuration option is duplicated (aspnet, aspnetcore). Correct me if I'm wrong about .NET, but on Node both programmatic and env-var-based config can toggle it in a single place.
  2. As a user, is it intuitive that you need to know the instrumentation names for our distro-specific config? One point of our distros has been to hide the gnarliness of OTel behind sensible, easy options.

Contributor

There are two instrumentation libraries because of .NET version differences. For .NET Framework we use aspnet, and for .NET (Core) we use aspnetcore. response_header_enabled will not work without aspnet or aspnetcore, so end users need to know the instrumentation name anyway if they want to use file-based configuration with response_header_enabled.

Contributor Author

And when the setting is in the distro configuration (once), why can't the distro figure it out for the user? Do you have to explicitly list the instrumentations you use for .NET?

In fact, I think the lang distros can put whatever they want in their lang block. We're spending way too much time on trivial configuration fields.

I think this provides a good solution for it.

Contributor

And when the setting is in the distro configuration (once), why can't the distro figure it out for the user? Do you explicitly have to list the instrumentations you use for .net?

Yes, you need to explicitly list them to enable them.

There is an issue for the templates that was already created, but I think it's a bit stuck, because the main idea of file-based configuration is that what you see is what you get.

Comment on lines 316 to 318
traces:
  http:
    splunk_trace_response_header_enabled: true # SPLUNK_TRACE_RESPONSE_HEADER_ENABLED

Contributor

As it is already under traces, the property name could simply be response_header_enabled:

Contributor Author

@seemk seemk Dec 10, 2025

Why? It's a Splunk extension and keeps parity with the env var.

Contributor

Because it's redundant; trace under the traces node is like butter on butter.

Contributor Author

I see, you meant trace_, not splunk_.

@seemk
Contributor Author

seemk commented Dec 10, 2025

In fact, I think the lang distros can put whatever they want in their lang block. We're spending way too much time on trivial configuration fields.

So just write down a summary of how you're going to configure a language distro and I'll add it to the example yaml for the given language.

@seemk
Contributor Author

seemk commented Dec 10, 2025

I've only kept the distribution.splunk.profiling configuration in the example YAML and added a clause that language distributions can add their own language-specific configuration where it makes the most sense. Currently OTel JS has no convention for instrumentation-specific file config, but .NET does, so I don't think it makes sense to discuss here yet where the language/instrumentation-specific values should go. This can be discussed further down the line when the upstream implementations mature.

Agreed on a call with Robert, Siim and Yevhenii.

Co-authored-by: Yevhenii Solomchenko <[email protected]>

@robsunday robsunday left a comment

OK

@ysolomchenko
Contributor

Quick question about SPLUNK_ACCESS_TOKEN and SPLUNK_REALM.
Are we blocking them (no usage), or should we use environment variables with a file-based approach?

@seemk
Contributor Author

seemk commented Dec 11, 2025

Quick question about SPLUNK_ACCESS_TOKEN and SPLUNK_REALM. Are we blocking them (no usage), or should we use environment variables with a file-based approach?

Env vars with a file-based approach would be the recommended way then, in case you don't want the tokens in the file.

@robsunday

The template could look like:

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp_http:
            endpoint: https://ingest.${SPLUNK_REALM}.signalfx.com/v1/traces
            headers:
              - name: "X-SF_TOKEN"
                value: ${SPLUNK_ACCESS_TOKEN}
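
And, if I read the declarative configuration spec correctly, the env var substitution syntax also supports a fallback default, so a default realm could be baked into the template (values below are illustrative):

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp_http:
            endpoint: https://ingest.${SPLUNK_REALM:-us0}.signalfx.com/v1/traces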

@Kielek Kielek merged commit 2837151 into main Dec 11, 2025
2 checks passed
@Kielek Kielek deleted the file-based-config branch December 11, 2025 13:10