Skip to content

Conversation

@wantsui
Copy link
Collaborator

@wantsui wantsui commented Nov 7, 2025

What does this PR do?

The goal of AIDM-253 is to add process tags to the trace payloads.

After this gets merged, the next step is to add it for the other products.

To run the tests in docker

docker compose run --rm tracer-3.3 /bin/bash

Main tests:

bundle install --gemfile /app/gemfiles/ruby_3.3_rails8.gemfile
bundle exec rake compile
bundle exec rake test:core_with_rails
TEST_DATADOG_INTEGRATION=1 BUNDLE_GEMFILE=/app/gemfiles/ruby_3.3_rails8.gemfile bundle exec rspec spec/datadog/core/environment/process_spec.rb


bundle exec rspec spec/datadog/tracing/transport/trace_formatter_spec.rb
bundle exec rspec spec/datadog/core/tag_normalizer_spec.rb
bundle exec rspec spec/datadog/core/configuration/settings_spec.rb

Motivation:

We're trying to add process tags to various payloads so they can be used in different use cases.

Note I still want to try adding server type but I'll have to tackle that in a separate PR.

Change log entry

Adds process tags to trace payloads with the new environment variable: DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED.

Additional Notes:

How to test the change?

… This is still missing memoization and additional tests.
@github-actions github-actions bot added core Involves Datadog core libraries tracing labels Nov 7, 2025
@github-actions
Copy link

github-actions bot commented Nov 7, 2025

👋 Hey @DataDog/ruby-guild, please fill "Change log entry" section in the pull request description.

If changes need to be present in CHANGELOG.md you can state it this way

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like that

**Change log entry**

None.

(possible answers No/Nope/None)

Visited at: 2025-11-20 21:39:31 UTC

@datadog-official
Copy link

datadog-official bot commented Nov 7, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: e83bc4a | Docs | Datadog PR Page | Was this helpful? Give us feedback!

@wantsui wantsui added the AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos label Nov 7, 2025
def tag_process_tags!
return unless trace.experimental_propagate_process_tags_enabled
process_tags = Core::Environment::Process.formatted_process_tags_k1_v1
return if process_tags.empty?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is impossible right? If so, we can remove it, as it would give us a false sense of uncertainty here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I fixed it in 8dae705 by just removing the check in process tags, but let me know if you spot issues with it!

@pr-commenter
Copy link

pr-commenter bot commented Nov 10, 2025

Benchmarks

Benchmark execution time: 2025-11-21 00:35:39

Comparing candidate commit e83bc4a in PR branch add-process-tags-to-tracing with baseline commit 2c84fc0 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics.

format!
expect(first_span.meta).to include('_dd.tags.process')
expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized)
# TODO figure out if we need an assertion for the value, ie
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcotc - do you think there's value in asserting for the values of the tag? Or is the test in process_spec enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are doing with expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized) seems good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't test realistic values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing to test here is that it's respecting the configuring option, which you did.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing to test here is that it's respecting the configuring option.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! In that case it doesn't seem like I need to make any changes to the assertions then?

@github-actions
Copy link

github-actions bot commented Nov 11, 2025

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 2 partially typed methods, and clears 1 partially typed method. It increases the percentage of typed methods from 54.48% to 54.58% (+0.1%).

Partially typed methods (+2-1)Introduced:
sig/datadog/core/tag_normalizer.rbs:11
└── def self.normalize: (untyped original_value, ?remove_digit_start_char: bool) -> ::String
sig/datadog/core/utils.rbs:8
└── def self.utf8_encode: (untyped str, ?binary: bool, ?replace_invalid: bool, ?placeholder: untyped) -> untyped
Cleared:
sig/datadog/core/utils.rbs:8
└── def self.utf8_encode: (untyped str, ?binary: bool, ?placeholder: untyped) -> untyped

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept to the end of the line to remove it from the stats.

@p-datadog
Copy link
Member

This PR needs to be merged/rebased on master to fix the CI failures.

@wantsui wantsui requested a review from p-datadog November 20, 2025 20:48
module Environment
# Retrieves process level information such that it can be attached to various payloads
module Process
extend self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had some debates about module_function which is a different tactic for what extend self is accomplishing here.

If you change the methods to def self.serialized and so on, does everything still work?

The idea is to have the definitions only in one scope - extend self makes them exist in two scopes (the module method as included in classes and on the module namespace itself).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally, this refactor still works and I've pushed this commit: a336c66, so now to wait for CI!

# @param remove_digit_start_char [Boolean] - whether to remove the leading digit (currently only used for tag values)
# @return [String] The normalized string
def self.normalize(original_value, remove_digit_start_char: false)
transformed_value = original_value.to_s.encode('UTF-8', invalid: :replace, undef: :replace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a function for this in Core::Utils#utf8_encode

normalized_value.sub!(leading_invalid_regex, "")

normalized_value.squeeze!('_') if normalized_value.include?('__')
normalized_value.sub!(TRAILING_UNDERSCORES, "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically after squeeze we can't have more that _ at the end, do we really need a regex to ditch a single char?

Comment on lines +40 to +44
normalized_value = if transformed_value.ascii_only? && transformed_value.length <= MAX_BYTE_SIZE
transformed_value
else
transformed_value.byteslice(0, MAX_BYTE_SIZE).scrub("")
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can streamline this a bit and reduce 1 allocation. And you can go further and don't use different names, because it doesn't matter, just use 1 universal name for the variable.

Suggested change
normalized_value = if transformed_value.ascii_only? && transformed_value.length <= MAX_BYTE_SIZE
transformed_value
else
transformed_value.byteslice(0, MAX_BYTE_SIZE).scrub("")
end
normalized_value = transformed_value
if normalized_value.ascii_only? && normalized_value.length <= MAX_BYTE_SIZE
normalized_value = normalized_value.byteslice(0, MAX_BYTE_SIZE)
normalized_value.scrub!("")
end

def serialized
return @serialized if defined?(@serialized)
tags = []
tags << "#{Environment::Ext::TAG_ENTRYPOINT_WORKDIR}:#{TagNormalizer.normalize(entrypoint_workdir, remove_digit_start_char: false)}" if entrypoint_workdir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but when it will be nil?

tags << "#{Environment::Ext::TAG_ENTRYPOINT_WORKDIR}:#{TagNormalizer.normalize(entrypoint_workdir, remove_digit_start_char: false)}" if entrypoint_workdir
tags << "#{Environment::Ext::TAG_ENTRYPOINT_NAME}:#{TagNormalizer.normalize(entrypoint_name, remove_digit_start_char: false)}" if entrypoint_name
tags << "#{Environment::Ext::TAG_ENTRYPOINT_BASEDIR}:#{TagNormalizer.normalize(entrypoint_basedir, remove_digit_start_char: false)}" if entrypoint_basedir
tags << "#{Environment::Ext::TAG_ENTRYPOINT_TYPE}:#{TagNormalizer.normalize(entrypoint_type, remove_digit_start_char: false)}" if entrypoint_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, it's literally reading the constant ... it's not nil at all

t.pattern = 'spec/**/*_spec.rb'
t.exclude_pattern = 'spec/**/{appsec/integration,contrib,benchmark,redis,auto_instrument,opentelemetry,open_feature,profiling,crashtracking,error_tracking,rubocop,data_streams}/**/*_spec.rb,' \
' spec/**/{auto_instrument,opentelemetry,process_discovery,stable_config,ddsketch,open_feature}_spec.rb,' \
' spec/**/{auto_instrument,opentelemetry,process_discovery,stable_config,ddsketch,open_feature}_spec.rb,spec/datadog/core/environment/process_spec.rb,' \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well if you read the beginning of the line, it has some globbing that suites you

Suggested change
' spec/**/{auto_instrument,opentelemetry,process_discovery,stable_config,ddsketch,open_feature}_spec.rb,spec/datadog/core/environment/process_spec.rb,' \
' spec/**/{auto_instrument,opentelemetry,process_discovery,stable_config,ddsketch,open_feature,process}_spec.rb' \

or at least core/process

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos core Involves Datadog core libraries tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants