Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(tracing): Port sample_rand to POTel #4106

Merged
merged 53 commits into from
Mar 27, 2025

Conversation

sentrivana
Copy link
Contributor

@sentrivana sentrivana commented Feb 27, 2025

Port sample_rand to potel-base. See spec.

There are now two places where a sample_rand might be generated:

  • If we're explicitly propagating with continue_trace, we'll backfill sample_rand on the propagation context like on master, either using the incoming one or generating a new one from the incoming sampled/sample_rate.
  • Otherwise, we generate a new sample_rand in the Sampler.

The generated sample_rand is then saved on the trace state.

This change fixes most of the failures in the Common test suite.

Closes #4027

Copy link

codecov bot commented Feb 27, 2025

❌ 108 Tests Failed:

Tests completed Failed Passed Skipped
20866 108 20758 4476
View the top 3 failed test(s) by shortest run time
 tests.integrations.celery.integration_tests.test_celery_beat_cron_monitoring
Stack Traces | 0s run time
.../hostedtoolcache/Python/3.7.17.../x64/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1006: in _gcd_import
    ???
<frozen importlib._bootstrap>:983: in _find_and_load
    ???
<frozen importlib._bootstrap>:953: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:219: in _call_with_frames_removed
    ???
<frozen importlib._bootstrap>:1006: in _gcd_import
    ???
<frozen importlib._bootstrap>:983: in _find_and_load
    ???
<frozen importlib._bootstrap>:967: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:677: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:728: in exec_module
    ???
<frozen importlib._bootstrap>:219: in _call_with_frames_removed
    ???
.../celery/integration_tests/__init__.py:7: in <module>
    from celery.beat import Scheduler
.tox/py3.7-celery-v5.0.5/lib/python3.7.../site-packages/celery/beat.py:22: in <module>
    from . import __version__, platforms, signals
.tox/py3.7-celery-v5.0.5/lib/python3.7.../site-packages/celery/signals.py:14: in <module>
    from .utils.dispatch import Signal
.tox/py3.7-celery-v5.0.5/lib/python3.7.../celery/utils/__init__.py:16: in <module>
    from .nodenames import nodename, nodesplit, worker_direct
.tox/py3.7-celery-v5.0.5/lib/python3.7.../celery/utils/nodenames.py:6: in <module>
    from kombu.entity import Exchange, Queue
.tox/py3.7-celery-v5.0.5/lib/python3.7.../site-packages/kombu/entity.py:7: in <module>
    from .serialization import prepare_accept_content
.tox/py3.7-celery-v5.0.5/lib/python3.7.../site-packages/kombu/serialization.py:440: in <module>
    for ep, args in entrypoints('kombu.serializers'):  # pragma: no cover
.tox/py3.7-celery-v5.0.5/lib/python3.7.../kombu/utils/compat.py:82: in entrypoints
    for ep in importlib_metadata.entry_points().get(namespace, [])
E   AttributeError: 'EntryPoints' object has no attribute 'get'
 tests.integrations.celery.test_celery
Stack Traces | 0s run time
ImportError while importing test module '.../integrations/celery/test_celery.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
.tox/py3.7-celery-v5.0.5/lib/python3.7.../site-packages/_pytest/python.py:617: in _importtestmodule
    mod = import_path(self.path, mode=importmode, root=self.config.rootpath)
.tox/py3.7-celery-v5.0.5/lib/python3.7.../site-packages/_pytest/pathlib.py:567: in import_path
    importlib.import_module(module_name)
.../hostedtoolcache/Python/3.7.17.../x64/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1006: in _gcd_import
    ???
<frozen importlib._bootstrap>:983: in _find_and_load
    ???
<frozen importlib._bootstrap>:967: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:677: in _load_unlocked
    ???
.tox/py3.7-celery-v5.0.5/lib/python3.7.../_pytest/assertion/rewrite.py:186: in exec_module
    exec(co, module.__dict__)
.../integrations/celery/test_celery.py:6: in <module>
    from celery import Celery, VERSION
E   ImportError: cannot import name 'Celery' from 'celery' (.../sentry-python/sentry-python/.tox/py3.7-celery-v5.0.5/lib/python3.7.../site-packages/celery/__init__.py)
 tests.integrations.celery.test_update_celery_task_headers
Stack Traces | 0s run time
.../integrations/celery/beat.py:28: in <module>
    from celery import Task, Celery  # type: ignore
E   ImportError: cannot import name 'Task' from 'celery' (.../sentry-python/sentry-python/.tox/py3.7-celery-v5.0.5/lib/python3.7.../site-packages/celery/__init__.py)

During handling of the above exception, another exception occurred:
.../integrations/celery/test_update_celery_task_headers.py:7: in <module>
    from sentry_sdk.integrations.celery import _update_celery_task_headers
.../integrations/celery/__init__.py:9: in <module>
    from sentry_sdk.integrations.celery.beat import (
.../integrations/celery/beat.py:37: in <module>
    raise DidNotEnable("Celery not installed")
E   sentry_sdk.integrations.DidNotEnable: Celery not installed

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@sentrivana sentrivana force-pushed the ivana/potel/port-sample-rand branch from 04dc3b4 to 49a3507 Compare March 21, 2025 08:11
Copy link
Member

@szokeasaurusrex szokeasaurusrex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, @sentrivana!

I think this looks mostly good, although I did notice a couple things which may need to be changed, and I would like some clarification on some of the changes.

@@ -122,6 +150,9 @@ def sampled_result(span_context, attributes, sample_rate):
if sample_rate is not None:
trace_state = trace_state.update(TRACESTATE_SAMPLE_RATE_KEY, str(sample_rate))

if sample_rand is not None:
trace_state = trace_state.update(TRACESTATE_SAMPLE_RAND_KEY, str(sample_rand))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important that we ensure the stringified sample_rand has six decimal places here.

Are we already rounding it to six decimal places in the calling code? If yes, please mention in the docstring that the calling code needs to round to six decimal places. If not, then we need to add the rounding logic here, and ideally also add a unit test to verify rounding happens as expected.

In either case, I would use f"{sample_rand:.6f}" rather than str(sample_rand).

Suggested change
trace_state = trace_state.update(TRACESTATE_SAMPLE_RAND_KEY, str(sample_rand))
sample_rand = sample_rand.quantize(
Decimal("0.000001"), rounding=ROUND_DOWN, context=Context(prec=6)
)
trace_state = trace_state.update(TRACESTATE_SAMPLE_RAND_KEY, f"{sample_rand:.6f}")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added a test: 34cb097

Comment on lines +271 to 279
"incoming_sample_rate": 0.3,
"incoming_sampled": "false",
"sentry_trace_header_parent_sampled": 0,
"use_local_traces_sampler": True,
"local_traces_sampler_result": 0.5,
"local_traces_sampler_result": 0.25,
"local_traces_sample_rate": 0.7,
},
0.5, # expected_sample_rate
0.25, # expected_sample_rate
"false", # expected_sampled (traces sampler can override parent sampled)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the first bullet point – shouldn't we still test the case where sample_rate and sampled values are incompatible with each other? Although this is an invalid state for the trace to be in, we still should ensure that the SDK does something reasonable in this case, since the upstream SDK could be broken (or a malicious actor could have tampered with the incoming trace data)

@sentrivana sentrivana linked an issue Mar 26, 2025 that may be closed by this pull request
Copy link
Member

@antonpirker antonpirker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions have been answered off Github and all in all looks good.

The one comment from @szokeasaurusrex about rounding the sample_rand value before storing it trace_state could be addressed, but I guess it should work without it. But maybe better than safe then sorry.

@sentrivana
Copy link
Contributor Author

That should hopefully be it for the functional changes -- adding test cases for the scenarios discussed in the reviews now.

Copy link
Member

@szokeasaurusrex szokeasaurusrex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, identified a few more suggestions but looks pretty good

if parent_sample_rand is None:
return None

return _round_sample_rand(parent_sample_rand)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should double-check with @cleptric and/or @jan-auer, but in order to maintain future-compatibility, I don't think we should be rounding the parent_sample_rand.

Suggested change
return _round_sample_rand(parent_sample_rand)
return Decimal(parent_sample_rand)

By rounding it, we potentially modify the DSC (which should be frozen) if the head SDK was rounding differently. This could become a problem if in the future, we were to change the format of the sample_rand value (e.g. by increasing/decreasing precision). If the head SDK had already been updated use the new format, but the user was still using an older Python SDK version, we'd be changing the value back to the old precision here, which we might not want to be doing.

Comment on lines +171 to +175
if sample_rand is not None:
trace_state = trace_state.update(
TRACESTATE_SAMPLE_RAND_KEY, f"{sample_rand:.6f}" # noqa: E231
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] since it seems like we are setting the sample_rand both here and above in dropped_result using the same duplicated code, perhaps we can extract this code into a separate function? If we did this, we could also potentially more other stuff we set to that new function (this might be out of scope for this PR though)

If you think this change would be too much hassle, feel free to ignore this suggestion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sample_rand = parent_sample_rand
else:
# We are the head SDK and we need to generate a new sample_rand
sample_rand = cast(Decimal, _generate_sample_rand(str(trace_id), (0, 1)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cast is quite confusing.

Could we please try to modify _generate_sample_rand's signature to return Decimal rather than Optional[Decimal]? That would eliminate the need for a cast

Suggested change
sample_rand = cast(Decimal, _generate_sample_rand(str(trace_id), (0, 1)))
sample_rand = _generate_sample_rand(str(trace_id), (0, 1))

Comment on lines 813 to 814
except Exception as ex:
logger.debug(f"Failed to round sample_rand {sample_rand}: {ex}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exception exactly can the above raise? Doesn't setting the context avoid the possibility of an InvalidOperation?

If we can get rid of the try/except, that would allow us to make the signature of this function and _generate_sample_rand a Decimal.

@sentrivana
Copy link
Contributor Author

Copy link
Member

@szokeasaurusrex szokeasaurusrex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@@ -293,14 +283,14 @@ def should_sample(
parent_span_context,
attributes,
sample_rate=sample_rate_to_propagate,
sample_rand=sample_rand,
sample_rand=None if sample_rand == parent_sample_rand else sample_rand,
Copy link
Member

@szokeasaurusrex szokeasaurusrex Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] What is the purpose of comparing sample_rand to parent_sample_rand? We might want to clarify this with a code comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sampled_result (and dropped_result as well) will use the sample_rand provided to overwrite it in the trace state. Since we don't want to overwrite incoming parent_sample_rand (since we would potentially change the precision), we only pass it to sampled_result if it's different from parent_sample_rand -- i.e., we generated it ourselves.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ngl I am still a bit confused. Didn't we get rid of the code that would potentially change the precision?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We got rid of the rounding code but not the precision code, we still format sample_rand to 6 decimal places here https://github.com/getsentry/sentry-python/pull/4106/files#diff-59aa7195d955e153b5cdd730f888994996a72eaf5e9ea174335ce961841584a9R169

Copy link
Contributor Author

@sentrivana sentrivana Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, the bigger picture here is just: if we have a halfway valid incoming sample_rand, don't tamper with it, even if it'd mean just adding a few zeroes at the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, fair enough

@sentrivana sentrivana merged commit fc8fa9f into potel-base Mar 27, 2025
106 of 128 checks passed
@sentrivana sentrivana deleted the ivana/potel/port-sample-rand branch March 27, 2025 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Port sample-rand to POTel
3 participants