fix(anthropic): add thinking as a separate completion message #2780
Conversation
👍 Looks good to me! Reviewed everything up to 7b39ca0 in 2 minutes and 44 seconds.

More details:
- Looked at 6172 lines of code in 23 files.
- Skipped 1 file when reviewing.
- Skipped posting 10 drafted comments based on config settings.
1. `packages/opentelemetry-instrumentation-anthropic/tests/utils.py:1`
   - Draft comment: Good concise test utility. The asserts check required metrics attributes correctly.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%.
2. `packages/opentelemetry-instrumentation-anthropic/tests/utils.py:25`
   - Draft comment: Consider adding a brief comment to explain the purpose of the 'ignore_zero_input_tokens' flag.
   - Reason this comment was not posted: Confidence changes required: 30% <= threshold 50%.
3. `packages/opentelemetry-instrumentation-anthropic/tests/utils.py:25`
   - Draft comment: Consider adding custom error messages to your assert statements for better debugging clarity. For example, when asserting that `data_point.sum > 0`, a message like 'Expected positive token sum for input tokens but got {data_point.sum}' would help.
   - Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%.
4. `packages/opentelemetry-instrumentation-anthropic/tests/utils.py:58`
   - Draft comment: All metric data points are forced to have `gen_ai.system == 'anthropic'`. This assert is concise and correct, but if future extensions support other systems, consider parameterizing this check.
   - Reason this comment was not posted: Confidence changes required: 30% <= threshold 50%.
5. `packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:229`
   - Draft comment: Typo in comment: 'Antrhopic' should be corrected to 'Anthropic' for consistency.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. `packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_streaming.yaml:267`
   - Draft comment: Typo detected: the text segment "getting starte" is missing a 'd'. Please update it to "getting started".
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. `packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_streaming.yaml:396`
   - Draft comment: Typo detected: the streamed text fragments "a ligh" followed by "thearted introduction" seem to be intended as "a lighthearted introduction". Please check and correct the concatenation.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. `packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_message_streaming.yaml:437`
   - Draft comment: Typo detected: the contraction "you''" appears to be incorrect and should be "you'd". Additionally, the phrase that is split into "hear any other tech", "theme", and "d jokes!" seems intended to be "tech-themed jokes!". Please review the streaming text concatenation for proper word splits.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. `packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_tools.yaml:38`
   - Draft comment: Minor typographical note: the operating system value 'MacOS' is used, but the conventional spelling is 'macOS'. It's a trivial change for consistency.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. `packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py:79`
    - Draft comment: Typo: the word 'cassete' should be corrected to 'cassette' in the comment. Please update it consistently across the file.
    - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. While technically correct that "cassete" is misspelled, this is just a typo in comments that doesn't affect functionality. Comments are documentation and should ideally be correct, but this is an extremely minor issue that doesn't impact understanding. The rules say not to make comments that are obvious or unimportant. The typo does appear consistently throughout the file, so fixing it would improve documentation quality, and comments are part of code maintenance. However, this is an extremely low-impact issue; the meaning is still clear despite the typo, and this kind of nitpick creates noise in PR reviews without adding meaningful value. While technically correct, this comment about a minor typo in documentation is too trivial to be worth keeping in a PR review.
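Draft comment 3's suggestion can be illustrated with a short sketch. The `DataPoint` class below is a hypothetical stand-in for the OpenTelemetry metrics data point used in the tests, not the real type:

```python
# Sketch of draft comment 3's suggestion: plain asserts vs. asserts
# with custom failure messages. `DataPoint` is a simplified stand-in
# for the metrics data point object the tests inspect.
from dataclasses import dataclass


@dataclass
class DataPoint:
    sum: int


data_point = DataPoint(sum=128)

# A plain assert reports only the failing line on error.
assert data_point.sum > 0

# With a message, the offending value shows up in the traceback,
# which makes CI failures much easier to diagnose.
assert data_point.sum > 0, (
    f"Expected positive token sum for input tokens but got {data_point.sum}"
)
```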
Force-pushed from 7b39ca0 to 5c4c55c.
@nirga any thoughts/feedback?
sorry for the delay @dinmukhamedm! I looked at it on my phone and I was like "ok I need my laptop for this it's a huge PR" and then forgot about it 😨
looks great, just had a super small nit comment - can you fix?
```diff
@@ -52,8 +55,8 @@ def _set_token_usage(
     token_histogram: Histogram = None,
     choice_counter: Counter = None,
 ):
-    cache_read_tokens = complete_response.get("usage", {}).get("cache_read_input_tokens", 0)
-    cache_creation_tokens = complete_response.get("usage", {}).get("cache_creation_input_tokens", 0)
+    cache_read_tokens = complete_response.get("usage", {}).get("cache_read_input_tokens", 0) or 0
```
nit: you don't need this since get returns 0 by default
No, the thing is that the new anthropic SDK explicitly sets these fields to `None`, and so `get` returns a `None`, which fails below, where we try to add things up:

```python
>>> d = {"key": "val", "none_key": None}
>>> v = d.get("none_key", "fallback")
>>> print(v)
None
>>>
```
same for all the changes below
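The root cause is that `dict.get`'s default only applies when the key is absent, not when its stored value is `None`. A minimal sketch of the `or 0` pattern used in this PR (the `usage` dict here is a stand-in for the SDK's usage payload, not the real response object):

```python
# dict.get's default is only used when the key is MISSING,
# not when the stored value is None.
usage = {"cache_read_input_tokens": None}  # mimics the newer SDK behavior

broken = usage.get("cache_read_input_tokens", 0)      # None, not 0
fixed = usage.get("cache_read_input_tokens", 0) or 0  # coerced to 0

print(broken)  # None
print(fixed)   # 0

# Without the `or 0`, summing token counts would raise:
#   TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
total = fixed + 10
print(total)  # 10
```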
Oh. I don't know if I hate python more or Anthropic more. Thanks!
```diff
-    cache_read_tokens = complete_response.get("usage", {}).get("cache_read_input_tokens", 0)
-    cache_creation_tokens = complete_response.get("usage", {}).get("cache_creation_input_tokens", 0)
+    cache_read_tokens = complete_response.get("usage", {}).get("cache_read_input_tokens", 0) or 0
+    cache_creation_tokens = complete_response.get("usage", {}).get("cache_creation_input_tokens", 0) or 0
```
same
```diff
@@ -159,13 +164,13 @@ def build_from_streaming_response(
     completion_tokens = -1
     # prompt_usage
     if usage := complete_response.get("usage"):
-        prompt_tokens = usage.get("input_tokens", 0)
+        prompt_tokens = usage.get("input_tokens", 0) or 0
```
same?
No worries @nirga! Responded to the comment.
Addresses #2774

This PR introduces many small changes:

- Reports thinking as `gen_ai.completion.0.role = "thinking"` and `gen_ai.completion.0.content` for both streaming and non-streaming.
- Updates the `anthropic` SDK dependency to support streaming in tests. As a result, the following few changes:
  - `cache_creation_input_tokens` and `cache_read_input_tokens` are explicitly set to `None`. I think this was not the case in the older SDK, and this was breaking token usage on both spans and metrics.
  - `count_tokens` is not available in the SDK itself anymore. There is an `anthropic.beta.messages.count_tokens()` now that accepts a `messages` array and a model name (starting from `claude-2`), but it is actually a call to the API – I don't think we need to impose additional latency and network usage.
- PR title follows the `feat(instrumentation): ...` or `fix(instrumentation): ...` convention.
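The attribute names above follow the convention stated in this PR; how they might look on a span can be sketched as follows. The `set_completion_attrs` helper, the plain `attrs` dict, and the assumption that the assistant's visible reply shifts to completion index 1 are all illustrative stand-ins, not the instrumentation's actual code:

```python
# Hypothetical sketch of emitting a thinking block as completion
# attributes, per the convention described in this PR. `attrs` stands
# in for a span's attribute map; the real instrumentation sets these
# through the OpenTelemetry span API.

def set_completion_attrs(attrs: dict, thinking: str, answer: str) -> None:
    # The thinking block is reported as completion index 0...
    attrs["gen_ai.completion.0.role"] = "thinking"
    attrs["gen_ai.completion.0.content"] = thinking
    # ...and (assumption for illustration) the assistant's visible
    # reply then occupies index 1.
    attrs["gen_ai.completion.1.role"] = "assistant"
    attrs["gen_ai.completion.1.content"] = answer


attrs = {}
set_completion_attrs(attrs, "Let me reason step by step...", "The answer is 42.")
print(attrs["gen_ai.completion.0.role"])  # thinking
```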
Important

This PR adds 'thinking' as a separate completion message in the Anthropic integration, updates the SDK, and refactors tests to support these changes.

- Reports thinking as `gen_ai.completion.0.role = "thinking"` in `__init__.py`.
- Updates the `anthropic` SDK dependency to support streaming.
- Handles `None` values for `cache_creation_input_tokens` and `cache_read_input_tokens` in `__init__.py`.
- Removes `count_tokens` usage due to SDK changes.
- Refactors tests in `test_messages.py`, `test_prompt_caching.py`, and `test_thinking.py`.
- Adds `test_thinking.py`, with related handling in `__init__.py`.
- Updates `verify_metrics` in `utils.py` to handle new metrics.

This description was created by Ellipsis for 7b39ca0. It will automatically update as commits are pushed.
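To make the `verify_metrics` change concrete, here is a hedged sketch of the kind of checks such a helper performs, based only on what the review comments above mention (the `gen_ai.system == "anthropic"` assertion and the `ignore_zero_input_tokens` flag). The dict shape of the data points is an assumption for illustration; the real helper in `tests/utils.py` may differ:

```python
# Hypothetical sketch of a verify_metrics-style test helper. Data
# points are modeled as plain dicts; the real helper operates on
# OpenTelemetry metrics objects and may check more attributes.

def verify_metrics(data_points, ignore_zero_input_tokens=False):
    for dp in data_points:
        # Every data point should be attributed to the Anthropic system.
        assert dp["attributes"]["gen_ai.system"] == "anthropic", (
            f"unexpected gen_ai.system in {dp['attributes']}"
        )
        # Token sums should be positive unless explicitly ignored
        # (e.g. cached prompts may legitimately report zero input tokens).
        if not ignore_zero_input_tokens:
            assert dp["sum"] > 0, f"expected positive token sum, got {dp['sum']}"


# Example usage with a well-formed data point:
verify_metrics([{"attributes": {"gen_ai.system": "anthropic"}, "sum": 42}])
```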