feat(integrations): add litellm integration #4864
base: master
Conversation
litellm.success_callback = litellm.success_callback or []
if _success_callback not in litellm.success_callback:
    litellm.success_callback.append(_success_callback)

litellm.failure_callback = litellm.failure_callback or []
if _failure_callback not in litellm.failure_callback:
    litellm.failure_callback.append(_failure_callback)
It seems as if both success_callback and failure_callback are run in a thread, which might finish after completion returns. Since the span is closed in either callback, it may be finished after the surrounding transaction, resulting in it being absent from the trace completely. This should definitely be pointed out somewhere.
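A minimal repro sketch of the timing issue described above (the model name, transaction name, and prompt are illustrative, not from the diff):

import litellm
import sentry_sdk

with sentry_sdk.start_transaction(name="handle-request"):
    # completion() returns once the HTTP call finishes; litellm then invokes
    # success_callback on a worker thread. If that thread runs after this
    # `with` block exits, the span opened in _input_callback is closed after
    # its parent transaction has already been sent and is dropped from the trace.
    litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "hi"}],
    )
# at this point the transaction may already be finished, before _success_callback runs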
params = {
    "model": SPANDATA.GEN_AI_REQUEST_MODEL,
    "stream": SPANDATA.GEN_AI_RESPONSE_STREAMING,
    "max_tokens": SPANDATA.GEN_AI_REQUEST_MAX_TOKENS,
    "presence_penalty": SPANDATA.GEN_AI_REQUEST_PRESENCE_PENALTY,
    "frequency_penalty": SPANDATA.GEN_AI_REQUEST_FREQUENCY_PENALTY,
    "temperature": SPANDATA.GEN_AI_REQUEST_TEMPERATURE,
    "top_p": SPANDATA.GEN_AI_REQUEST_TOP_P,
}
It is not clear where these parameters are actually supposed to be put in the arguments to completion.
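If the intent is that they are read straight from the keyword arguments of the completion call, the mapping would look like this (values are illustrative; this is an assumption about the intended usage, not something stated in the diff):

litellm.completion(
    model="claude-3-5-sonnet-20240620",  # -> GEN_AI_REQUEST_MODEL
    messages=[{"role": "user", "content": "hi"}],
    stream=False,                        # -> GEN_AI_RESPONSE_STREAMING
    max_tokens=256,                      # -> GEN_AI_REQUEST_MAX_TOKENS
    presence_penalty=0.0,                # -> GEN_AI_REQUEST_PRESENCE_PENALTY
    frequency_penalty=0.0,               # -> GEN_AI_REQUEST_FREQUENCY_PENALTY
    temperature=0.2,                     # -> GEN_AI_REQUEST_TEMPERATURE
    top_p=0.9,                           # -> GEN_AI_REQUEST_TOP_P
)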
kwargs["_sentry_span"] = span | ||
|
||
# Set basic data | ||
set_data_normalized(span, SPANDATA.GEN_AI_SYSTEM, "litellm") |
Should this be litellm? Or the actual provider being used (anthropic, openai, ...)?
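If the actual provider is wanted, one possible shape of the _get_provider_from_model helper referenced in this diff (its body isn't shown here) would be to lean on litellm.get_llm_provider, which is also mentioned later in this review; the same value could then be used for GEN_AI_SYSTEM instead of the hard-coded "litellm". A hedged sketch:

from litellm import get_llm_provider

def _get_provider_from_model(model):
    # type: (str) -> str
    """Best-effort mapping of a LiteLLM model string to its provider name (sketch)."""
    try:
        # get_llm_provider returns a tuple whose second element is the provider,
        # e.g. "anthropic" for claude models, "openai" for gpt models, ...
        _, provider, _, _ = get_llm_provider(model)
        return provider
    except Exception:
        return "litellm"  # fall back to the generic value used in the diff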
span = get_start_span_function()(
    op=(
        consts.OP.GEN_AI_CHAT
        if operation == "chat"
        else consts.OP.GEN_AI_EMBEDDINGS
    ),
    name=f"{operation} {model}",
    origin=LiteLLMIntegration.origin,
)
span.__enter__()
We start a transaction if we don't have one ready yet.
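Presumably get_start_span_function() decides between starting a new transaction and a child span based on whether one is already active. A minimal sketch of that intent (not the SDK's actual helper):

import sentry_sdk

def get_start_span_function():
    # type: () -> Any
    # If there is no containing transaction yet, start one; otherwise nest a span.
    current_span = sentry_sdk.get_current_span()
    transaction_exists = (
        current_span is not None and current_span.containing_transaction is not None
    )
    return sentry_sdk.start_span if transaction_exists else sentry_sdk.start_transaction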
def _input_callback(
    kwargs,  # type: Dict[str, Any]
):
    # type: (...) -> None
    """Handle the start of a request."""
    integration = sentry_sdk.get_client().get_integration(LiteLLMIntegration)

    if integration is None:
        return

    # Get key parameters
    model = kwargs.get("model", "")
    messages = kwargs.get("messages", [])
    operation = "chat" if messages else "embeddings"

    # Start a new span/transaction
    span = get_start_span_function()(
        op=(
            consts.OP.GEN_AI_CHAT
            if operation == "chat"
            else consts.OP.GEN_AI_EMBEDDINGS
        ),
        name=f"{operation} {model}",
        origin=LiteLLMIntegration.origin,
    )
    span.__enter__()

    # Store span for later
    kwargs["_sentry_span"] = span

    # Set basic data
    set_data_normalized(span, SPANDATA.GEN_AI_SYSTEM, "litellm")
    set_data_normalized(span, SPANDATA.GEN_AI_OPERATION_NAME, operation)
    set_data_normalized(
        span, "gen_ai.litellm.provider", _get_provider_from_model(model)
    )

    # Record messages if allowed
    if messages and should_send_default_pii() and integration.include_prompts:
        set_data_normalized(
            span, SPANDATA.GEN_AI_REQUEST_MESSAGES, messages, unpack=False
        )

    # Record other parameters
    params = {
        "model": SPANDATA.GEN_AI_REQUEST_MODEL,
        "stream": SPANDATA.GEN_AI_RESPONSE_STREAMING,
        "max_tokens": SPANDATA.GEN_AI_REQUEST_MAX_TOKENS,
        "presence_penalty": SPANDATA.GEN_AI_REQUEST_PRESENCE_PENALTY,
        "frequency_penalty": SPANDATA.GEN_AI_REQUEST_FREQUENCY_PENALTY,
        "temperature": SPANDATA.GEN_AI_REQUEST_TEMPERATURE,
        "top_p": SPANDATA.GEN_AI_REQUEST_TOP_P,
    }
    for key, attribute in params.items():
        value = kwargs.get(key)
        if value is not None:
            set_data_normalized(span, attribute, value)

    # Record LiteLLM-specific parameters
    litellm_params = {
        "api_base": kwargs.get("api_base"),
        "api_version": kwargs.get("api_version"),
        "custom_llm_provider": kwargs.get("custom_llm_provider"),
    }
    for key, value in litellm_params.items():
        if value is not None:
            set_data_normalized(span, f"gen_ai.litellm.{key}", value)
Potential bug: The LiteLLM integration callbacks lack capture_internal_exceptions() protection, which could propagate internal SDK errors to the user's application, causing a crash.
- Description: The LiteLLM integration registers several callbacks, such as _input_callback, _success_callback, and _failure_callback, that execute during a LiteLLM API call. Unlike other Sentry AI integrations, these callbacks are not wrapped in capture_internal_exceptions(). An exception occurring within this code, for example during a call to litellm.get_llm_provider() or while parsing the response object, will not be caught. This will cause the exception to propagate up to the user's application, potentially causing it to crash. This deviates from the established SDK pattern of isolating internal exceptions from user code.
- Suggested fix: Wrap the entire body of the _input_callback, _success_callback, and _failure_callback functions with the capture_internal_exceptions() context manager. This will ensure any exceptions are caught and logged as internal SDK errors, preventing them from crashing the user's application.
severity: 0.75, confidence: 0.9
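A sketch of the suggested fix applied to one of the callbacks (the body shown is abbreviated; capture_internal_exceptions is the context manager from sentry_sdk.utils used by other integrations):

from sentry_sdk.utils import capture_internal_exceptions

def _input_callback(kwargs):
    # type: (Dict[str, Any]) -> None
    """Handle the start of a request."""
    # Wrapping the whole body ensures an SDK-internal error is logged as an
    # internal exception instead of propagating into the user's application.
    with capture_internal_exceptions():
        integration = sentry_sdk.get_client().get_integration(LiteLLMIntegration)
        if integration is None:
            return
        # ... rest of the callback body unchanged ...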
finally:
    # Always finish the span and clean up
    span.__exit__(None, None, None)
Bug: Race Condition in LiteLLM Callbacks
The LiteLLM integration has a race condition where spans started in _input_callback are finished in separate threads by _success_callback or _failure_callback. These callbacks may complete after the parent transaction, causing spans to be dropped from traces. Additionally, _failure_callback incorrectly calls span.__exit__ with None arguments, failing to properly record the failure state.
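A hedged sketch of how the failure callback could close the span with the real exception info so the failure state is recorded. The (kwargs, completion_response, start_time, end_time) signature and the kwargs["exception"] key follow LiteLLM's custom-callback convention, but both are assumptions to verify against the actual integration code:

def _failure_callback(kwargs, completion_response, start_time, end_time):
    # type: (Dict[str, Any], Any, Any, Any) -> None
    """Sketch: close the span with exception info instead of (None, None, None)."""
    span = kwargs.get("_sentry_span")
    if span is None:
        return
    exc = kwargs.get("exception")  # assumed location of the error passed by LiteLLM
    if exc is not None:
        # Passing type/value/traceback lets the span record an error status
        span.__exit__(type(exc), exc, exc.__traceback__)
    else:
        span.__exit__(None, None, None)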
Add a first implementation of the litellm integration, supporting completion and embeddings.
Closes https://linear.app/getsentry/issue/PY-1828/add-agent-monitoring-support-for-litellm
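For context, enabling the integration would presumably look like other SDK integrations; a usage sketch (the import path is an assumption, only the include_prompts option appears in the diff):

import sentry_sdk
from sentry_sdk.integrations.litellm import LiteLLMIntegration  # assumed module path

sentry_sdk.init(
    dsn="...",
    traces_sample_rate=1.0,
    # Message contents are only attached when send_default_pii and
    # include_prompts are both enabled (see should_send_default_pii above).
    send_default_pii=True,
    integrations=[LiteLLMIntegration(include_prompts=True)],
)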