Conversation

constantinius (Contributor)

Add a first implementation of the litellm integration, supporting completion and embeddings

Closes https://linear.app/getsentry/issue/PY-1828/add-agent-monitoring-support-for-litellm

constantinius requested a review from a team as a code owner on September 25, 2025, 13:28

linear bot commented Sep 25, 2025

cursor[bot]

This comment was marked as outdated.

Comment on lines +275 to +281
litellm.success_callback = litellm.success_callback or []
if _success_callback not in litellm.success_callback:
    litellm.success_callback.append(_success_callback)

litellm.failure_callback = litellm.failure_callback or []
if _failure_callback not in litellm.failure_callback:
    litellm.failure_callback.append(_failure_callback)
constantinius (Contributor, Author)

It seems that both success_callback and failure_callback are run in a separate thread, which may only finish after completion has already returned. Since the span is closed in one of these callbacks, it can end up being finished after the surrounding transaction, in which case it is missing from the trace entirely. This should definitely be pointed out somewhere.
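
A rough illustration of the ordering issue, using hypothetical caller code (not part of the integration):

import sentry_sdk
import litellm

with sentry_sdk.start_transaction(op="function", name="handler"):
    # _input_callback starts the gen_ai span when completion() is entered
    litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "hi"}],
    )
    # completion() has returned, but LiteLLM may still be running
    # _success_callback / _failure_callback in its worker thread ...
# ... so the transaction can finish first, and when the callback later calls
# span.__exit__(), the span outlives its transaction and is dropped.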

Comment on lines +103 to +111
params = {
    "model": SPANDATA.GEN_AI_REQUEST_MODEL,
    "stream": SPANDATA.GEN_AI_RESPONSE_STREAMING,
    "max_tokens": SPANDATA.GEN_AI_REQUEST_MAX_TOKENS,
    "presence_penalty": SPANDATA.GEN_AI_REQUEST_PRESENCE_PENALTY,
    "frequency_penalty": SPANDATA.GEN_AI_REQUEST_FREQUENCY_PENALTY,
    "temperature": SPANDATA.GEN_AI_REQUEST_TEMPERATURE,
    "top_p": SPANDATA.GEN_AI_REQUEST_TOP_P,
}
constantinius (Contributor, Author)
It is not clear where these parameters are actually supposed to be passed in the arguments to completion.
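
For reference, a minimal sketch of how these parameters would be passed if they are plain keyword arguments to litellm.completion() (the OpenAI-compatible call style; the values below are made up):

import litellm

response = litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=False,
    max_tokens=256,
    presence_penalty=0.0,
    frequency_penalty=0.0,
    temperature=0.7,
    top_p=1.0,
)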

kwargs["_sentry_span"] = span

# Set basic data
set_data_normalized(span, SPANDATA.GEN_AI_SYSTEM, "litellm")
constantinius (Contributor, Author)
Should this be "litellm", or the actual provider being used (anthropic, openai, ...)?
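
If the answer is "the actual provider", one possible sketch is to reuse the _get_provider_from_model() helper that already feeds gen_ai.litellm.provider below (assuming it returns a falsy value when the provider cannot be resolved):

# Sketch: prefer the resolved provider, fall back to "litellm"
provider = _get_provider_from_model(model) or "litellm"
set_data_normalized(span, SPANDATA.GEN_AI_SYSTEM, provider)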

Comment on lines +75 to +84
span = get_start_span_function()(
    op=(
        consts.OP.GEN_AI_CHAT
        if operation == "chat"
        else consts.OP.GEN_AI_EMBEDDINGS
    ),
    name=f"{operation} {model}",
    origin=LiteLLMIntegration.origin,
)
span.__enter__()
constantinius (Contributor, Author)
We start a transaction if we don't have one ready yet.
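
For context, a sketch of the behaviour this relies on (an assumption about what get_start_span_function() does, not the helper's actual source):

def get_start_span_function():
    # If nothing is running yet, the gen_ai span becomes the root of a new
    # transaction; otherwise it is attached as a child of the current span.
    current_span = sentry_sdk.get_current_span()
    has_transaction = (
        current_span is not None
        and current_span.containing_transaction is not None
    )
    return sentry_sdk.start_span if has_transaction else sentry_sdk.start_transaction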

Comment on lines +59 to +126
def _input_callback(
    kwargs,  # type: Dict[str, Any]
):
    # type: (...) -> None
    """Handle the start of a request."""
    integration = sentry_sdk.get_client().get_integration(LiteLLMIntegration)

    if integration is None:
        return

    # Get key parameters
    model = kwargs.get("model", "")
    messages = kwargs.get("messages", [])
    operation = "chat" if messages else "embeddings"

    # Start a new span/transaction
    span = get_start_span_function()(
        op=(
            consts.OP.GEN_AI_CHAT
            if operation == "chat"
            else consts.OP.GEN_AI_EMBEDDINGS
        ),
        name=f"{operation} {model}",
        origin=LiteLLMIntegration.origin,
    )
    span.__enter__()

    # Store span for later
    kwargs["_sentry_span"] = span

    # Set basic data
    set_data_normalized(span, SPANDATA.GEN_AI_SYSTEM, "litellm")
    set_data_normalized(span, SPANDATA.GEN_AI_OPERATION_NAME, operation)
    set_data_normalized(
        span, "gen_ai.litellm.provider", _get_provider_from_model(model)
    )

    # Record messages if allowed
    if messages and should_send_default_pii() and integration.include_prompts:
        set_data_normalized(
            span, SPANDATA.GEN_AI_REQUEST_MESSAGES, messages, unpack=False
        )

    # Record other parameters
    params = {
        "model": SPANDATA.GEN_AI_REQUEST_MODEL,
        "stream": SPANDATA.GEN_AI_RESPONSE_STREAMING,
        "max_tokens": SPANDATA.GEN_AI_REQUEST_MAX_TOKENS,
        "presence_penalty": SPANDATA.GEN_AI_REQUEST_PRESENCE_PENALTY,
        "frequency_penalty": SPANDATA.GEN_AI_REQUEST_FREQUENCY_PENALTY,
        "temperature": SPANDATA.GEN_AI_REQUEST_TEMPERATURE,
        "top_p": SPANDATA.GEN_AI_REQUEST_TOP_P,
    }
    for key, attribute in params.items():
        value = kwargs.get(key)
        if value is not None:
            set_data_normalized(span, attribute, value)

    # Record LiteLLM-specific parameters
    litellm_params = {
        "api_base": kwargs.get("api_base"),
        "api_version": kwargs.get("api_version"),
        "custom_llm_provider": kwargs.get("custom_llm_provider"),
    }
    for key, value in litellm_params.items():
        if value is not None:
            set_data_normalized(span, f"gen_ai.litellm.{key}", value)


Potential bug: The LiteLLM integration callbacks lack capture_internal_exceptions() protection, which could propagate internal SDK errors to the user's application, causing a crash.
  • Description: The LiteLLM integration registers several callbacks, such as _input_callback, _success_callback, and _failure_callback, that execute during a LiteLLM API call. Unlike other Sentry AI integrations, these callbacks are not wrapped in capture_internal_exceptions(). An exception occurring within this code, for example during a call to litellm.get_llm_provider() or while parsing the response object, will not be caught. This will cause the exception to propagate up to the user's application, potentially causing it to crash. This deviates from the established SDK pattern of isolating internal exceptions from user code.

  • Suggested fix: Wrap the entire body of the _input_callback, _success_callback, and _failure_callback functions with the capture_internal_exceptions() context manager. This will ensure any exceptions are caught and logged as internal SDK errors, preventing them from crashing the user's application.
    severity: 0.75, confidence: 0.9

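A minimal sketch of the suggested guard, using the existing capture_internal_exceptions() helper from sentry_sdk.utils (callback body abbreviated):

import sentry_sdk
from sentry_sdk.utils import capture_internal_exceptions

def _input_callback(kwargs):
    with capture_internal_exceptions():
        integration = sentry_sdk.get_client().get_integration(LiteLLMIntegration)
        if integration is None:
            return
        ...  # rest of the callback body, unchanged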

    finally:
        # Always finish the span and clean up
        span.__exit__(None, None, None)


Bug: Race Condition in LiteLLM Callbacks

The LiteLLM integration has a race condition where spans started in _input_callback are finished in separate threads by _success_callback or _failure_callback. These callbacks may complete after the parent transaction, causing spans to be dropped from traces. Additionally, _failure_callback incorrectly calls span.__exit__ with None arguments, failing to properly record the failure state.
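
A sketch of how the failure path could record the error (the callback signature follows LiteLLM's custom-callback convention, and kwargs.get("exception") is an assumption about where the raised exception is exposed):

def _failure_callback(kwargs, completion_response, start_time, end_time):
    span = kwargs.get("_sentry_span")
    if span is None:
        return
    exc = kwargs.get("exception")  # assumed location of the original exception
    try:
        span.set_status("internal_error")
    finally:
        if exc is not None:
            # Pass real exception info so the span is recorded as failed
            span.__exit__(type(exc), exc, exc.__traceback__)
        else:
            span.__exit__(None, None, None)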

