Conversation

constantinius (Contributor)

Add a first implementation of the litellm integration, supporting completion and embeddings

Closes https://linear.app/getsentry/issue/PY-1828/add-agent-monitoring-support-for-litellm

constantinius requested a review from a team as a code owner on September 25, 2025, 13:28

linear bot commented Sep 25, 2025

cursor[bot]

This comment was marked as outdated.

Comment on lines +275 to +281
litellm.success_callback = litellm.success_callback or []
if _success_callback not in litellm.success_callback:
    litellm.success_callback.append(_success_callback)

litellm.failure_callback = litellm.failure_callback or []
if _failure_callback not in litellm.failure_callback:
    litellm.failure_callback.append(_failure_callback)
constantinius (Contributor, Author)

It seems that both success_callback and failure_callback are run in a separate thread, which may only finish after completion has already returned. Since the span is closed in one of these callbacks, it can end up being finished after the surrounding transaction, in which case it is missing from the trace entirely. This should definitely be pointed out somewhere.
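
A rough illustration of the ordering issue, using hypothetical caller code (not part of the integration):

import sentry_sdk
import litellm

with sentry_sdk.start_transaction(op="function", name="handler"):
    # _input_callback starts the gen_ai span when completion() is entered
    litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "hi"}],
    )
    # completion() has returned, but LiteLLM may still be running
    # _success_callback / _failure_callback in its worker thread ...
# ... so the transaction can finish first, and when the callback later calls
# span.__exit__(), the span outlives its transaction and is dropped.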

Comment on lines +103 to +111
params = {
    "model": SPANDATA.GEN_AI_REQUEST_MODEL,
    "stream": SPANDATA.GEN_AI_RESPONSE_STREAMING,
    "max_tokens": SPANDATA.GEN_AI_REQUEST_MAX_TOKENS,
    "presence_penalty": SPANDATA.GEN_AI_REQUEST_PRESENCE_PENALTY,
    "frequency_penalty": SPANDATA.GEN_AI_REQUEST_FREQUENCY_PENALTY,
    "temperature": SPANDATA.GEN_AI_REQUEST_TEMPERATURE,
    "top_p": SPANDATA.GEN_AI_REQUEST_TOP_P,
}
constantinius (Contributor, Author)
It is not clear where these parameters are actually supposed to be passed in the arguments to completion.
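
For reference, a minimal sketch of how these parameters would be passed if they are plain keyword arguments to litellm.completion() (the OpenAI-compatible call style; the values below are made up):

import litellm

response = litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=False,
    max_tokens=256,
    presence_penalty=0.0,
    frequency_penalty=0.0,
    temperature=0.7,
    top_p=1.0,
)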

kwargs["_sentry_span"] = span

# Set basic data
set_data_normalized(span, SPANDATA.GEN_AI_SYSTEM, "litellm")
constantinius (Contributor, Author)
Should this be "litellm", or the actual provider being used (anthropic, openai, ...)?
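
If the answer is "the actual provider", one possible sketch is to reuse the _get_provider_from_model() helper that already feeds gen_ai.litellm.provider below (assuming it returns a falsy value when the provider cannot be resolved):

# Sketch: prefer the resolved provider, fall back to "litellm"
provider = _get_provider_from_model(model) or "litellm"
set_data_normalized(span, SPANDATA.GEN_AI_SYSTEM, provider)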

Comment on lines +75 to +84
span = get_start_span_function()(
    op=(
        consts.OP.GEN_AI_CHAT
        if operation == "chat"
        else consts.OP.GEN_AI_EMBEDDINGS
    ),
    name=f"{operation} {model}",
    origin=LiteLLMIntegration.origin,
)
span.__enter__()
constantinius (Contributor, Author)
We start a transaction if we don't have one ready yet.
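
For context, a sketch of the behaviour this relies on (an assumption about what get_start_span_function() does, not the helper's actual source):

def get_start_span_function():
    # If nothing is running yet, the gen_ai span becomes the root of a new
    # transaction; otherwise it is attached as a child of the current span.
    current_span = sentry_sdk.get_current_span()
    has_transaction = (
        current_span is not None
        and current_span.containing_transaction is not None
    )
    return sentry_sdk.start_span if has_transaction else sentry_sdk.start_transaction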

Comment on lines +59 to +126
def _input_callback(
    kwargs,  # type: Dict[str, Any]
):
    # type: (...) -> None
    """Handle the start of a request."""
    integration = sentry_sdk.get_client().get_integration(LiteLLMIntegration)

    if integration is None:
        return

    # Get key parameters
    model = kwargs.get("model", "")
    messages = kwargs.get("messages", [])
    operation = "chat" if messages else "embeddings"

    # Start a new span/transaction
    span = get_start_span_function()(
        op=(
            consts.OP.GEN_AI_CHAT
            if operation == "chat"
            else consts.OP.GEN_AI_EMBEDDINGS
        ),
        name=f"{operation} {model}",
        origin=LiteLLMIntegration.origin,
    )
    span.__enter__()

    # Store span for later
    kwargs["_sentry_span"] = span

    # Set basic data
    set_data_normalized(span, SPANDATA.GEN_AI_SYSTEM, "litellm")
    set_data_normalized(span, SPANDATA.GEN_AI_OPERATION_NAME, operation)
    set_data_normalized(
        span, "gen_ai.litellm.provider", _get_provider_from_model(model)
    )

    # Record messages if allowed
    if messages and should_send_default_pii() and integration.include_prompts:
        set_data_normalized(
            span, SPANDATA.GEN_AI_REQUEST_MESSAGES, messages, unpack=False
        )

    # Record other parameters
    params = {
        "model": SPANDATA.GEN_AI_REQUEST_MODEL,
        "stream": SPANDATA.GEN_AI_RESPONSE_STREAMING,
        "max_tokens": SPANDATA.GEN_AI_REQUEST_MAX_TOKENS,
        "presence_penalty": SPANDATA.GEN_AI_REQUEST_PRESENCE_PENALTY,
        "frequency_penalty": SPANDATA.GEN_AI_REQUEST_FREQUENCY_PENALTY,
        "temperature": SPANDATA.GEN_AI_REQUEST_TEMPERATURE,
        "top_p": SPANDATA.GEN_AI_REQUEST_TOP_P,
    }
    for key, attribute in params.items():
        value = kwargs.get(key)
        if value is not None:
            set_data_normalized(span, attribute, value)

    # Record LiteLLM-specific parameters
    litellm_params = {
        "api_base": kwargs.get("api_base"),
        "api_version": kwargs.get("api_version"),
        "custom_llm_provider": kwargs.get("custom_llm_provider"),
    }
    for key, value in litellm_params.items():
        if value is not None:
            set_data_normalized(span, f"gen_ai.litellm.{key}", value)


Potential bug: The LiteLLM integration callbacks lack capture_internal_exceptions() protection, which could propagate internal SDK errors to the user's application, causing a crash.
  • Description: The LiteLLM integration registers several callbacks, such as _input_callback, _success_callback, and _failure_callback, that execute during a LiteLLM API call. Unlike other Sentry AI integrations, these callbacks are not wrapped in capture_internal_exceptions(). An exception occurring within this code, for example during a call to litellm.get_llm_provider() or while parsing the response object, will not be caught. This will cause the exception to propagate up to the user's application, potentially causing it to crash. This deviates from the established SDK pattern of isolating internal exceptions from user code.

  • Suggested fix: Wrap the entire body of the _input_callback, _success_callback, and _failure_callback functions with the capture_internal_exceptions() context manager. This will ensure any exceptions are caught and logged as internal SDK errors, preventing them from crashing the user's application.
    severity: 0.75, confidence: 0.9

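A minimal sketch of the suggested guard, using the existing capture_internal_exceptions() helper from sentry_sdk.utils (callback body abbreviated):

import sentry_sdk
from sentry_sdk.utils import capture_internal_exceptions

def _input_callback(kwargs):
    with capture_internal_exceptions():
        integration = sentry_sdk.get_client().get_integration(LiteLLMIntegration)
        if integration is None:
            return
        ...  # rest of the callback body, unchanged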

    finally:
        # Always finish the span and clean up
        span.__exit__(None, None, None)


Bug: Race Condition in LiteLLM Callbacks

The LiteLLM integration has a race condition where spans started in _input_callback are finished in separate threads by _success_callback or _failure_callback. These callbacks may complete after the parent transaction, causing spans to be dropped from traces. Additionally, _failure_callback incorrectly calls span.__exit__ with None arguments, failing to properly record the failure state.
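
A sketch of how the failure path could record the error (the callback signature follows LiteLLM's custom-callback convention, and kwargs.get("exception") is an assumption about where the raised exception is exposed):

def _failure_callback(kwargs, completion_response, start_time, end_time):
    span = kwargs.get("_sentry_span")
    if span is None:
        return
    exc = kwargs.get("exception")  # assumed location of the original exception
    try:
        span.set_status("internal_error")
    finally:
        if exc is not None:
            # Pass real exception info so the span is recorded as failed
            span.__exit__(type(exc), exc, exc.__traceback__)
        else:
            span.__exit__(None, None, None)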

