
feat: add optional verbose metadata to /v1/infer endpoint#1305

Open
Lifto wants to merge 1 commit into lightspeed-core:main from Lifto:feat/verbose-infer-metadata

Conversation


@Lifto Lifto commented Mar 10, 2026

Summary

Add a development/testing feature that returns extended debugging metadata from the /v1/infer endpoint, similar to /v1/query. Enabling it requires dual opt-in: a config flag (allow_verbose_infer) and a request parameter (include_metadata).

When enabled, returns:

  • tool_calls - MCP tool calls made during inference
  • tool_results - Results from MCP tool calls
  • rag_chunks - RAG chunks retrieved from documentation
  • referenced_documents - Source documents referenced
  • input_tokens / output_tokens - Token usage

Maintains backwards compatibility by excluding null fields from standard responses.
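
The dual opt-in described above can be sketched as follows. The flag and field names come from this PR; the ServiceConfig and InferRequest shapes here are illustrative stand-ins for the real Pydantic models, not the actual implementation:

```python
from dataclasses import dataclass


@dataclass
class ServiceConfig:
    # Server-level opt-in; defaults to off so deployments never expose
    # debug metadata unless explicitly configured.
    allow_verbose_infer: bool = False


@dataclass
class InferRequest:
    query: str
    # Per-request opt-in; has no effect unless the server flag is also set.
    include_metadata: bool = False


def verbose_enabled(config: ServiceConfig, request: InferRequest) -> bool:
    """Metadata is returned only when BOTH opt-ins are present."""
    return config.allow_verbose_infer and request.include_metadata
```

Requiring both sides to opt in means a client cannot pull debug metadata out of a production server that has not enabled the flag.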

Changes

  • models/config.py - Added allow_verbose_infer config flag with RBAC implementation notes
  • models/rlsapi/requests.py - Added include_metadata request field
  • models/rlsapi/responses.py - Extended RlsapiV1InferData with optional metadata fields
  • app/endpoints/rlsapi_v1.py - Added conditional logic and response_model_exclude_none=True
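
A minimal sketch of the extended response model and the null-field exclusion. The field names are taken from the PR; the dataclass and the serialize helper only approximate what the real Pydantic model and FastAPI's response_model_exclude_none=True do:

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class RlsapiV1InferData:
    # Standard fields, always present.
    text: str
    request_id: str
    # Optional verbose-metadata fields; left as None unless verbose mode
    # is enabled for the request.
    tool_calls: Optional[list[dict[str, Any]]] = None
    tool_results: Optional[list[dict[str, Any]]] = None
    rag_chunks: Optional[list[dict[str, Any]]] = None
    referenced_documents: Optional[list[dict[str, Any]]] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None


def serialize(data: RlsapiV1InferData) -> dict[str, Any]:
    # Approximates response_model_exclude_none=True: None-valued fields
    # are dropped, so standard responses keep their original shape.
    return {k: v for k, v in asdict(data).items() if v is not None}
```

This is how the change stays backwards compatible: a non-verbose response serializes to the same two-field payload it had before the new fields existed.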

Test Plan

  • Normal request returns only text and request_id
  • Verbose request with config disabled returns standard response
  • Verbose request with config enabled returns all metadata fields
  • Backwards compatibility maintained (no null fields in standard responses)
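
The test plan above reduces to a truth table over the two opt-ins. A self-contained sketch (flag and field names from the PR; the gating rule is restated here for illustration):

```python
def verbose_enabled(allow_verbose_infer: bool, include_metadata: bool) -> bool:
    # Restated gating rule: metadata only when both opt-ins are set.
    return allow_verbose_infer and include_metadata


# (config flag, request field) -> metadata expected in the response?
expected = {
    (False, False): False,  # normal request: only text and request_id
    (False, True):  False,  # verbose requested, config disabled: standard response
    (True,  False): False,  # config enabled, metadata not requested
    (True,  True):  True,   # dual opt-in: all metadata fields included
}
for (flag, field), want in expected.items():
    assert verbose_enabled(flag, field) is want
```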

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added optional verbose metadata support for inference requests. When enabled, responses now include detailed information about tool calls, tool results, RAG chunks, referenced documents, and token usage metrics.

Add development/testing feature to return extended debugging metadata
from /v1/infer endpoint, similar to /v1/query. Requires dual opt-in:
config flag (allow_verbose_infer) and request parameter (include_metadata).

When enabled, returns tool_calls, tool_results, rag_chunks,
referenced_documents, and token counts. Maintains backwards compatibility
by excluding null fields from standard responses.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

coderabbitai bot commented Mar 10, 2026

Walkthrough

This PR adds verbose metadata support to the /infer endpoint, allowing clients to request and receive detailed metadata including tool calls, tool results, RAG chunks, referenced documents, and token usage. A new configuration flag allow_verbose_infer and request field include_metadata control this behavior.

Changes

  • Verbose Inference Configuration (src/models/config.py): Added allow_verbose_infer: bool = False field to the Customization class to control verbose metadata availability at the server level.
  • API Contract Updates (src/models/rlsapi/requests.py, src/models/rlsapi/responses.py): Added include_metadata: bool field to RlsapiV1InferRequest and six new optional metadata fields (tool_calls, tool_results, rag_chunks, referenced_documents, input_tokens, output_tokens) to RlsapiV1InferData. Expanded imports to include the necessary response types.
  • Endpoint Implementation (src/app/endpoints/rlsapi_v1.py): Implemented conditional verbose-mode logic that fetches full response objects via the LlamaStack client, extracts metadata fields, and computes turn_summary when verbose mode is enabled. Updated the router decorator with response_model_exclude_none=True so null fields are omitted from serialized responses.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change, adding optional verbose metadata to the /v1/infer endpoint, which is the primary focus of all modifications across four files.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.


@Lifto Lifto marked this pull request as draft March 10, 2026 20:55

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)

464-483: Consider extracting shared response creation logic.

The verbose path duplicates the client.responses.create() call that exists in retrieve_simple_response(). Consider refactoring to avoid duplication.

♻️ Proposed refactor to reduce duplication
 async def retrieve_simple_response(
     question: str,
     instructions: str,
     tools: Optional[list[Any]] = None,
     model_id: Optional[str] = None,
-) -> str:
+    return_full_response: bool = False,
+) -> str | OpenAIResponseObject:
     """Retrieve a simple response from the LLM for a stateless query.
-
-    Uses the Responses API for simple stateless inference, consistent with
-    other endpoints (query, streaming_query).
-
-    Args:
-        question: The combined user input (question + context).
-        instructions: System instructions for the LLM.
-        tools: Optional list of MCP tool definitions for the LLM.
-        model_id: Fully qualified model identifier in provider/model format.
-            When omitted, the configured default model is used.
-
-    Returns:
-        The LLM-generated response text.
-
-    Raises:
-        APIConnectionError: If the Llama Stack service is unreachable.
-        HTTPException: 503 if no default model is configured.
+    ...
+    Args:
+        ...
+        return_full_response: If True, return the full OpenAIResponseObject.
+
+    Returns:
+        The LLM-generated response text, or full response object if requested.
     """
     client = AsyncLlamaStackClientHolder().get_client()
     resolved_model_id = model_id or await _get_default_model_id()
     logger.debug("Using model %s for rlsapi v1 inference", resolved_model_id)

     response = await client.responses.create(
         input=question,
         model=resolved_model_id,
         instructions=instructions,
         tools=tools or [],
         stream=False,
         store=False,
     )
     response = cast(OpenAIResponseObject, response)
-    extract_token_usage(response.usage, resolved_model_id)
 
-    return extract_text_from_response_items(response.output)
+    if return_full_response:
+        return response
+
+    extract_token_usage(response.usage, resolved_model_id)
+    return extract_text_from_response_items(response.output)

Then in infer_endpoint:

-        if verbose_enabled:
-            client = AsyncLlamaStackClientHolder().get_client()
-            response = await client.responses.create(
-                input=input_source,
-                model=model_id,
-                instructions=instructions,
-                tools=mcp_tools or [],
-                stream=False,
-                store=False,
-            )
-            response = cast(OpenAIResponseObject, response)
-            response_text = extract_text_from_response_items(response.output)
-        else:
-            response = None
-            response_text = await retrieve_simple_response(...)
+        if verbose_enabled:
+            response = await retrieve_simple_response(
+                input_source, instructions, tools=mcp_tools,
+                model_id=model_id, return_full_response=True
+            )
+            response = cast(OpenAIResponseObject, response)
+            response_text = extract_text_from_response_items(response.output)
+        else:
+            response = None
+            response_text = await retrieve_simple_response(
+                input_source, instructions, tools=mcp_tools, model_id=model_id
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` around lines 464 - 483, The verbose and
non-verbose branches duplicate the client.responses.create call; refactor by
extracting the shared response-creation logic into a single helper (for example
a new function used by infer_endpoint and retrieve_simple_response) that calls
AsyncLlamaStackClientHolder().get_client().responses.create with parameters
(input_source, model_id, instructions, tools, stream, store) and returns the
response object/text; then have infer_endpoint call that helper (or have
retrieve_simple_response delegate to it) and remove the duplicated
client.responses.create usage in the verbose branch so both paths share the same
implementation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 534d060a-0ac8-4214-8bda-9ec1b42ff84d

📥 Commits

Reviewing files that changed from the base of the PR and between de8a85a and 4c6b955.

📒 Files selected for processing (4)
  • src/app/endpoints/rlsapi_v1.py
  • src/models/config.py
  • src/models/rlsapi/requests.py
  • src/models/rlsapi/responses.py

@Lifto Lifto marked this pull request as ready for review March 11, 2026 15:55