
LCORE-1216: Bump up to llama-stack 0.4.3#52

Merged
are-ces merged 2 commits into lightspeed-core:main from are-ces:llama-stack-0.4.x-bumpup
Mar 2, 2026

Conversation

@are-ces
Contributor

@are-ces are-ces commented Feb 8, 2026

Description

This is a significant refactoring of all the modules, mostly because the Agents API has been deprecated in favor of the Responses API in llama-stack (since 0.3.x).

This upgrade is needed to keep lightspeed-providers on par with LCORE.

NOTE: run_moderation was not designed for redaction but only to block the request; thus lightspeed-redactions will block the message if an unauthorized string is detected, as opposed to run_shield, where it is possible to redact the original message.
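The behavioral difference can be illustrated with a small self-contained sketch. All names and the pattern below are hypothetical, not the llama-stack API: a shield-style check rewrites the message, while a moderation-style check can only flag it so the caller blocks the whole request.

```python
import re

# Illustrative "sensitive data" pattern (SSN-like); purely a stand-in.
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def shield_style(message: str) -> str:
    """run_shield-style handling: redact the sensitive span, keep the message."""
    return SENSITIVE.sub("[REDACTED]", message)

def moderation_style(message: str) -> bool:
    """run_moderation-style handling: flag the message; the caller must block it."""
    return bool(SENSITIVE.search(message))

msg = "My SSN is 123-45-6789, please help."
print(shield_style(msg))      # request continues with redacted content
print(moderation_style(msg))  # True -> the whole request is rejected
```

This is why, after the bump, lightspeed-redactions rejects a flagged message outright instead of passing along a redacted copy.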

Changes:

  • Bump up llama-stack library to 0.4.3
  • Refactor agent code to migrate from Agents API to Responses API
  • Refactor safety module run_shield, added run_moderation
  • Kept temperature override, prioritization of most recently used tools, and tool filtering
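The retained temperature override can be sketched as follows. This is a hypothetical illustration (function and config names are not the actual provider code): a per-model configured value replaces whatever temperature the client sent, which matters for models that only accept a fixed value.

```python
def apply_temperature_override(request_params: dict, overrides: dict, model: str) -> dict:
    """Return a copy of the sampling params with any configured override applied."""
    params = dict(request_params)
    if model in overrides:
        # Configured value wins over the client-supplied temperature.
        params["temperature"] = overrides[model]
    return params

# Illustrative config: some models only accept temperature == 1.0.
overrides = {"gpt-5": 1.0}

print(apply_temperature_override({"temperature": 0.2}, overrides, "gpt-5"))
print(apply_temperature_override({"temperature": 0.2}, overrides, "other-model"))
```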

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Partially generated by: Claude

Related Tickets & Documents

  • Related Issue: LCORE-1216
  • Closes: LCORE-1216

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

I tested the following manually via curl requests:

  • Question validity run_shield (valid/invalid questions)
  • Question validity run_moderation
  • Redaction run_shield (sensitive data redacted)
  • Redaction run_moderation (message with sensitive data BLOCKED)
  • Tool filtering (11→1 tools)
  • min_tools threshold
  • Previously called tools persistence
  • always_include_tools config
  • Temperature override (1.0 for GPT-5)

are-ces marked this pull request as a draft on February 8, 2026 16:53
are-ces force-pushed the llama-stack-0.4.x-bumpup branch 3 times, most recently from b2b25c6 to c84a80e on February 8, 2026 17:25
Contributor

@tisnik tisnik left a comment


I'd say LGTM on my side, but we definitely need at least one more reviewer, especially from the teams that actually use the provider(s).

Contributor


You are removing the inline::lightspeed_inline_agent that we are using in the Ansible Lightspeed chatbot; if this PR is merged, it will break the chatbot functionality.

Contributor Author


inline::lightspeed_inline_agent still works; the logic has been moved from agent_instance.py to agents.py.

are-ces force-pushed the llama-stack-0.4.x-bumpup branch 3 times, most recently from 218e6d4 to 3ad6905 on February 10, 2026 11:29
@TamiTakamiya

@are-ces @ldjebran I could run the updated lightspeed_inline_agent with ansible-chatbot-stack. The test setup uses:

The setup is somewhat complicated because it uses a number of changes that are not yet merged to main. I will write a memo on my test setup.

Note: my setup does not enable the MCP server yet. After writing the memo, I plan to test this with the MCP server enabled.

are-ces force-pushed the llama-stack-0.4.x-bumpup branch from 3ad6905 to f99d3c1 on February 11, 2026 08:31
are-ces marked this pull request as ready for review on February 11, 2026 08:32
Contributor

@Jdubrick Jdubrick left a comment


@are-ces since we only consume the safety shield portion for my use case, that part LGTM, FYI.

@ldjebran
Contributor

@are-ces seems the file https://github.com/lightspeed-core/lightspeed-providers/blob/main/resources/external_providers/inline/agents/lightspeed_inline_agent.yaml

needs to be updated to:

config_class: lightspeed_stack_providers.providers.inline.agents.lightspeed_inline_agent.config.LightspeedAgentsImplConfig
module: lightspeed_stack_providers.providers.inline.agents.lightspeed_inline_agent
api_dependencies: [ inference, safety, tool_runtime, tool_groups, conversations, prompts ]
optional_api_dependencies: [vector_io, files]

The lightspeed_inline_agent agent is passing through the queries and overriding the temperature when configured. Unfortunately, I was not able to test MCP filtering, as lightspeed-stack seems to have a regression: it is not passing the MCP headers received from the client via the MCP-HEADERS header.

There is a lot of work done here, @are-ces, many thanks for your efforts.
Can we wait a little before merging to see the team's comments about MCP headers?

Contributor

@ldjebran ldjebran left a comment


@are-ces many thanks for the work. The changes I proposed in my last comment are still valid. I tested MCP, but the lightspeed_inline_agent is unfortunately not working as expected and breaks when the MCP configuration is enabled: I can see the MCP server returning the list of tools, but the agent does not seem to detect those tools and sees only 2 instead of more than 300.
This will need more investigation.

are-ces force-pushed the llama-stack-0.4.x-bumpup branch from 84d4bf7 to 622151e on February 12, 2026 11:08
@are-ces
Contributor Author

are-ces commented Feb 12, 2026

Hey @ldjebran, good catch! I encountered the same problem: I was handling the tools in the wrong way. Basically, the MCP servers were not being expanded into their individual tools, so we were counting the MCP servers themselves and comparing that count with min_tools.
I have tested it on my side and it works as expected; hopefully the same on your side 😄
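The bug and its fix can be sketched with a small self-contained example. All names and data shapes below are hypothetical, not the actual lightspeed-providers code: the point is that each MCP server entry must be expanded into its individual tools before comparing against the min_tools threshold.

```python
def count_tools_buggy(tool_configs: list[dict]) -> int:
    # Bug: counts each MCP *server* as a single tool.
    return len(tool_configs)

def count_tools_fixed(tool_configs: list[dict]) -> int:
    # Fix: expand each MCP server entry into its individual tools.
    total = 0
    for cfg in tool_configs:
        if cfg.get("type") == "mcp":
            total += len(cfg.get("tools", []))
        else:
            total += 1
    return total

configs = [
    {"type": "mcp", "tools": [f"tool_{i}" for i in range(300)]},
    {"type": "builtin", "name": "knowledge_search"},
]
MIN_TOOLS = 10

# The buggy count (2) never crosses the threshold, so filtering was
# skipped; the expanded count (301) correctly triggers it.
print(count_tools_buggy(configs) >= MIN_TOOLS)  # False
print(count_tools_fixed(configs) >= MIN_TOOLS)  # True
```

This matches the symptom ldjebran reported above: the agent "saw" only 2 tools even though the MCP server exposed more than 300.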

are-ces force-pushed the llama-stack-0.4.x-bumpup branch from 1daeb16 to 88bb4db on February 25, 2026 13:18
@TamiTakamiya

@are-ces Thanks for the updates. I am trying to verify this PR with lightspeed-core/lightspeed-stack#1179 on my CRC instance. I think I can set up the environment today to run the tests.

are-ces force-pushed the llama-stack-0.4.x-bumpup branch 2 times, most recently from 5070e07 to 7e235c2 on February 25, 2026 13:57
are-ces force-pushed the llama-stack-0.4.x-bumpup branch from 46c2ab8 to 0f82e43 on February 25, 2026 15:00

@TamiTakamiya TamiTakamiya left a comment


@are-ces Sorry for keeping you waiting so long. Though I am still unable to set up my test environment on my CRC, I could successfully test this PR using a test script plus a newly built ansible-chatbot-stack container image:

INFO     2026-02-26 04:06:00,253 lightspeed_stack_providers.providers.inline.agents.lightspeed_inline_agent.agents:302 agents: Previously called tools: set()
INFO     2026-02-26 04:06:00,254 lightspeed_stack_providers.providers.inline.agents.lightspeed_inline_agent.agents:158 agents: Always included tools (config + previously called): {'knowledge_search'}
INFO     2026-02-26 04:06:00,911 lightspeed_stack_providers.providers.inline.agents.lightspeed_inline_agent.agents:354 agents: Extracted 127 unique tool definitions from 2 tool configs
INFO     2026-02-26 04:06:00,912 lightspeed_stack_providers.providers.inline.agents.lightspeed_inline_agent.agents:179 agents: Tool filtering enabled - filtering 127 tools (threshold: 10)
INFO     2026-02-26 04:06:02,009 lightspeed_stack_providers.providers.inline.agents.lightspeed_inline_agent.agents:237 agents: Filtered tool names from LLM: ['job_templates_list']

I approve this PR. Thank you!

@TamiTakamiya

@ldjebran @are-ces I could also see the issue at the end of streaming. It occurs on the server side as:

ERROR    2026-03-02 00:05:09,161 uvicorn.error:424 uncategorized: ASGI callable returned without completing response. 

and on the client side as:

aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
[03/01/26 19:05:09] DEBUG    Got event: http.disconnect. Stop streaming.    sse.py:200

I should have spotted this in my previous tests, but apparently did not pay much attention. Thank you @ldjebran for bringing this up.

I have tested under several different conditions and found that the issue occurs:

  1. Only when an MCP tool call is made
  2. With any MCP server, not just our AAP MCP servers; I could reproduce it with the weather sample server
  3. With the meta-reference agent as well, so it is not an issue specific to the Lightspeed Agent

Based on those observations, this seemed to be a more general Llama Stack issue with cleaning up MCP sessions at the end of a stream.

I then searched the Llama Stack repo and found that the code change added in llamastack/llama-stack#4758 seemed to address the issue. It was included in Llama Stack 0.5.0; however, we need the fix with Llama Stack 0.4.3.

So I have ported the fix to lightspeed-providers and created a commit. As far as I tested, it eliminates the error. Could you try it in your test environment? Thank you.

@ldjebran
Contributor

ldjebran commented Mar 2, 2026

@TamiTakamiya thank you for your investigations.
I tested the same from your branch with the llama-stack fixes, and it's working as expected.

@are-ces Many thanks for all your efforts to make this work.
We will have to investigate how to backport these llama-stack fixes.

This PR LGTM

are-ces force-pushed the llama-stack-0.4.x-bumpup branch from 19137c1 to 25f15af on March 2, 2026 15:48
are-ces merged commit 934f3e8 into lightspeed-core:main on Mar 2, 2026
2 of 3 checks passed