
Conversation

@lkk12014402
Collaborator

@lkk12014402 lkk12014402 commented Nov 4, 2024

Description

Integrate Llama Stack implementations of the agent memory; refer to https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/impls/meta_reference/agents/agents.py#L27

[image: agent memory design diagram]

@lkk12014402 lkk12014402 marked this pull request as draft November 4, 2024 16:07
@codecov

codecov bot commented Nov 4, 2024

Codecov Report

Attention: Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.

Files with missing lines                Patch %   Lines
comps/cores/proto/agents/agents.py      0.00%     8 Missing ⚠️
comps/cores/proto/agents/__init__.py    0.00%     1 Missing ⚠️

Files with missing lines                Coverage Δ
comps/cores/proto/agents/__init__.py    0.00% <0.00%> (ø)
comps/cores/proto/agents/agents.py      0.00% <0.00%> (ø)

@minmin-intel
Collaborator

minmin-intel commented Nov 4, 2024

A few thoughts from my side:

  1. LangGraph has "checkpoint" and "store" for short-term and long-term memory; both have in-memory as well as SQL-DB-based implementations. We can utilize those LangGraph APIs. See https://langchain-ai.github.io/langgraph/concepts/persistence/, https://langchain-ai.github.io/langgraph/reference/checkpoints/, https://langchain-ai.github.io/langgraph/reference/store/
  2. What and how to save/retrieve/use memories depends on the agent design, but we should provide common functions, for example, saving agent_id, thread_id, and messages. These will be part of our Assistants APIs.
  3. Ultimately (maybe not in the v1.1 release), the memories will be microservices, and agents will send requests to memory DBs. Your diagram captures that, and I agree. Right now, LangGraph implements PostgreSQL-based checkpointers and stores, and it also has an example that uses a vector DB as a memory store. We need to decide which DBs to support and define the interface.
  4. We should expose a unified interface for sending requests to memory microservices and processing their responses, so that from an agent developer's point of view, they only need this unified interface together with the memory endpoint URL, agent_id, thread_id, and content (see the sketch after this list).
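
To make point 4 concrete, here is a hypothetical sketch of such a unified client. None of this is existing OPEA code: the class name `MemoryClient`, the `/v1/memories` endpoint path, and the JSON shape are all illustrative assumptions.

```python
import requests


class MemoryClient:
    """Hypothetical unified client for a memory microservice (illustrative only)."""

    def __init__(self, endpoint_url: str):
        self.endpoint_url = endpoint_url.rstrip("/")

    def save(self, agent_id: str, thread_id: str, content: dict) -> None:
        # POST one memory record to the assumed /v1/memories endpoint
        resp = requests.post(
            f"{self.endpoint_url}/v1/memories",
            json={"agent_id": agent_id, "thread_id": thread_id, "content": content},
            timeout=10,
        )
        resp.raise_for_status()

    def retrieve(self, agent_id: str, thread_id: str) -> list:
        # GET all memory records for an (agent_id, thread_id) pair
        resp = requests.get(
            f"{self.endpoint_url}/v1/memories",
            params={"agent_id": agent_id, "thread_id": thread_id},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()
```

With such a client, the agent code stays the same no matter which DB backs the memory microservice.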

@lkk12014402 lkk12014402 added this to the v1.1 milestone Nov 7, 2024
@lkk12014402 lkk12014402 changed the title from "draft a demo code for agent memory." to "agent short & long term memory with langgraph." Nov 10, 2024
@lkk12014402
Collaborator Author

lkk12014402 commented Nov 10, 2024

Hi Minmin @minmin-intel, following our discussion with Chendi, I would like to integrate the LangGraph memory implementations.

LangGraph implements both short-term and long-term memory; see this reference: https://langchain-ai.github.io/langgraph/concepts/persistence/

  1. Short-term memory
    With a LangGraph checkpointer, the graph state is written to the thread at each step, enabling state persistence within a single conversation (see the sketch after this list).

  2. Long-term memory
    With checkpointers alone, we cannot share information across threads. This motivates the need for the Store interface.
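
A minimal sketch of the short-term (thread-scoped) behavior, assuming a recent LangGraph release where `MemorySaver` lives in `langgraph.checkpoint.memory`; the `respond` node is a stand-in for a real LLM call:

```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph


def respond(state: MessagesState):
    # placeholder node; a real agent would call an LLM here
    return {"messages": [("ai", f"seen {len(state['messages'])} message(s) so far")]}


builder = StateGraph(MessagesState)
builder.add_node("respond", respond)
builder.add_edge(START, "respond")

# the checkpointer writes the state to the thread at each graph step
graph = builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "thread-1"}}
graph.invoke({"messages": [("user", "hi! I am bob")]}, config)
# a second invoke on the same thread_id resumes from the saved state,
# so "hi! I am bob" is still present in state["messages"]
result = graph.invoke({"messages": [("user", "what is my name?")]}, config)
print(result["messages"])
```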

For the checkpointer, LangGraph has four implementations:

  1. libs/checkpoint/langgraph/checkpoint
  2. libs/checkpoint-postgres/langgraph/checkpoint/postgres
  3. libs/checkpoint-sqlite/langgraph/checkpoint/sqlite
  4. libs/checkpoint-duckdb/langgraph/checkpoint/duckdb

For the Store, LangGraph has three implementations:

  1. libs/checkpoint/langgraph/store
  2. libs/checkpoint-postgres/langgraph/store/postgres
  3. libs/checkpoint-duckdb/langgraph/store/duckdb

I will integrate libs/checkpoint/langgraph/checkpoint and libs/checkpoint/langgraph/store first, for the v1.1 release.
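
As a preview of the Store side, a minimal sketch using `InMemoryStore` from libs/checkpoint/langgraph/store (imported from `langgraph.store.memory`); the namespace layout here is just an example:

```python
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()

# namespaces are tuples; scoping by user rather than by thread
# lets any thread read the same memory
namespace = ("memories", "user-bob")
store.put(namespace, "profile", {"name": "bob"})

item = store.get(namespace, "profile")
print(item.value)  # {'name': 'bob'}

# search lists items under a namespace prefix
for found in store.search(("memories",)):
    print(found.key, found.value)
```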

@lkk12014402
Collaborator Author

lkk12014402 commented Nov 10, 2024

Test short-term memory:

  1. Build the Docker image:
docker build -t opea/agent-langchain:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/agent/langchain/Dockerfile .
  2. Start the agent with -e with_memory=true:
docker run --runtime=runc --name="comps-langchain-agent-endpoint" -v $WORKPATH/comps/agent/langchain/tools:/home/user/comps/agent/langchain/tools -p 9090:9090 --ipc=host -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e model=${LLM_MODEL_ID} -e ip_address=${ip_address} -e strategy=react_llama -e llm_endpoint_url=${llm_endpoint_url} -e llm_engine=tgi -e recursion_limit=5 -e require_human_feedback=false -e tools=/home/user/comps/agent/langchain/tools/custom_tools.yaml -e streaming=false -e with_memory=true opea/agent-langchain:latest
  3. Test the agent, as in the LangGraph example:

curl http://localhost:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
     "query": "hi! I am bob"
    }'

# with memory enabled, the agent remembers the user name `bob`

# then ask the agent

curl http://localhost:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
     "query": "what is my name?"
    }'

@lkk12014402 lkk12014402 marked this pull request as ready for review November 10, 2024 07:26
@lkk12014402
Collaborator Author

lkk12014402 commented Nov 10, 2024

Added timeout handling for the LLM response; on timeout, it returns `Request timed out.`

We could also add node retries, like this: https://github.com/langchain-ai/langgraph/blob/main/docs/docs/how-tos/node-retries.ipynb

https://github.com/langchain-ai/langgraph/blob/main/docs/docs/how-tos/tool-calling-errors.ipynb
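
A minimal sketch of what the node-retry approach could look like, assuming `RetryPolicy` is importable from `langgraph.pregel` as in the first how-to; `call_model` is a placeholder for the LLM node:

```python
from langgraph.graph import START, MessagesState, StateGraph
from langgraph.pregel import RetryPolicy


def call_model(state: MessagesState):
    # placeholder for the real LLM call that may raise or time out
    return {"messages": [("ai", "ok")]}


builder = StateGraph(MessagesState)
# retry the node up to 3 times with exponential backoff on exceptions
builder.add_node(
    "agent",
    call_model,
    retry=RetryPolicy(max_attempts=3, initial_interval=0.5, backoff_factor=2.0),
)
builder.add_edge(START, "agent")
graph = builder.compile()
```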

@joshuayao joshuayao added the r1.1 label Nov 11, 2024
@lkk12014402
Collaborator Author

We need to discuss improving the Assistants API with a "tool" keyword.

@lkk12014402
Collaborator Author

If `-e HABANA_VISIBLE_DEVICES=all` is not set, the UT will fail when starting opea/vllm:hpu with `RuntimeError: synStatus=8 [Device not found] Device acquire failed. No devices found.`

@lvliang-intel lvliang-intel merged commit e39b08f into main Nov 12, 2024
@lvliang-intel lvliang-intel deleted the draft_agent_memory branch November 12, 2024 09:28
@xuechendi
Collaborator

> If `-e HABANA_VISIBLE_DEVICES=all` is not set, the UT will fail when starting opea/vllm:hpu with `RuntimeError: synStatus=8 [Device not found] Device acquire failed. No devices found.`

This is interesting. Does this happen only on a specific CI node, or with a certain Docker runtime version?

parser.add_argument("--custom_prompt", type=str, default=None)
parser.add_argument("--with_memory", type=bool, default=False)
parser.add_argument("--with_store", type=bool, default=False)
parser.add_argument("--timeout", type=int, default=60)
Collaborator

So for v1.1, the timeout only applies to waiting for the LLM response. Can we add a timeout for tool calls in a later release?

Collaborator Author

Will confirm.



class PersistenceInfo(BaseModel):
    user_id: str = None
Collaborator

What is the relationship between user_id and assistant_id?

Collaborator Author

Will confirm.


logger.info("========initiating agent============")
logger.info(f"args: {args}")
agent_inst = instantiate_agent(args, args.strategy, with_memory=args.with_memory)
Collaborator

I think instantiating the agent when the microservice starts makes sense for chat_completion, but it does not quite make sense for the Assistants API. Shall we initiate the agent only when the user sends a create_assistant request? And even then, we would not materialize the agent but only record its config (like llama-stack create_agent); the agent is then materialized later, when the user sends a request to the thread API (like llama-stack get_agent).

The benefit of such an approach: one microservice can support multiple configs, meaning multiple different types of agents instead of just one. This is more scalable.
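
A hypothetical sketch of that deferred-materialization idea; `AgentConfigRegistry` and its method names are illustrative, echoing the llama-stack create_agent/get_agent flow rather than existing OPEA code, and `instantiate_agent` stands for the factory shown in the diff above:

```python
import uuid


class AgentConfigRegistry:
    """Record configs at create time; materialize agents lazily (illustrative only)."""

    def __init__(self):
        self._configs = {}
        self._agents = {}

    def create_assistant(self, args) -> str:
        # like llama-stack create_agent: only record the config, build nothing yet
        agent_id = str(uuid.uuid4())
        self._configs[agent_id] = args
        return agent_id

    def get_agent(self, agent_id: str):
        # like llama-stack get_agent: build the agent on first use with its own
        # config, so one microservice can serve many differently-configured agents
        if agent_id not in self._agents:
            args = self._configs[agent_id]
            # instantiate_agent is the factory used in this PR
            self._agents[agent_id] = instantiate_agent(args, args.strategy, with_memory=args.with_memory)
        return self._agents[agent_id]
```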

@joshuayao joshuayao linked an issue ("Agent service via assistant apis") Nov 13, 2024 that may be closed by this pull request
madison-evans pushed a commit to SAPD-Intel/GenAIComps that referenced this pull request May 12, 2025
* draft a demo code for memory.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add agent short-term memory with langgraph checkpoint.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add save long-term memory func.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add save long-term memory func.

* add timeout for llm response.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ut with adding -e HABANA_VISIBLE_DEVICES=all.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>