
Conversation

@lkk12014402
Collaborator

@lkk12014402 lkk12014402 commented Nov 4, 2024

Description

Integrate Llama Stack implementations of the agent memory; refer to https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/impls/meta_reference/agents/agents.py#L27

[image: agent memory design diagram]

@lkk12014402 lkk12014402 marked this pull request as draft November 4, 2024 16:07
@codecov

codecov bot commented Nov 4, 2024

Codecov Report

Attention: Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.

Files with missing lines                Patch %   Lines
comps/cores/proto/agents/agents.py      0.00%     8 Missing ⚠️
comps/cores/proto/agents/__init__.py    0.00%     1 Missing ⚠️

Files with missing lines                Coverage Δ
comps/cores/proto/agents/__init__.py    0.00% <0.00%> (ø)
comps/cores/proto/agents/agents.py      0.00% <0.00%> (ø)

@minmin-intel
Collaborator

minmin-intel commented Nov 4, 2024

A few thoughts from my side:

  1. LangGraph has "checkpoint" and "store" for short-term and long-term memory; both have in-memory as well as SQL-DB-based implementations. We can utilize those LangGraph APIs. See https://langchain-ai.github.io/langgraph/concepts/persistence/, https://langchain-ai.github.io/langgraph/reference/checkpoints/, https://langchain-ai.github.io/langgraph/reference/store/
  2. What and how to save/retrieve/use memories depends on the agent design, but we should provide common functions, for example, saving agent_id, thread_id, and messages. These will be part of our Assistants APIs.
  3. Ultimately (maybe not in the v1.1 release), the memories will be microservices, and agents will send requests to memory DBs. Your diagram captures that, and I agree. Right now, LangGraph implements PostgreSQL-based checkpointers and stores, and it also has an example that uses a vector DB as a memory store. We need to decide which DBs to support and define the interface.
  4. We should expose a unified interface for sending requests to memory microservices and processing their responses, so that from an agent developer's point of view, they only need this unified interface together with the memory endpoint URL, agent_id, thread_id, and content (see the sketch after this list).
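
To make point 4 concrete, here is a hypothetical sketch of such a unified client. None of this is existing OPEA code: the class name `MemoryClient`, the `/v1/memories` endpoint path, and the JSON shape are all illustrative assumptions.

```python
import requests


class MemoryClient:
    """Hypothetical unified client for a memory microservice (illustrative only)."""

    def __init__(self, endpoint_url: str):
        self.endpoint_url = endpoint_url.rstrip("/")

    def save(self, agent_id: str, thread_id: str, content: dict) -> None:
        # POST one memory record to the assumed /v1/memories endpoint
        resp = requests.post(
            f"{self.endpoint_url}/v1/memories",
            json={"agent_id": agent_id, "thread_id": thread_id, "content": content},
            timeout=10,
        )
        resp.raise_for_status()

    def retrieve(self, agent_id: str, thread_id: str) -> list:
        # GET all memory records for an (agent_id, thread_id) pair
        resp = requests.get(
            f"{self.endpoint_url}/v1/memories",
            params={"agent_id": agent_id, "thread_id": thread_id},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()
```

With such a client, the agent code stays the same no matter which DB backs the memory microservice.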

@lkk12014402 lkk12014402 added this to the v1.1 milestone Nov 7, 2024
@lkk12014402 lkk12014402 changed the title from "draft a demo code for agent memory." to "agent short & long term memory with langgraph." Nov 10, 2024
@lkk12014402
Collaborator Author

lkk12014402 commented Nov 10, 2024

Hi Minmin @minmin-intel, following our discussion with Chendi, I would like to integrate the LangGraph memory implementations.

LangGraph implements both short-term and long-term memory; see this reference: https://langchain-ai.github.io/langgraph/concepts/persistence/

  1. Short-term memory
    With a LangGraph checkpointer, the graph state is written to the thread at each step, enabling state persistence within a single conversation (see the sketch after this list).

  2. Long-term memory
    With checkpointers alone, we cannot share information across threads. This motivates the need for the Store interface.
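
A minimal sketch of the short-term (thread-scoped) behavior, assuming a recent LangGraph release where `MemorySaver` lives in `langgraph.checkpoint.memory`; the `respond` node is a stand-in for a real LLM call:

```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph


def respond(state: MessagesState):
    # placeholder node; a real agent would call an LLM here
    return {"messages": [("ai", f"seen {len(state['messages'])} message(s) so far")]}


builder = StateGraph(MessagesState)
builder.add_node("respond", respond)
builder.add_edge(START, "respond")

# the checkpointer writes the state to the thread at each graph step
graph = builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "thread-1"}}
graph.invoke({"messages": [("user", "hi! I am bob")]}, config)
# a second invoke on the same thread_id resumes from the saved state,
# so "hi! I am bob" is still present in state["messages"]
result = graph.invoke({"messages": [("user", "what is my name?")]}, config)
print(result["messages"])
```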

For the checkpointer, LangGraph has four implementations:

  1. libs/checkpoint/langgraph/checkpoint
  2. libs/checkpoint-postgres/langgraph/checkpoint/postgres
  3. libs/checkpoint-sqlite/langgraph/checkpoint/sqlite
  4. libs/checkpoint-duckdb/langgraph/checkpoint/duckdb

For the Store, LangGraph has three implementations:

  1. libs/checkpoint/langgraph/store
  2. libs/checkpoint-postgres/langgraph/store/postgres
  3. libs/checkpoint-duckdb/langgraph/store/duckdb

I will integrate libs/checkpoint/langgraph/checkpoint and libs/checkpoint/langgraph/store first, for the v1.1 release.
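
As a preview of the Store side, a minimal sketch using `InMemoryStore` from libs/checkpoint/langgraph/store (imported from `langgraph.store.memory`); the namespace layout here is just an example:

```python
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()

# namespaces are tuples; scoping by user rather than by thread
# lets any thread read the same memory
namespace = ("memories", "user-bob")
store.put(namespace, "profile", {"name": "bob"})

item = store.get(namespace, "profile")
print(item.value)  # {'name': 'bob'}

# search lists items under a namespace prefix
for found in store.search(("memories",)):
    print(found.key, found.value)
```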

@lkk12014402
Collaborator Author

lkk12014402 commented Nov 10, 2024

Test short-term memory:

  1. Build the Docker image:
docker build -t opea/agent-langchain:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/agent/langchain/Dockerfile .
  2. Start the agent with -e with_memory=true:
docker run --runtime=runc --name="comps-langchain-agent-endpoint" -v $WORKPATH/comps/agent/langchain/tools:/home/user/comps/agent/langchain/tools -p 9090:9090 --ipc=host -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e model=${LLM_MODEL_ID} -e ip_address=${ip_address} -e strategy=react_llama -e llm_endpoint_url=${llm_endpoint_url} -e llm_engine=tgi -e recursion_limit=5 -e require_human_feedback=false -e tools=/home/user/comps/agent/langchain/tools/custom_tools.yaml -e streaming=false -e with_memory=true opea/agent-langchain:latest
  3. Test the agent, as in the LangGraph example:

curl http://localhost:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
     "query": "hi! I am bob"
    }'

# with memory enabled, the agent remembers the user name `bob`

# then ask the agent

curl http://localhost:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
     "query": "what is my name?"
    }'

@lkk12014402 lkk12014402 marked this pull request as ready for review November 10, 2024 07:26
@lkk12014402
Collaborator Author

lkk12014402 commented Nov 10, 2024

Added timeout handling for the LLM response; on timeout, it returns `Request timed out.`

We could also add node retries, like this: https://github.com/langchain-ai/langgraph/blob/main/docs/docs/how-tos/node-retries.ipynb

https://github.com/langchain-ai/langgraph/blob/main/docs/docs/how-tos/tool-calling-errors.ipynb
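
A minimal sketch of what the node-retry approach could look like, assuming `RetryPolicy` is importable from `langgraph.pregel` as in the first how-to; `call_model` is a placeholder for the LLM node:

```python
from langgraph.graph import START, MessagesState, StateGraph
from langgraph.pregel import RetryPolicy


def call_model(state: MessagesState):
    # placeholder for the real LLM call that may raise or time out
    return {"messages": [("ai", "ok")]}


builder = StateGraph(MessagesState)
# retry the node up to 3 times with exponential backoff on exceptions
builder.add_node(
    "agent",
    call_model,
    retry=RetryPolicy(max_attempts=3, initial_interval=0.5, backoff_factor=2.0),
)
builder.add_edge(START, "agent")
graph = builder.compile()
```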

@joshuayao joshuayao added the r1.1 label Nov 11, 2024
@lkk12014402
Collaborator Author

We need to discuss improving the Assistants API with a "tool" keyword.

@lkk12014402
Collaborator Author

If `-e HABANA_VISIBLE_DEVICES=all` is not set, the UT will fail when starting opea/vllm:hpu with `RuntimeError: synStatus=8 [Device not found] Device acquire failed. No devices found.`

@lvliang-intel lvliang-intel merged commit e39b08f into main Nov 12, 2024
@lvliang-intel lvliang-intel deleted the draft_agent_memory branch November 12, 2024 09:28
@xuechendi
Collaborator

> If `-e HABANA_VISIBLE_DEVICES=all` is not set, the UT will fail when starting opea/vllm:hpu with `RuntimeError: synStatus=8 [Device not found] Device acquire failed. No devices found.`

This is interesting. Does this happen only on a specific CI node, or with a certain Docker runtime version?

parser.add_argument("--custom_prompt", type=str, default=None)
parser.add_argument("--with_memory", type=bool, default=False)
parser.add_argument("--with_store", type=bool, default=False)
parser.add_argument("--timeout", type=int, default=60)
Collaborator

So for v1.1, the timeout only applies to waiting for the LLM response. Can we add a timeout for tool calls in a later release?

Collaborator Author

Will confirm.



class PersistenceInfo(BaseModel):
    user_id: str = None
Collaborator

What is the relationship between user_id and assistant_id?

Collaborator Author

Will confirm.


logger.info("========initiating agent============")
logger.info(f"args: {args}")
agent_inst = instantiate_agent(args, args.strategy, with_memory=args.with_memory)
Collaborator

I think instantiating the agent when the microservice starts makes sense for chat_completion, but it does not quite make sense for the Assistants API. Shall we initiate the agent only when the user sends a create_assistant request? And even then, we would not materialize the agent but only record its config (like llama-stack create_agent); the agent is then materialized later, when the user sends a request to the thread API (like llama-stack get_agent).

The benefit of such an approach: one microservice can support multiple configs, meaning multiple different types of agents instead of just one. This is more scalable.
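
A hypothetical sketch of that deferred-materialization idea; `AgentConfigRegistry` and its method names are illustrative, echoing the llama-stack create_agent/get_agent flow rather than existing OPEA code, and `instantiate_agent` stands for the factory shown in the diff above:

```python
import uuid


class AgentConfigRegistry:
    """Record configs at create time; materialize agents lazily (illustrative only)."""

    def __init__(self):
        self._configs = {}
        self._agents = {}

    def create_assistant(self, args) -> str:
        # like llama-stack create_agent: only record the config, build nothing yet
        agent_id = str(uuid.uuid4())
        self._configs[agent_id] = args
        return agent_id

    def get_agent(self, agent_id: str):
        # like llama-stack get_agent: build the agent on first use with its own
        # config, so one microservice can serve many differently-configured agents
        if agent_id not in self._agents:
            args = self._configs[agent_id]
            # instantiate_agent is the factory used in this PR
            self._agents[agent_id] = instantiate_agent(args, args.strategy, with_memory=args.with_memory)
        return self._agents[agent_id]
```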

@joshuayao joshuayao linked an issue ("Agent service via assistant apis") Nov 13, 2024 that may be closed by this pull request
madison-evans pushed a commit to SAPD-Intel/GenAIComps that referenced this pull request May 12, 2025
* draft a demo code for memory.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add agent short-term memory with langgraph checkpoint.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add save long-term memory func.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add save long-term memory func.

* add timeout for llm response.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ut with adding -e HABANA_VISIBLE_DEVICES=all.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>