
[RFC] Support Function Calling agent type in Agent framework #3000

Open
xinyual opened this issue Sep 29, 2024 · 8 comments
Labels
enhancement New feature or request untriaged

Comments

@xinyual
Collaborator

xinyual commented Sep 29, 2024

Problem statement

In our current agent framework, we support both the flow agent and the REACT chat agent. Recently, a new chat agent type called function calling has emerged, offering more powerful capabilities for generating contextually relevant, structured text output. Developers can use this output to trigger method calls or API requests, referred to as tools. Function calling is an inherent capability of the LLM itself; it provides a more structured and organized approach, enabling deterministic results from large language models (LLMs) with a reduced error rate. This capability requires additional fine-tuning of pretrained models and is supported by the latest models, such as GPT-3.5/4 and the Claude 3 family.

Proposed Solution

The input and output are well structured when using function calling with an LLM. Take Claude 3 on Amazon Bedrock as an example (see the full function calling API):
Input:

{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": int,
    "system": string,
    "messages": [
      ...
    ],
    "tools": [
        {
            "name": string,
            "description": string,
            "input_schema": json
        }
    ],
    "stop_sequences": [string]
}

The tools field here is a parameter of the API call rather than part of the prompt. The output has the following structure:

{
  "type": "message",
  ...
  "content": [
    {
      "type": "text",
      "text": "..."
    },
    {
      "type": "tool_use",
      "name": "..",
      "input": {
        ...
      }
    }
  ],
  "stop_reason": "tool_use/end_turn"
}

If we detect the keyword tool_use, we should execute the tool; otherwise, we should return the result to the user. This input/output structure and execution logic are not compatible with the current REACT agent, so we propose adding another agent type to support this functionality.
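The dispatch logic described above can be sketched as follows. This is an illustrative sketch only, not the actual ml-commons implementation; the function name `handle_response` and the return shapes are hypothetical.

```python
def handle_response(response, tools):
    """Execute the requested tools, or return the text answer to the user.

    `response` is a Claude 3 style message dict; `tools` maps tool names
    to Python callables (a stand-in for the framework's tool registry).
    """
    if response.get("stop_reason") == "tool_use":
        results = []
        for block in response.get("content", []):
            if block.get("type") == "tool_use":
                tool = tools[block["name"]]             # look up the registered tool
                results.append(tool(**block["input"]))  # call it with the LLM-provided input
        return {"action": "tool_results", "results": results}
    # stop_reason == "end_turn": pass the text content back to the user
    text = "".join(b.get("text", "") for b in response.get("content", [])
                   if b.get("type") == "text")
    return {"action": "final_answer", "text": text}
```

In a real agent loop, the tool results would be appended to the conversation and sent back to the model until it returns end_turn.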

{
  "type": "function_calling",
  "llm": {
  },
  "memory": {
    "type": "conversation_index"
  },
  "tools": [
  ]
}

Implementation details

We could still reuse part of the code in class MLChatAgentRunner, such as the memory handling. But the logic that parses the LLM response and extracts the next step needs to change. The prompt-formatting logic also differs, since tools are now request parameters.
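To illustrate the formatting difference, the sketch below assembles a Bedrock Claude 3 style request body in which registered tools travel as a top-level parameter instead of being interpolated into the prompt. The helper name `build_request` and the shape of the `tools` input are assumptions for illustration.

```python
def build_request(system, messages, tools, max_tokens=1024):
    """Assemble a Claude 3 on Bedrock style request body.

    Unlike the REACT agent, tool metadata is passed in the structured
    `tools` field, not concatenated into the system prompt.
    """
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system,
        "messages": messages,
        "tools": [
            {
                "name": t["name"],
                "description": t["description"],
                "input_schema": t["input_schema"],
            }
            for t in tools
        ],
    }
```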

@xinyual xinyual added enhancement New feature or request untriaged labels Sep 29, 2024
@xinyual
Collaborator Author

xinyual commented Sep 29, 2024

I can be the assignee for this new feature.

@yuye-aws
Member

Can you provide more context, such as the request to call the agent or the formatted tool description?

@yuye-aws
Member

We could still reuse part of code in Class MLChatAgentRunner, like memory. But the logic to parse the response from LLM and extract next step should be changed. Also, logic to format the prompt should also be different since tools now are parameters.

I would prefer a new class rather than modifying the existing code in MLChatAgentRunner.

@xinyual
Collaborator Author

xinyual commented Sep 29, 2024

Can you provide more context? Such as the request to call the agent or the formatted tool description.

To call the agent, we still use the same parameters as the REACT one. The following is a tool description:

{
  "name": "get_weather",
  "description": "Get the current weather in a given location",
  "input_schema": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA"
      }
    },
    "required": ["location"]
  }
}

@xinyual
Collaborator Author

xinyual commented Sep 29, 2024

We could still reuse part of code in Class MLChatAgentRunner, like memory. But the logic to parse the response from LLM and extract next step should be changed. Also, logic to format the prompt should also be different since tools now are parameters.

I would prefer a new class other than modifying existing code in MLChatAgentRunner.

My plan is to create two classes, ReactChatAgentRunner and FunctionCallingAgentRunner, both reusing the existing code in MLChatAgentRunner and each implementing its own logic to parse responses and format input.
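A rough sketch of that class split, assuming a shared base that owns the common plumbing (memory, tool registry) and two subclasses overriding the model-specific hooks. The class names mirror the plan above, but the method names and signatures are illustrative only.

```python
class MLChatAgentRunner:
    """Shared base: memory and tool plumbing live here (omitted)."""
    def parse_response(self, response):       # extract the next step
        raise NotImplementedError
    def format_input(self, question, tools):  # build the model request
        raise NotImplementedError

class ReactChatAgentRunner(MLChatAgentRunner):
    def format_input(self, question, tools):
        # REACT: tool descriptions are interpolated into the prompt text
        tool_text = "\n".join(f"{t['name']}: {t['description']}" for t in tools)
        return {"prompt": f"{tool_text}\n\nQuestion: {question}"}
    def parse_response(self, response):
        # REACT: the next action is parsed out of free-form text
        return "tool" if "Action:" in response.get("text", "") else "finish"

class FunctionCallingAgentRunner(MLChatAgentRunner):
    def format_input(self, question, tools):
        # function calling: tools travel as a structured request parameter
        return {"messages": [{"role": "user", "content": question}],
                "tools": tools}
    def parse_response(self, response):
        # function calling: the next step is signalled by stop_reason
        return "tool" if response.get("stop_reason") == "tool_use" else "finish"
```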

@yuye-aws
Member

yuye-aws commented Sep 29, 2024

To call agent, we still use the same parameters as REACT one. And following is a tool description:

{
  "name": "get_weather",
  "description": "Get the current weather in a given location",
  "input_schema": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA"
      }
    },
    "required": ["location"]
  }
}

Thanks for sharing these. Should the user specify parameters like this when creating a tool? Or should we parse these parameters from the tools themselves?

@yuye-aws
Member

My plan is create two different class like ReactChatAgentRunner/ FunctionCallingAgentRunner, both reuse existing code of MLChatAgentRunner and then implement their own code to parse response and format input.

That makes sense. Just wondering whether ReactChatAgentRunner will be the original MLChatAgentRunner.

@zane-neo
Collaborator

zane-neo commented Oct 8, 2024

This looks good. One question: did we run any benchmark on this, e.g. comparing tool-selection accuracy between the agent framework and the API's native function calling?
