Plugins and Token Count #7200
Is my understanding correct that the more plugins are loaded into the Kernel, the more prompt tokens are required for the planner to decide on a plugin? Is this the same as when many functions are loaded as available tools (e.g., 10 functions)? Or do the prompt tokens only include the functions actually called (e.g., 2) during the chat completion invocation?
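To make the question concrete, here is a back-of-the-envelope sketch (with made-up schemas and a crude ~4-characters-per-token heuristic, not a real tokenizer) of how the per-request overhead scales with the number of loaded functions, since every loaded function's JSON schema travels with every request:

```python
import json

# Hypothetical tool schemas; real plugin functions would have richer
# descriptions and parameter definitions, making the effect larger.
tool_schemas = [
    {
        "name": f"plugin_function_{i}",
        "description": "does something useful",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
        },
    }
    for i in range(10)
]

def rough_tokens(obj):
    # Crude proxy: roughly 4 characters per token for English/JSON text.
    return len(json.dumps(obj)) // 4

print(rough_tokens(tool_schemas))      # overhead with all 10 functions loaded
print(rough_tokens(tool_schemas[:2]))  # overhead if only 2 were loaded
```

The overhead grows with every loaded function, regardless of how many the model actually calls.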
I have this plugin list and triggered 2 conversations:

A) Time - get current time

Conversation 1:
Query 2) Use RAG Search (B) - prompt_tokens=3013, completion_tokens=215, total_tokens=3228 (breakdown: Q2 + Q1 total)
Query 3) Use RAG Search (B) - prompt_tokens=9972, completion_tokens=433, total_tokens=10405 (breakdown: Q3 + Q2 total)

Conversation 2:
Query 1) below is exactly the same as Query 3) from Conversation #1
My observation when comparing Conv 2) Q1 with Conv 1) Q3: the Q3 breakdown seems bloated, which I can only attribute to the retrieval function's RAG results from Conv 1) Q2 being carried along in the history. While I see the tool call's response is useful for citations or references, is there a way to optimize the token count, for example by omitting previous tool calls' results from the history? I don't think they are necessary for subsequent LLM calls, since the assistant's output would be more relevant.
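One way to sketch the idea of omitting earlier tool results: filter the chat history before the next completion call, dropping old "tool" messages (and the assistant tool-call stubs that produced them) while keeping the assistant's final answers. The message shape below mimics the common OpenAI-style chat format and assumes each tool result immediately follows its stub; adapt to whatever history type your SDK uses:

```python
def prune_tool_results(history, keep_last_n=1):
    """Drop 'tool' messages and their assistant tool-call stubs,
    keeping only the most recent keep_last_n tool results.
    Simplifying assumption: each tool message directly follows
    the assistant message whose tool_calls produced it."""
    tool_indices = [i for i, m in enumerate(history) if m["role"] == "tool"]
    keep = set(tool_indices[-keep_last_n:]) if keep_last_n else set()
    pruned = []
    for i, msg in enumerate(history):
        if msg["role"] == "tool" and i not in keep:
            continue  # drop the bulky retrieval payload
        if msg["role"] == "assistant" and msg.get("tool_calls") and (i + 1) not in keep:
            continue  # drop the matching tool-call stub too
        pruned.append(msg)
    return pruned

history = [
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "tool_calls": [{"id": "c1"}]},
    {"role": "tool", "content": "<large RAG chunk 1>"},
    {"role": "assistant", "content": "Answer 1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "tool_calls": [{"id": "c2"}]},
    {"role": "tool", "content": "<large RAG chunk 2>"},
    {"role": "assistant", "content": "Answer 2"},
]

print([m.get("content") for m in prune_tool_results(history)])
```

Only the latest RAG payload survives; the assistant answers (which usually summarize what was retrieved) still carry the conversational context forward.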
Yes. Every time you send a request to the service with function calling enabled, a description of each function needs to be included in the request, so the more functions there are, and the longer the description of each, the more prompt tokens will be consumed.

This is one of the reasons it's a good idea to partition your plugins/functions per use case, so that you only send the relevant functions with a given request/prompt. Another important reason: in general, with today's models, you'll get better planning out of them if you limit the number of functions offered for them to choose from. This is also one of the reasons Kernel is designed to be transient in nature: it's lightweight enough to create on demand, populate with just the plugins you need, and throw away afterwards. The AddKernel method for registering a Kernel in the dependency injection container registers it as transient, for example.
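The per-request partitioning idea can be sketched independently of any SDK. The keyword routing below is a deliberately simple stand-in for whatever selection logic fits your app (static per-feature sets, embedding-based selection, etc.); the tool names and tags are hypothetical:

```python
# Hypothetical registry of available functions, each tagged by topic.
TOOLS = {
    "get_current_time": {"description": "Get the current time", "tags": {"time"}},
    "rag_search": {"description": "Search internal documents", "tags": {"search", "docs"}},
    "send_email": {"description": "Send an email", "tags": {"email"}},
}

def select_tools(query, tools=TOOLS):
    """Pick only the functions whose tags overlap the query's words,
    so the request carries a smaller tool schema payload."""
    words = set(query.lower().split())
    picked = {name: t for name, t in tools.items() if t["tags"] & words}
    return picked or tools  # fall back to everything if nothing matches

print(sorted(select_tools("search the docs for onboarding")))
```

In Semantic Kernel terms, this corresponds to building a short-lived Kernel per request and adding only the selected plugins to it before invoking the model.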