
Structure completion request to maximize Prompt Caching #281

Open

brandonh-msft opened this issue Nov 4, 2024 · 0 comments

brandonh-msft commented Nov 4, 2024

Describe the feature or improvement you are requesting

Today, the flow of a request to an OpenAI service relies on straightforward JSON serialization of the model to encode the message as BinaryData and send it through the pipeline.

This does not take full advantage of Prompt Caching, which matches on a stable request prefix: the completion request should place tools first, then conversation history, then the new content, in that order. Additionally, the tools and history must appear in the same order on every request (suggested: alphabetical order by tool name). A caller-side sketch follows the sources below.

Sources:
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching
https://openai.com/index/api-prompt-caching/
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching#what-is-cached
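
For illustration, here is a caller-side sketch of the ordering that favors prompt caching, assuming the current OpenAI .NET ChatClient / ChatCompletionOptions surface; the tool names, schemas, and prompt text are hypothetical:

using System;
using System.Collections.Generic;
using System.Linq;
using OpenAI.Chat;

// Hypothetical tool definitions; what matters for caching is that they are
// added in a stable (here alphabetical) order on every request.
ChatTool getStockPrice = ChatTool.CreateFunctionTool(
    functionName: "get_stock_price",
    functionDescription: "Gets the latest price for a ticker symbol.");
ChatTool getWeather = ChatTool.CreateFunctionTool(
    functionName: "get_weather",
    functionDescription: "Gets the current weather for a location.");

ChatCompletionOptions options = new();
foreach (ChatTool tool in new[] { getWeather, getStockPrice }
             .OrderBy(t => t.FunctionName, StringComparer.Ordinal))
{
    options.Tools.Add(tool);
}

// Static prefix first (system prompt, prior history), new content last,
// so the cacheable portion of the prompt is identical across requests.
List<ChatMessage> messages =
[
    new SystemChatMessage("You are a helpful assistant."),
    new UserChatMessage("What was the weather yesterday?"),      // prior history
    new AssistantChatMessage("Yesterday was sunny and mild."),   // prior history
    new UserChatMessage("And what about tomorrow?"),             // new content
];

ChatClient client = new("gpt-4o", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));
ChatCompletion completion = client.CompleteChat(messages, options);

Even with this caller-side ordering, the generated serializer still controls the final property order of the JSON payload, which is the gap this issue is about: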

The client asks for the serialized content from the options:

using BinaryContent content = options.ToBinaryContent();

The JSON document is written with properties in a non-optimal order:

void IJsonModel<ChatCompletionOptions>.Write(Utf8JsonWriter writer, ModelReaderWriterOptions options)

That non-optimal serialization is used when constructing the BinaryData for the options:

internal virtual BinaryContent ToBinaryContent()
{
    return BinaryContent.Create(this, ModelSerializationExtensions.WireOptions);
}
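
For illustration only, a minimal sketch of the deterministic, cache-friendly write order this issue asks for, written with Utf8JsonWriter. This is not the SDK's actual IJsonModel<ChatCompletionOptions>.Write implementation; the CacheFriendlySerialization helper, its parameter shapes, and the placement of the remaining properties are hypothetical and simplified relative to the real wire format:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

internal static class CacheFriendlySerialization
{
    // Hypothetical helper showing the requested ordering; a real fix would
    // apply the same ordering inside the generated Write method.
    internal static BinaryData Write(
        IReadOnlyList<(string Name, JsonElement Schema)> tools,
        IReadOnlyList<(string Role, string Content)> messages,
        string model)
    {
        using MemoryStream stream = new();
        using Utf8JsonWriter writer = new(stream);

        writer.WriteStartObject();

        // 1. Tools first, in a stable alphabetical order, so the longest
        //    possible prefix of the request is identical across calls.
        writer.WritePropertyName("tools");
        writer.WriteStartArray();
        foreach (var (name, schema) in tools.OrderBy(t => t.Name, StringComparer.Ordinal))
        {
            writer.WriteStartObject();
            writer.WriteString("type", "function");
            writer.WritePropertyName("function");
            writer.WriteStartObject();
            writer.WriteString("name", name);
            writer.WritePropertyName("parameters");
            schema.WriteTo(writer);
            writer.WriteEndObject();
            writer.WriteEndObject();
        }
        writer.WriteEndArray();

        // 2. Then the conversation: history in its original order, new content last.
        writer.WritePropertyName("messages");
        writer.WriteStartArray();
        foreach (var (role, content) in messages)
        {
            writer.WriteStartObject();
            writer.WriteString("role", role);
            writer.WriteString("content", content);
            writer.WriteEndObject();
        }
        writer.WriteEndArray();

        // 3. Remaining options (model, temperature, ...) afterwards.
        writer.WriteString("model", model);

        writer.WriteEndObject();
        writer.Flush();
        return BinaryData.FromBytes(stream.ToArray());
    }
}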

Additional context

microsoft/semantic-kernel#9444
