Python: Introduced a new condition to yield StreamingChatMessageContent directly when usage data is available. #9753

Open
wants to merge 3 commits into
base: main

ymuichiro
Contributor

Motivation and Context

issue: #9751

This pull request addresses a bug where setting stream_options.include_usage to True does not return token usage, resulting in None for the usage field.

The issue occurs when using Azure OpenAI's GPT-4o and GPT-4o mini models. In particular, if the last chunk of the response has an empty choices list, the chunk is skipped entirely, and the token usage is not processed correctly.

In the Azure OpenAI implementation, if usage information is included, the chunk should be processed appropriately. However, the current code skips processing when choices is empty. This pull request fixes this behavior so that the chunk is processed when usage is present, even if choices is empty.

Description

This fix includes the following changes:

  • Modified the relevant section in azure_chat_completion.py to ensure that chunks with empty choices are not skipped if usage information is present.
  • Specifically, the condition if len(chunk.choices) == 0: was updated to allow chunks with usage data to be processed correctly.

With these changes, setting stream_options.include_usage to True will correctly return token usage data, even for chunks where the choices list is empty.
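
The intended behavior can be sketched as follows. This is a minimal, self-contained illustration, not the actual code in azure_chat_completion.py: the `Chunk` and `Usage` dataclasses and the `iter_processable` helper are hypothetical stand-ins for the SDK's streaming chunk objects and the connector's loop.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional


@dataclass
class Usage:
    # Hypothetical stand-in for the SDK's usage object.
    prompt_tokens: int
    completion_tokens: int


@dataclass
class Chunk:
    # Hypothetical stand-in for a streaming chat completion chunk.
    choices: list = field(default_factory=list)
    usage: Optional[Usage] = None


def iter_processable(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Yield every chunk that carries choices or usage data."""
    for chunk in chunks:
        # The old check, `if len(chunk.choices) == 0: continue`,
        # also dropped the final usage-only chunk.
        if len(chunk.choices) == 0 and chunk.usage is None:
            continue
        yield chunk


stream = [
    Chunk(choices=[]),                      # empty chunk, still skipped
    Chunk(choices=["Hello"]),               # normal content chunk
    Chunk(choices=[], usage=Usage(9, 10)),  # usage-only chunk, now kept
]
kept = list(iter_processable(stream))
```

Only chunks with neither choices nor usage are skipped, so the last chunk of the stream reaches the caller with its token counts.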

Contribution Checklist

@TaoChenOSU
Contributor

Hi @ymuichiro, thank you for your contribution!

If you read the comments on the _inner_get_streaming_chat_message_contents method, you will see why stream_options is not allowed with Azure OpenAI.

Did you observe a different behavior with Azure OpenAI?

@yuichiromukaiyama
Contributor

@TaoChenOSU
Of course. I have verified this for each API version. In my environment, regardless of which API version I choose, no errors occur, and the token usage for the stream is returned.

Am I misunderstanding something? It does indeed feel odd that it works even with older API versions.

The API versions that succeeded, and the sample code, are below:

2024-10-01-preview
2024-09-01-preview
2024-07-01-preview
2024-10-21
2024-06-01

The following was created directly in the shell to prevent any misunderstandings due to other causes, but even when using Semantic Kernel, the same error could not be reproduced.

payload="{\
  \"messages\": [\
    {\
      \"role\": \"user\",\
      \"content\": [\
        {\
          \"type\": \"text\",\
          \"text\": \"hi.\"\
        }\
      ]\
    }\
  ],\
  \"temperature\": 0.7,\
  \"top_p\": 0.95,\
  \"stream\": true,\
  \"stream_options\": { \"include_usage\": true },\
  \"max_tokens\": 10\
}"

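curl "https://${***********************}.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-10-21" \
  -H "Content-Type: application/json" \
  -H "api-key: **************************" \
  -d "$payload"

import asyncio
import os

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    AzureChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory
from semantic_kernel.functions import KernelArguments

# Assumption: the connection values come from environment variables of the same name.
AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME = os.environ["AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME"]
AZURE_OPENAI_COMPLETION_ENDPOINT = os.environ["AZURE_OPENAI_COMPLETION_ENDPOINT"]
AZURE_OPENAI_COMPLETION_API_KEY = os.environ["AZURE_OPENAI_COMPLETION_API_KEY"]


async def stream_sample() -> None:
    kernel = sk.Kernel()
    service_id: str = "dummy"

    kernel.add_service(
        AzureChatCompletion(
            service_id=service_id,
            deployment_name=AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME,
            endpoint=AZURE_OPENAI_COMPLETION_ENDPOINT,
            api_key=AZURE_OPENAI_COMPLETION_API_KEY,
            api_version="2024-06-01",
        )
    )

    service = kernel.get_service(service_id=service_id)
    settings = service.get_prompt_execution_settings_class()(service_id=service_id)

    # Ask the service to append token usage to the stream.
    if isinstance(settings, AzureChatPromptExecutionSettings):
        settings.extra_body = {
            "stream_options": {
                "include_usage": True,
            }
        }

    history = ChatHistory()
    history.add_user_message("hello")

    # Print every streamed chunk, including the final one that carries usage.
    async for chunk in service.get_streaming_chat_message_contents(
        chat_history=history,
        settings=settings,
        kernel=kernel,
        arguments=KernelArguments(settings=settings),
    ):
        print(chunk)


if __name__ == "__main__":
    asyncio.run(stream_sample())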
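
To see where the usage data arrives, the raw response to a request like the curl call above can be read line by line. The sketch below rests on assumptions: `extract_usage` is a hypothetical helper, and the sample lines are made-up data shaped like the server-sent events a streaming chat completion returns (`data: {json}` lines ending with `data: [DONE]`), not output captured for this pull request.

```python
import json
from typing import Iterable, Optional


def extract_usage(sse_lines: Iterable[str]) -> Optional[dict]:
    """Return the last non-null `usage` object found in an SSE stream."""
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        # The usage-only chunk has an empty `choices` list.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Illustrative sample data (an assumption, not a captured response).
sample = [
    'data: {"choices": [{"delta": {"content": "Hi"}}], "usage": null}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 1, "total_tokens": 10}}',
    "",
    "data: [DONE]",
]
usage = extract_usage(sample)
print(usage)
```

The helper looks at `usage` regardless of whether `choices` is empty, which is the same idea the fix applies inside the connector.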

@ymuichiro
Contributor Author

Sorry, I used the wrong account but it's the same person.

@TaoChenOSU
Contributor

Hi @ymuichiro,

I just verified this, and it seems they have resolved the issue. Could you remove the override of _inner_get_streaming_chat_message_contents in AzureChatCompletion? The default implementation in OpenAIChatCompletionBase already handles streaming token usage correctly.

@TaoChenOSU TaoChenOSU linked an issue Nov 20, 2024 that may be closed by this pull request
…de of _inner_get_streaming_chat_message_contents has been removed.
@ymuichiro
Contributor Author

hi @TaoChenOSU

sure, is this ok?
I have confirmed that it works.

076c792

@TaoChenOSU
Contributor

> hi @TaoChenOSU
>
> sure, is this ok? I have confirmed that it works.
>
> 076c792

Yes, this is right. Just one minor comment and we are good!

@TaoChenOSU
Contributor

We have a unit test failure because of the change. Could you fix that too?

Labels
python Pull requests for the Python Semantic Kernel
Development

Successfully merging this pull request may close these issues.

Python: Token usage from Azure OpenAI streaming chat completion
4 participants