Python/text completion streaming #1115

Merged

Conversation

awharrison-28
Contributor

Motivation and Context

This PR introduces streaming methods to TextCompletionBase and ChatCompletionBase. With this PR, you can stream LLM output in the following ways:

```
import semantic_kernel as sk
from semantic_kernel.connectors.ai import ChatCompletionClientBase, TextCompletionClientBase, ChatRequestSettings, CompleteRequestSettings
from semantic_kernel.connectors.ai.open_ai import AzureTextCompletion, AzureChatCompletion, OpenAITextCompletion, OpenAIChatCompletion
from semantic_kernel.connectors.ai.hugging_face import HuggingFaceTextCompletion

kernel = sk.Kernel()

# Configure Azure LLM service
deployment, api_key, endpoint = sk.azure_openai_settings_from_dot_env()
text_service = AzureTextCompletion("text-davinci-003", endpoint, api_key)
chat_service = AzureChatCompletion("gpt-35-turbo", endpoint, api_key)

# Configure OpenAI service
api_key, org_id = sk.openai_settings_from_dot_env()
oai_text_service = OpenAITextCompletion("text-davinci-003", api_key, org_id)
oai_chat_service = OpenAIChatCompletion("gpt-3.5-turbo", api_key, org_id)

# Configure Hugging Face service
hf_text_service = HuggingFaceTextCompletion("gpt2", task="text-generation")

request_settings = CompleteRequestSettings(
    max_tokens=1000,
    temperature=0.7,
    top_p=1,
    frequency_penalty=0.5,
    presence_penalty=0.5
)

stream = oai_text_service.complete_stream_async("Write an essay on why AI is awesome:", request_settings)
async for text in stream:
    print(text, end="")  # end="" to avoid newlines

chat_request_settings = ChatRequestSettings(
    max_tokens=1000,
    temperature=0.7,
    top_p=1,
    frequency_penalty=0.5,
    presence_penalty=0.5,
)

stream = oai_chat_service.complete_chat_stream_async([("user", "Write an essay on why AI is awesome:")], chat_request_settings)
async for text in stream:
    print(text, end="")  # end="" to avoid newlines

request_settings = CompleteRequestSettings(
    max_tokens=256,
    temperature=0.7,
    top_p=1,
    frequency_penalty=0.5,
    presence_penalty=0.5
)

stream = hf_text_service.complete_stream_async("Hi my name is ", request_settings)
async for text in stream:
    print(text, end="")  # end="" to avoid newlines
```

Out of Scope: Improving the chat history interface will come in a future PR.

Description

  • Added the method complete_stream_async to TextCompletionBase
  • Added the method complete_chat_stream_async to ChatCompletionBase
  • Updated the OpenAI and Hugging Face text completion service classes to support the new streaming methods
  • Added new __init__.py files to simplify importing the service classes

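The streaming methods are consumed with `async for`, so they behave like async generators. A minimal sketch of that pattern, independent of the actual connector internals (TextCompletionBase and complete_stream_async mirror the names in this PR, but EchoCompletion and its body are a toy stand-in, not real connector code):

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator, Optional


class TextCompletionBase(ABC):
    @abstractmethod
    def complete_stream_async(
        self, prompt: str, settings: Optional[object]
    ) -> AsyncIterator[str]:
        """Yield completion text chunks as they arrive from the service."""
        ...


class EchoCompletion(TextCompletionBase):
    """Toy connector that streams the prompt back word by word."""

    async def complete_stream_async(self, prompt, settings=None):
        for word in prompt.split():
            yield word + " "


async def main() -> str:
    chunks = []
    async for text in EchoCompletion().complete_stream_async("streaming works", None):
        chunks.append(text)  # each chunk arrives as soon as it is yielded
    return "".join(chunks)


print(asyncio.run(main()))  # prints "streaming works "
```

A real connector would yield each delta it receives from the service response instead of splitting a local string, but the caller-side `async for` loop is identical to the examples above.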
Contribution Checklist

@github-actions github-actions bot added the python Pull requests for the Python Semantic Kernel label May 19, 2023
Contributor

@alexchaomander alexchaomander left a comment

LGTM!

@shawncal shawncal force-pushed the python/text_completion_streaming branch from 3657539 to 80e0e63 Compare May 23, 2023 18:00
@shawncal shawncal enabled auto-merge (squash) May 23, 2023 18:00
@shawncal shawncal merged commit cfcb463 into microsoft:main May 23, 2023
shawncal pushed a commit to shawncal/semantic-kernel that referenced this pull request Jul 6, 2023
4 participants