Python/text completion streaming #1115

Merged

Conversation

awharrison-28
Contributor

Motivation and Context

This PR introduces streaming methods to TextCompletionBase and ChatCompletionBase. With this PR, you can stream LLM output in the following ways:

```
import semantic_kernel as sk
from semantic_kernel.connectors.ai import ChatCompletionClientBase, TextCompletionClientBase, ChatRequestSettings, CompleteRequestSettings
from semantic_kernel.connectors.ai.open_ai import AzureTextCompletion, AzureChatCompletion, OpenAITextCompletion, OpenAIChatCompletion
from semantic_kernel.connectors.ai.hugging_face import HuggingFaceTextCompletion

kernel = sk.Kernel()

# Configure Azure LLM service
deployment, api_key, endpoint = sk.azure_openai_settings_from_dot_env()
text_service = AzureTextCompletion("text-davinci-003", endpoint, api_key)
chat_service = AzureChatCompletion("gpt-35-turbo", endpoint, api_key)

# Configure OpenAI service
api_key, org_id = sk.openai_settings_from_dot_env()
oai_text_service = OpenAITextCompletion("text-davinci-003", api_key, org_id)
oai_chat_service = OpenAIChatCompletion("gpt-3.5-turbo", api_key, org_id)

# Configure Hugging Face service
hf_text_service = HuggingFaceTextCompletion("gpt2", task="text-generation")

request_settings = CompleteRequestSettings(
    max_tokens=1000,
    temperature=0.7,
    top_p=1,
    frequency_penalty=0.5,
    presence_penalty=0.5
)

stream = oai_text_service.complete_stream_async("Write an essay on why AI is awesome:", request_settings)
async for text in stream:
    print(text, end="")  # end="" to avoid newlines

chat_request_settings = ChatRequestSettings(
    max_tokens=1000,
    temperature=0.7,
    top_p=1,
    frequency_penalty=0.5,
    presence_penalty=0.5,
)

stream = oai_chat_service.complete_chat_stream_async([("user", "Write an essay on why AI is awesome:")], chat_request_settings)
async for text in stream:
    print(text, end="")  # end="" to avoid newlines

request_settings = CompleteRequestSettings(
    max_tokens=256,
    temperature=0.7,
    top_p=1,
    frequency_penalty=0.5,
    presence_penalty=0.5
)

stream = hf_text_service.complete_stream_async("Hi my name is ", request_settings)
async for text in stream:
    print(text, end="")  # end="" to avoid newlines
```

Out of Scope: Improving the chat history interface will come in a future PR.

Description

  • Added the method complete_stream_async to TextCompletionBase
  • Added the method complete_chat_stream_async to ChatCompletionBase
  • Updated the OpenAI and Hugging Face text completion service classes to support the new streaming methods
  • Added new __init__.py files to simplify importing the service classes

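The streaming methods are consumed with `async for`, so they behave like async generators. A minimal sketch of that pattern, independent of the actual connector internals (TextCompletionBase and complete_stream_async mirror the names in this PR, but EchoCompletion and its body are a toy stand-in, not real connector code):

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator, Optional


class TextCompletionBase(ABC):
    @abstractmethod
    def complete_stream_async(
        self, prompt: str, settings: Optional[object]
    ) -> AsyncIterator[str]:
        """Yield completion text chunks as they arrive from the service."""
        ...


class EchoCompletion(TextCompletionBase):
    """Toy connector that streams the prompt back word by word."""

    async def complete_stream_async(self, prompt, settings=None):
        for word in prompt.split():
            yield word + " "


async def main() -> str:
    chunks = []
    async for text in EchoCompletion().complete_stream_async("streaming works", None):
        chunks.append(text)  # each chunk arrives as soon as it is yielded
    return "".join(chunks)


print(asyncio.run(main()))  # prints "streaming works "
```

A real connector would yield each delta it receives from the service response instead of splitting a local string, but the caller-side `async for` loop is identical to the examples above.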
Contribution Checklist

@github-actions github-actions bot added the python Pull requests for the Python Semantic Kernel label May 19, 2023
Contributor

@alexchaomander alexchaomander left a comment

LGTM!

@shawncal shawncal force-pushed the python/text_completion_streaming branch from 3657539 to 80e0e63 Compare May 23, 2023 18:00
@shawncal shawncal enabled auto-merge (squash) May 23, 2023 18:00
@shawncal shawncal merged commit cfcb463 into microsoft:main May 23, 2023
shawncal pushed a commit to shawncal/semantic-kernel that referenced this pull request Jul 6, 2023
4 participants