Users/singankit/promptflow eval #2596

Closed · wants to merge 27 commits

Commits
- `3919b34` code-first example first version (wangchao1230, Jan 15, 2024)
- `f9fb894` update (wangchao1230, Jan 15, 2024)
- `14ba667` update (wangchao1230, Jan 15, 2024)
- `0f4d5e2` update output (D-W-, Jan 16, 2024)
- `7de013c` update requirements (D-W-, Jan 26, 2024)
- `3405691` update remote (D-W-, Jan 26, 2024)
- `4ca8166` update sample (D-W-, Jan 26, 2024)
- `1db6446` snapshot (wangchao1230, Jan 31, 2024)
- `2f53276` snapshot (wangchao1230, Jan 31, 2024)
- `88f9499` update dag entry (D-W-, Jan 31, 2024)
- `26401d4` add trace tutorial (wangchao1230, Feb 1, 2024)
- `f83b75a` update (wangchao1230, Feb 1, 2024)
- `9c012a9` resolve comments (wangchao1230, Feb 2, 2024)
- `56946bc` add langchain example (wangchao1230, Feb 4, 2024)
- `067586b` update langchain example (wangchao1230, Feb 4, 2024)
- `e1b3748` add new eval flow example (wangchao1230, Feb 7, 2024)
- `c3db33a` fix comment (wangchao1230, Feb 7, 2024)
- `44fc3e0` User class init for eval (brynn-code, Feb 7, 2024)
- `760b183` add readme for trace (lisagreenview, Feb 22, 2024)
- `6855482` update trace readme (lisagreenview, Feb 26, 2024)
- `7d3f54a` Barebone structure for promptflow eval (singankit, Mar 12, 2024)
- `d555893` [promptflow-evals] Built-in Evaluators (#2376) (ninghu, Mar 19, 2024)
- `4727e29` Users/singankit/evaluate api first draft (#2416) (singankit, Mar 20, 2024)
- `a28c45b` Adding qa evaluator (#2402) (singankit, Mar 20, 2024)
- `50bf2e4` Built-in Evaluators - Part 2 (#2437) (ninghu, Mar 22, 2024)
- `c4f3d19` Content safety evaluators - make credential parameter optional to su… (ninghu, Mar 25, 2024)
- `6c9714e` Convert evaluators from function to class based (#2515) (ninghu, Mar 27, 2024)
70 changes: 70 additions & 0 deletions examples/flows/chat/chat-basic-code-first/README.md
@@ -0,0 +1,70 @@
# Basic chat (code-first)
This example shows how to create a basic chat flow using a code-first approach. It demonstrates how to build a chatbot that remembers previous interactions and uses the conversation history to generate the next message.

Tools used in this flow:
- `llm` tool

## Prerequisites

Install the promptflow SDK and other dependencies in this folder:
```bash
pip install -r requirements.txt
```

## What you will learn

In this flow, you will learn
- how to compose a chat flow.
- the prompt template format of the LLM tool chat API. The message delimiter is a separate line containing the role name followed by a colon: "system:", "user:", "assistant:".
See <a href="https://platform.openai.com/docs/api-reference/chat/create#chat/create-role" target="_blank">OpenAI Chat</a> for more about message roles.
```jinja
system:
You are a chatbot having a conversation with a human.

user:
{{question}}
```
- how to consume chat history in the prompt.
```jinja
{% for item in chat_history %}
user:
{{item.inputs.question}}
assistant:
{{item.outputs.answer}}
{% endfor %}
```
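The `chat_history` consumed by the loop above is a list of previous turns, each carrying `inputs` and `outputs`. As an illustration (the sample question and answer here are hypothetical), the Jinja loop unrolls it into alternating user/assistant messages, which plain Python can mimic:

```python
# Hypothetical illustration of the chat_history shape consumed by the template above.
# Each item records one previous turn's inputs and outputs.
chat_history = [
    {
        "inputs": {"question": "What is Prompt flow?"},
        "outputs": {"answer": "Prompt flow is a development tool for LLM apps."},
    },
]

# The Jinja loop expands each turn into a user/assistant message pair:
lines = []
for item in chat_history:
    lines.append("user:\n" + item["inputs"]["question"])
    lines.append("assistant:\n" + item["outputs"]["answer"])
rendered = "\n".join(lines)
print(rendered)
```

This is only a sketch of the data shape; in the flow itself, Jinja performs the rendering.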

## Getting started

### 1 Create connection for LLM tool to use
Go to the "Prompt flow" "Connections" tab. Click the "Create" button, select one of the connection types supported by the LLM tool, and fill in the configurations.

Currently, the LLM tool supports two connection types: "AzureOpenAI" and "OpenAI". To use the "AzureOpenAI" connection type, you need to create an Azure OpenAI service first; refer to [Azure OpenAI Service](https://azure.microsoft.com/en-us/products/cognitive-services/openai-service/) for details. To use the "OpenAI" connection type, you need to create an OpenAI account first; refer to [OpenAI](https://platform.openai.com/) for details.

```bash
# Override keys with --set to avoid yaml file changes
pf connection create --file ../../../connections/azure_openai.yml --set api_key=<your_api_key> api_base=<your_api_base> --name open_ai_connection
```

Note that [flow.dag.yaml](flow.dag.yaml) uses the connection named `open_ai_connection`.
```bash
# show registered connection
pf connection show --name open_ai_connection
```

### 2 Start chatting

```bash
# run chat flow with default question in flow.dag.yaml
pf flow test --flow .

# run chat flow with new question
pf flow test --flow . --inputs question="What's Azure Machine Learning?"

# start an interactive chat session in CLI
pf flow test --flow . --interactive

# start an interactive chat session in CLI with verbose info
pf flow test --flow . --interactive --verbose
```

12 changes: 12 additions & 0 deletions examples/flows/chat/chat-basic-code-first/chat.jinja2
@@ -0,0 +1,12 @@
system:
You are a helpful assistant.

{% for item in chat_history %}
user:
{{item.inputs.question}}
assistant:
{{item.outputs.answer}}
{% endfor %}

user:
{{question}}
2 changes: 2 additions & 0 deletions examples/flows/chat/chat-basic-code-first/data.jsonl
@@ -0,0 +1,2 @@
{"question": "What is Prompt flow?"}
{"question": "What is ChatGPT?"}
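The data file above is JSON Lines: one standalone JSON object per line. A batch runner would typically parse it line by line; a minimal sketch (the `load_jsonl` helper is illustrative, not part of this PR):

```python
import json

def load_jsonl(text: str) -> list:
    """Parse JSON Lines text: one JSON object per line, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# The same two records as in data.jsonl above.
sample = '{"question": "What is Prompt flow?"}\n{"question": "What is ChatGPT?"}\n'
rows = load_jsonl(sample)
print(rows)  # → [{'question': 'What is Prompt flow?'}, {'question': 'What is ChatGPT?'}]
```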
2 changes: 2 additions & 0 deletions examples/flows/chat/chat-basic-code-first/flow.dag.yaml
@@ -0,0 +1,2 @@
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
entry: flow.flow_entry
53 changes: 53 additions & 0 deletions examples/flows/chat/chat-basic-code-first/flow.py
@@ -0,0 +1,53 @@
from dataclasses import dataclass
from jinja2 import Template
from pathlib import Path
from promptflow import trace, PFClient
from promptflow.tools.aoai import chat


BASE_DIR = Path(__file__).absolute().parent


@trace
def load_prompt(jinja2_template: str, question: str, chat_history: list) -> str:
    """Load prompt function."""
    with open(BASE_DIR / jinja2_template, "r", encoding="utf-8") as f:
        tmpl = Template(f.read(), trim_blocks=True, keep_trailing_newline=True)
        prompt = tmpl.render(question=question, chat_history=chat_history)
    return prompt


@dataclass
class Result:
    answer: str


@trace
def flow_entry(question: str = "What is ChatGPT?", chat_history: list = None) -> Result:
    """Flow entry function."""
    from promptflow._sdk._configuration import Configuration

    chat_history = chat_history or []  # avoid a mutable default argument
    prompt = load_prompt("chat.jinja2", question, chat_history)
    config = Configuration.get_instance()
    # TODO: create your own config.json
    workspace_config = config._get_workspace_from_config(path="./config.json")
    config.set_config(
        Configuration.CONNECTION_PROVIDER,
        "azureml:" + workspace_config
    )
    pf = PFClient()
    connection = pf.connections.get("open_ai_connection", with_secrets=True)  # TODO: add connection to function inputs
    output = chat(
        connection=connection,
        prompt=prompt,
        deployment_name="gpt-35-turbo",
        max_tokens=256,
        temperature=0.7,
    )
    return Result(answer=output)


if __name__ == "__main__":
    result = flow_entry("What's Azure Machine Learning?", [])
    print(result)
2 changes: 2 additions & 0 deletions examples/flows/chat/chat-basic-code-first/requirements.txt
@@ -0,0 +1,2 @@
--extra-index-url https://azuremlsdktestpypi.azureedge.net/test-promptflow/
promptflow[azure]==0.0.116642424
3 changes: 3 additions & 0 deletions examples/flows/evaluation/eval-code-quality/.env.example
@@ -0,0 +1,3 @@
AZURE_OPENAI_API_KEY=<your_AOAI_key>
AZURE_OPENAI_API_BASE=<your_AOAI_endpoint>
AZURE_OPENAI_API_TYPE=azure
30 changes: 30 additions & 0 deletions examples/flows/evaluation/eval-code-quality/README.md
@@ -0,0 +1,30 @@
# Eval Code Quality
An example flow that shows how to evaluate the quality of a code snippet.

## Prerequisites

Install the promptflow SDK and other dependencies:
```bash
pip install -r requirements.txt
```

## Run flow

- Prepare your Azure OpenAI resource by following these [instructions](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal) and get your `api_key` if you don't have one.

- Set up environment variables

Ensure you have put your Azure OpenAI endpoint and key in the [.env](.env) file. You can create one by referring to this [example file](.env.example).

```bash
cat .env
```

- Test flow/node
```bash
# correct
pf flow test --flow . --inputs code='print(\"Hello, world!\")'

# incorrect
pf flow test --flow . --inputs code='print("Hello, world!")'
```
3 changes: 3 additions & 0 deletions examples/flows/evaluation/eval-code-quality/flow.dag.yaml
@@ -0,0 +1,3 @@
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
# flow is defined as python function
entry: flow:eval_code
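The `entry` field points at a Python callable via a `module:function` style string. A hedged sketch of how such a reference can be resolved (demonstrated with the stdlib `json:loads` rather than this flow's own module, since the resolver shown is an illustration, not promptflow's actual loader):

```python
import importlib

def resolve_entry(entry: str):
    """Resolve a 'module:function' entry string to a callable."""
    module_name, func_name = entry.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)

# Demonstrated with a stdlib module; this flow's real entry would be "flow:eval_code".
loads = resolve_entry("json:loads")
print(loads('{"ok": true}'))  # → {'ok': True}
```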
76 changes: 76 additions & 0 deletions examples/flows/evaluation/eval-code-quality/flow.py
@@ -0,0 +1,76 @@
import json
import os
from dataclasses import dataclass
from pathlib import Path

from dotenv import load_dotenv
from jinja2 import Template

from promptflow import trace
from promptflow._sdk.entities import AzureOpenAIConnection
from promptflow.tools.aoai import AzureOpenAI

BASE_DIR = Path(__file__).absolute().parent


@trace
def load_prompt(jinja2_template: str, code: str, examples: list) -> str:
    """Load prompt function."""
    with open(BASE_DIR / jinja2_template, "r", encoding="utf-8") as f:
        tmpl = Template(f.read(), trim_blocks=True, keep_trailing_newline=True)
        prompt = tmpl.render(code=code, examples=examples)
    return prompt


@dataclass
class Result:
    correctness: float
    readability: float
    explanation: str


@trace
def eval_code(code: str) -> Result:
    """Evaluate the code based on correctness, readability, and adherence to best practices."""
    examples = [
        {
            "code": 'print("Hello, world!")',
            "correctness": 5,
            "readability": 5,
            "explanation": "The code is correct as it is a simple question and answer format. "
            "The readability is also good as the code is short and easy to understand.",
        }
    ]

    prompt = load_prompt("prompt.md", code, examples)

    if "AZURE_OPENAI_API_KEY" not in os.environ:
        # load environment variables from .env file
        load_dotenv()

    if "AZURE_OPENAI_API_KEY" not in os.environ:
        raise Exception("Please specify environment variables: AZURE_OPENAI_API_KEY")

    connection = AzureOpenAIConnection(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_base=os.environ["AZURE_OPENAI_API_BASE"],
        api_version=os.environ.get("AZURE_OPENAI_API_VERSION", "2023-07-01-preview"),
    )

    output = AzureOpenAI(connection).chat(
        prompt=prompt,
        deployment_name="gpt-35-turbo",
        max_tokens=256,
        temperature=0.7,
    )
    output = Result(**json.loads(output))
    return output


if __name__ == "__main__":
    from promptflow._trace._start_trace import start_trace  # TODO move to public API

    start_trace()

    result = eval_code('print("Hello, world!")')
    print(result)
20 changes: 20 additions & 0 deletions examples/flows/evaluation/eval-code-quality/prompt.md
@@ -0,0 +1,20 @@

# system:
You are an AI assistant.
Your task is to evaluate the code based on correctness and readability.


# user:
The correctness value should always be an integer between 1 and 5, so the correctness produced should be 1, 2, 3, 4, or 5.
The readability value should always be an integer between 1 and 5, so the readability produced should be 1, 2, 3, 4, or 5.

Here are a few examples:
{% for ex in examples %}
Code: {{ex.code}}
OUTPUT:
{"correctness": "{{ex.correctness}}", "readability": "{{ex.readability}}", "explanation":"{{ex.explanation}}"}
{% endfor %}

For the given code, evaluate it based on correctness and readability:
Code: {{code}}
OUTPUT:
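Note that the example OUTPUT above serializes the scores as JSON strings, while the flow's `Result` dataclass declares them as floats. A defensive parsing sketch (the explicit coercion is this note's assumption, not part of the flow itself):

```python
import json
from dataclasses import dataclass

@dataclass
class Result:
    correctness: float
    readability: float
    explanation: str

def parse_output(raw: str) -> Result:
    """Parse the model's JSON reply, coercing string scores to floats."""
    data = json.loads(raw)
    return Result(
        correctness=float(data["correctness"]),
        readability=float(data["readability"]),
        explanation=data["explanation"],
    )

# A reply shaped like the example OUTPUT in the prompt above.
reply = '{"correctness": "5", "readability": "4", "explanation": "Clear and correct."}'
result = parse_output(reply)
print(result.correctness, result.readability)  # → 5.0 4.0
```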
2 changes: 2 additions & 0 deletions examples/flows/evaluation/eval-code-quality/requirement.txt
@@ -0,0 +1,2 @@
promptflow
promptflow-tools
3 changes: 3 additions & 0 deletions examples/flows/standard/basic-code-first/.env.example
@@ -0,0 +1,3 @@
AZURE_OPENAI_API_KEY=<your_AOAI_key>
AZURE_OPENAI_API_BASE=<your_AOAI_endpoint>
AZURE_OPENAI_API_TYPE=azure