Add simulator. #2703

Merged
merged 113 commits into from
May 10, 2024
Changes from 98 commits
Commits
f88b109
Adding simulator templates
nagkumar91 Apr 8, 2024
432d532
Adding synthetic simulator
nagkumar91 Apr 8, 2024
1d5697e
Add tests and dependencies
nagkumar91 Apr 8, 2024
33b6bc8
fix version of aiohttp_retry
nagkumar91 Apr 8, 2024
1db2ebd
Install before tests
nagkumar91 Apr 8, 2024
758de41
Resolve compliance check
nagkumar91 Apr 8, 2024
2ae3b2a
Syntax error on the yml
nagkumar91 Apr 8, 2024
0e9693c
Perform dev setup on unit test setup stage
nagkumar91 Apr 8, 2024
4358014
Change the setup
nagkumar91 Apr 8, 2024
f386152
revert the test file and add install of azure.keyvault
nagkumar91 Apr 8, 2024
0952488
Changing the install again
nagkumar91 Apr 8, 2024
1452643
simulator takes on ml_client as an arg
nagkumar91 Apr 10, 2024
13a0689
Merge branch 'main' into task/addSimulator
nagkumar91 Apr 10, 2024
9ce7b38
Remove reference to ai client
nagkumar91 Apr 10, 2024
d5c6e0b
Remove unnecessary method
nagkumar91 Apr 10, 2024
e39197a
removed unused code
nagkumar91 Apr 11, 2024
8799016
Keyword args for public facing simulaor methods
nagkumar91 Apr 11, 2024
92cdb2c
removed dependency
nagkumar91 Apr 11, 2024
8db2749
Merge branch 'main' into task/addSimulator
nagkumar91 Apr 11, 2024
3160da6
making simulator_templates private
nagkumar91 Apr 11, 2024
922accf
Move tests to right folder
nagkumar91 Apr 12, 2024
9ad30af
updated semaphore limit and load_flow
nagkumar91 Apr 12, 2024
f7288d5
Merge branch 'main' into task/addSimulator
nagkumar91 Apr 13, 2024
89d1aec
Remove pdb
nagkumar91 Apr 15, 2024
b58e70b
Merge branch 'main' into task/addSimulator
nagkumar91 Apr 15, 2024
62ea4c5
Merge branch 'main' into task/addSimulator
nagkumar91 Apr 16, 2024
02334be
Merge branch 'main' into task/addSimulator
nagkumar91 Apr 17, 2024
ab6a3cc
Add e2e test
nagkumar91 Apr 18, 2024
ea5d5f3
More e2e tests
nagkumar91 Apr 18, 2024
5205979
Integrate tests to evals folder
nagkumar91 Apr 18, 2024
da941dd
Merge branch 'main' into task/addSimulator
nagkumar91 Apr 18, 2024
e22ceeb
ignore file in cfg and add unknown word
nagkumar91 Apr 18, 2024
fa8d1f1
Add recording for new test
nagkumar91 Apr 18, 2024
d56b607
Skip e2e test having DefaultAzureCredential
nagkumar91 Apr 18, 2024
3197fed
Fix failing test
nagkumar91 Apr 18, 2024
58c7a12
Update the recordings
nagkumar91 Apr 18, 2024
6447d57
Update recording
nagkumar91 Apr 18, 2024
c87e4d0
Marking test as skipped to mitigate 429s in recording
nagkumar91 Apr 19, 2024
afb10af
Fix cspell issues
nagkumar91 Apr 19, 2024
83c73b7
Merge branch 'main' into task/addSimulator
nagkumar91 Apr 19, 2024
af5b3f0
Fix the formatting of docstring
nagkumar91 Apr 19, 2024
3b41ab4
Update the non-adv simulator to use a class to initialize a userbot c…
nagkumar91 Apr 22, 2024
10606f8
Merge accepting remote changes for data file
nagkumar91 Apr 23, 2024
166b546
Fixed the unittest
nagkumar91 Apr 23, 2024
ab451f5
Fixed e2e tests
nagkumar91 Apr 23, 2024
d99cb39
Skip the failing test
nagkumar91 Apr 23, 2024
746aa0a
Skip test on python 3.9 or lower
nagkumar91 Apr 23, 2024
a676092
Initial rewrite begin
nagkumar91 Apr 26, 2024
d2c2ca0
Raise error when incorrect template is being passed
nagkumar91 Apr 26, 2024
897f840
Add e2e test
nagkumar91 Apr 26, 2024
1f243e4
e2e test with and without rai_svc_url
nagkumar91 Apr 26, 2024
51beacb
State with circular import issue
nagkumar91 Apr 27, 2024
5f606b7
e2e test passing for qa
nagkumar91 Apr 27, 2024
f6ccd8f
Remove old simulator pieces
nagkumar91 Apr 27, 2024
ed24705
Merge branch 'main' into task/addSimulator
nagkumar91 Apr 27, 2024
8675fbc
Merge branch 'main' into task/addSimulator
nagkumar91 Apr 29, 2024
1f50ba9
removed unused test
nagkumar91 Apr 29, 2024
05fbf29
Fix time taken and remove duplicate class
nagkumar91 Apr 29, 2024
4cc7c42
Added unitests for conversation bot
nagkumar91 Apr 29, 2024
7de92b6
Trying a fix for python 3.8 and 3.9
nagkumar91 Apr 29, 2024
7393ba8
Added more tests for callback conversation bot
nagkumar91 Apr 29, 2024
39d1e76
Lock in the identity manager
nagkumar91 Apr 29, 2024
b52c81a
Remove print
nagkumar91 Apr 29, 2024
3770661
Making calls to proxy completion models work better
nagkumar91 Apr 29, 2024
d5292da
remove sending 2 messages with file content
nagkumar91 Apr 29, 2024
bee5c6d
Removed all the old qa things
nagkumar91 Apr 29, 2024
710f48c
Add docstring
nagkumar91 Apr 30, 2024
340ac6d
Adding e2e test for adv_conversation
nagkumar91 Apr 30, 2024
dd58d8d
More tests and add monitoring to adversarial simulator
nagkumar91 May 1, 2024
985a7cc
Merge branch 'main' into task/addSimulator
nagkumar91 May 1, 2024
a5b8324
Update test_adv_simulator.py
nagkumar91 May 1, 2024
6eb9f72
more test changes and remove strict check for template string
nagkumar91 May 2, 2024
0472f91
Fix test and remove unused captioning code
nagkumar91 May 2, 2024
4b428e3
CHange AzureCLICredential to DefaultAzureCredential
nagkumar91 May 2, 2024
292c76c
Add readme
nagkumar91 May 3, 2024
ab40397
Bug bash instrcutions
nagkumar91 May 3, 2024
736b431
Error message for service not available
nagkumar91 May 3, 2024
7aca2ce
Adding context back to follow chat protocol
nagkumar91 May 3, 2024
52738a3
Merge branch 'main' into task/addSimulator
nagkumar91 May 3, 2024
d9854bd
fix conftest formatting
nagkumar91 May 3, 2024
e677c56
Merge branch 'main' into task/addSimulator
nagkumar91 May 6, 2024
638bb07
Remove certain keys from output and add it to tests
nagkumar91 May 6, 2024
48642f3
Remove offensive content in readme
nagkumar91 May 6, 2024
7e0a614
Merge branch 'main' into task/addSimulator
nagkumar91 May 6, 2024
69a746c
Changed the signature according to discussion
nagkumar91 May 7, 2024
71b5b68
Better variable name in readme
nagkumar91 May 7, 2024
ab799e3
Merge branch 'main' into task/addSimulator
nagkumar91 May 7, 2024
11f8de7
Change project_scope to azure_ai_project
nagkumar91 May 7, 2024
23da5a8
update instructions
nagkumar91 May 7, 2024
c59d0ec
update instructions
nagkumar91 May 7, 2024
9fb3caa
update bugbash instructions
nagkumar91 May 7, 2024
22b2524
removed redundant deps
nagkumar91 May 7, 2024
18c0977
Remove retry on 424
nagkumar91 May 7, 2024
7608f72
Removed the references to mlflow_logger as it was not being used
nagkumar91 May 7, 2024
acb660b
Using dataclass
nagkumar91 May 7, 2024
d2c09bd
Merge branch 'main' into task/addSimulator
nagkumar91 May 7, 2024
ae8ca58
use project_name instead of workspace_name
nagkumar91 May 7, 2024
70c2ba5
Update the eval part for bug bash
nagkumar91 May 7, 2024
fa96739
Fix the return type docstring and remove tracking for init
nagkumar91 May 8, 2024
a27b78a
uppercase RAI_SVC_URL
nagkumar91 May 8, 2024
5a3cf52
Merge branch 'main' into task/addSimulator
nagkumar91 May 8, 2024
8e2b5c8
Update import for content safety evaluators
nagkumar91 May 8, 2024
b22b47d
Fix for jailbreak sim
nagkumar91 May 8, 2024
2496b20
Added test to make sure jailbreak works
nagkumar91 May 8, 2024
d7c92ed
Update bugbash instrctions
nagkumar91 May 8, 2024
675bea8
Change to add the enum for adversarial scenarios
nagkumar91 May 9, 2024
ea3b9f4
Fixed e2e tests and readme
nagkumar91 May 9, 2024
35f2e61
Singular scenario
nagkumar91 May 9, 2024
1c9f167
Import fix and jupyter notebook comment
nagkumar91 May 9, 2024
667adb3
Merge branch 'main' into task/addSimulator
nagkumar91 May 9, 2024
5c25178
Skipping the e2e tests
nagkumar91 May 10, 2024
d1e7108
Skipping the e2e tests
nagkumar91 May 10, 2024
0c891c7
Merge branch 'main' into task/addSimulator
nagkumar91 May 10, 2024
11 changes: 9 additions & 2 deletions .cspell.json
@@ -48,7 +48,8 @@
".github/workflows/**",
".github/actions/**",
".github/pipelines/**",
".github/CODEOWNERS"
".github/CODEOWNERS",
"src/promptflow-evals/tests/**"
],
"words": [
"aoai",
@@ -216,10 +217,16 @@
"mpnet",
"wargs",
"dcid",
"aiohttp",
"endofprompt",
"tkey",
"tparam",
"ncols",
"piezo",
"Piezo",
"cmpop",
"omap"
"omap",
"Machinal"
],
"flagWords": [
"Prompt Flow"
2 changes: 1 addition & 1 deletion .github/workflows/promptflow-evals-unit-test.yml
@@ -104,4 +104,4 @@ jobs:
format: markdown
hide_complexity: true
output: both
thresholds: 40 60
thresholds: 40 60
287 changes: 251 additions & 36 deletions src/promptflow-evals/promptflow/evals/synthetic/README.md

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions src/promptflow-evals/promptflow/evals/synthetic/__init__.py
@@ -0,0 +1,3 @@
from .adversarial_simulator import AdversarialSimulator

__all__ = ["AdversarialSimulator"]
@@ -0,0 +1,235 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
# noqa: E402

import copy
import logging
import time
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple, Union

import jinja2

from .._model_tools import LLMBase, OpenAIChatCompletionsModel, RetryClient
from .constants import ConversationRole


@dataclass
class ConversationTurn:
    role: "ConversationRole"
    name: Optional[str] = None
    message: str = ""
    full_response: Optional[Any] = None
    request: Optional[Any] = None

    def to_openai_chat_format(self, reverse: bool = False) -> dict:
        if reverse is False:
            return {"role": self.role.value, "content": self.message}
        if self.role == ConversationRole.ASSISTANT:
            return {"role": ConversationRole.USER.value, "content": self.message}
        return {"role": ConversationRole.ASSISTANT.value, "content": self.message}

    def to_annotation_format(self, turn_number: int) -> dict:
        return {
            "turn_number": turn_number,
            "response": self.message,
            "actor": self.role.value if self.name is None else self.name,
            "request": self.request,
            "full_json_response": self.full_response,
        }

    def __str__(self) -> str:
        return f"({self.role.value}): {self.message}"


class ConversationBot:
    def __init__(
        self,
        *,
        role: ConversationRole,
        model: Union[LLMBase, OpenAIChatCompletionsModel],
        conversation_template: str,
        instantiation_parameters: Dict[str, str],
    ):
        """
        Create a ConversationBot with a specific name, persona, and a sentence that can be used as a conversation starter.

        :param role: The role of the bot in the conversation, either USER or ASSISTANT.
        :type role: ConversationRole
        :param model: The LLM model to use for generating responses.
        :type model: Union[LLMBase, OpenAIChatCompletionsModel]
        :param conversation_template: A Jinja2 template describing the conversation to generate the prompt for the LLM
        :type conversation_template: str
        :param instantiation_parameters: A dictionary of parameters used to instantiate the conversation template
        :type instantiation_parameters: dict
        """

        self.role = role
        self.conversation_template_orig = conversation_template
        self.conversation_template: jinja2.Template = jinja2.Template(
            conversation_template, undefined=jinja2.StrictUndefined
        )
        self.persona_template_args = instantiation_parameters
        if self.role == ConversationRole.USER:
            self.name = self.persona_template_args.get("name", role.value)
        else:
            self.name = self.persona_template_args.get("chatbot_name", role.value) or model.name
        self.model = model

        self.logger = logging.getLogger(repr(self))
        self.conversation_starter = None  # can either be a dictionary or jinja template
        if role == ConversationRole.USER:
            if "conversation_starter" in self.persona_template_args:
                conversation_starter_content = self.persona_template_args["conversation_starter"]
                if isinstance(conversation_starter_content, dict):
                    self.conversation_starter = conversation_starter_content
                else:
                    self.conversation_starter = jinja2.Template(
                        conversation_starter_content, undefined=jinja2.StrictUndefined
                    )
            else:
                self.logger.info(
                    "This simulated bot will generate the first turn as no conversation starter is provided"
                )

    async def generate_response(
        self,
        session: RetryClient,
        conversation_history: List[ConversationTurn],
        max_history: int,
        turn_number: int = 0,
    ) -> Tuple[dict, dict, int, dict]:
        """
        Prompt the ConversationBot for a response.

        :param session: The aiohttp session to use for the request.
        :type session: RetryClient
        :param conversation_history: The turns in the conversation so far.
        :type conversation_history: List[ConversationTurn]
        :param max_history: The number of most recent turns to include in the prompt.
        :type max_history: int
        :param turn_number: The index of the turn being generated; 0 is the first turn.
        :type turn_number: int
        :return: The parsed response, the request, the time taken, and the full response.
        :rtype: Tuple[dict, dict, int, dict]
        """

        # If this is the first turn and a conversation starter is set,
        # return the conversation starter rather than generating the turn with the LLM.
        if turn_number == 0 and self.conversation_starter is not None:
            # if conversation_starter is a dictionary, pass it into samples as is
            if isinstance(self.conversation_starter, dict):
                samples = [self.conversation_starter]
            else:
                samples = [self.conversation_starter.render(**self.persona_template_args)]  # type: ignore[attr-defined]
            time_taken = 0

            finish_reason = ["stop"]

            parsed_response = {"samples": samples, "finish_reason": finish_reason, "id": None}
            full_response = parsed_response
            return parsed_response, {}, time_taken, full_response

        try:
            prompt = self.conversation_template.render(
                conversation_turns=conversation_history[-max_history:],
                role=self.role.value,
                **self.persona_template_args,
            )
        except Exception:  # pylint: disable=broad-except
            # Log and re-raise instead of dropping into an interactive console,
            # which would block the event loop and leave `prompt` undefined.
            self.logger.exception("Failed to render the conversation template.")
            raise

        messages = [{"role": "system", "content": prompt}]

        # The chat API must respond as ASSISTANT, so if this bot is USER, we need to reverse the messages
        if (self.role == ConversationRole.USER) and (isinstance(self.model, (OpenAIChatCompletionsModel))):
            # Here we simulate the user. The chat API only generates turns as the assistant
            # and cannot generate turns as the user, so we reverse all roles in the history:
            # messages produced by the other bot are passed here as user messages.
            messages.extend([turn.to_openai_chat_format(reverse=True) for turn in conversation_history[-max_history:]])
            prompt_role = ConversationRole.USER.value
        else:
            messages.extend([turn.to_openai_chat_format() for turn in conversation_history[-max_history:]])
            prompt_role = self.role.value

        response = await self.model.get_conversation_completion(
            messages=messages,
            session=session,
            role=prompt_role,
        )

        return response["response"], response["request"], response["time_taken"], response["full_response"]

    def __repr__(self):
        return f"Bot(name={self.name}, role={self.role.name}, model={self.model.__class__.__name__})"
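
The role reversal performed in `generate_response` can be illustrated with a small standalone sketch. `Role`, `Turn`, and `build_messages` are hypothetical stand-ins written for this example, not names from the PR, and the enum values `"user"` and `"assistant"` are an assumption because `constants.py` is not shown in this diff.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Role(Enum):
    # Stand-in for ConversationRole; the string values are assumed.
    USER = "user"
    ASSISTANT = "assistant"


@dataclass
class Turn:
    # Simplified stand-in for ConversationTurn (role and message only).
    role: Role
    message: str

    def to_openai_chat_format(self, reverse: bool = False) -> Dict[str, str]:
        role = self.role
        if reverse:
            # Swap USER and ASSISTANT so the simulated user can answer "as assistant".
            role = Role.USER if role is Role.ASSISTANT else Role.ASSISTANT
        return {"role": role.value, "content": self.message}


def build_messages(
    system_prompt: str, history: List[Turn], max_history: int, simulate_user: bool
) -> List[Dict[str, str]]:
    # The system prompt comes first, followed by the most recent turns.
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(t.to_openai_chat_format(reverse=simulate_user) for t in history[-max_history:])
    return messages


history = [
    Turn(Role.USER, "Hi, I need help with my order."),
    Turn(Role.ASSISTANT, "Sure, what is the order number?"),
    Turn(Role.USER, "It is 12345."),
]
messages = build_messages("You are playing a customer.", history, max_history=2, simulate_user=True)
for m in messages:
    print(m["role"], "|", m["content"])
```

With `simulate_user=True`, the assistant's question is sent with role `user` and the simulated user's earlier reply with role `assistant`, so the model's next "assistant" completion is effectively the next user turn. One detail worth noting: since `-0 == 0` in Python, `history[-max_history:]` with `max_history=0` yields the whole history rather than an empty list.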


class CallbackConversationBot(ConversationBot):
    def __init__(self, callback, user_template, user_template_parameters, *args, **kwargs):
        self.callback = callback
        self.user_template = user_template
        self.user_template_parameters = user_template_parameters

        super().__init__(*args, **kwargs)

    async def generate_response(
        self,
        session: "RetryClient",
        conversation_history: List[Any],
        max_history: int,
        turn_number: int = 0,
    ) -> Tuple[dict, dict, int, dict]:
        chat_protocol_message = self._to_chat_protocol(
            self.user_template, conversation_history, self.user_template_parameters
        )
        # Pass a deep copy so the callback cannot mutate the conversation state.
        msg_copy = copy.deepcopy(chat_protocol_message)
        start_time = time.time()
        result = await self.callback(msg_copy)
        end_time = time.time()
        if not result:
            # Fall back to a placeholder response when the callback returns nothing.
            result = {
                "messages": [{"content": "Callback did not return a response.", "role": "assistant"}],
                "finish_reason": ["stop"],
                "id": None,
                "template_parameters": {},
            }
        self.logger.info("Using user provided callback returning response.")

        time_taken = end_time - start_time
        try:
            # The last message returned by the callback is taken as the bot's reply.
            response = {
                "samples": [result["messages"][-1]["content"]],
                "finish_reason": ["stop"],
                "id": None,
            }
        except Exception as exc:
            raise TypeError("User provided callback does not conform to the chat protocol standard.") from exc

        self.logger.info("Parsed callback response")

        return response, {}, time_taken, result

    def _to_chat_protocol(self, template, conversation_history, template_parameters):
        messages = []

        for m in conversation_history:
            messages.append({"content": m.message, "role": m.role.value})

        return {
            "template_parameters": template_parameters,
            "messages": messages,
            "$schema": "http://azureml/sdk-2-0/ChatConversation.json",
        }


__all__ = [
    "ConversationRole",
    "ConversationBot",
    "CallbackConversationBot",
    "ConversationTurn",
]
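
As a rough illustration of the contract that `CallbackConversationBot` expects, the sketch below shows a callback that receives the chat-protocol dictionary built by `_to_chat_protocol` and returns a dictionary whose last entry in `messages` is the reply. `echo_callback` and `extract_sample` are hypothetical names used only for this example; `extract_sample` mirrors the parsing step in `generate_response` and is not part of the PR.

```python
import asyncio
import copy
from typing import Any, Dict


async def echo_callback(chat_protocol_message: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical callback: answers the most recent message in the conversation."""
    messages = chat_protocol_message["messages"]
    last_content = messages[-1]["content"] if messages else ""
    reply = {"content": f"You said: {last_content}", "role": "assistant"}
    # Return the conversation with the reply appended as the last message.
    return {
        "messages": messages + [reply],
        "template_parameters": chat_protocol_message.get("template_parameters", {}),
    }


def extract_sample(result: Dict[str, Any]) -> str:
    """Take the content of the last message, as the bot does when parsing the callback result."""
    try:
        return result["messages"][-1]["content"]
    except Exception as exc:
        raise TypeError("Callback result does not conform to the chat protocol.") from exc


# Same shape as the dictionary returned by _to_chat_protocol.
request = {
    "template_parameters": {"name": "Jane"},
    "messages": [{"content": "Hello", "role": "user"}],
    "$schema": "http://azureml/sdk-2-0/ChatConversation.json",
}

# The bot hands the callback a deep copy, so the original request stays untouched.
result = asyncio.run(echo_callback(copy.deepcopy(request)))
sample = extract_sample(result)
print(sample)
```

A callback that returns a result without a non-empty `messages` list of `{"content": ..., "role": ...}` dictionaries would hit the `TypeError` branch, which is why the sketch always appends its reply to the incoming messages.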