fix parsing role in prompt issue (#781)

# Description

Issue:

![image](https://github.com/microsoft/promptflow/assets/49483542/a710552f-b624-4719-8670-1a3916c5cf7b)

Root cause: The regular expression used to match roles in the prompt was too permissive. It matched a role word even when that word appeared inside the content string. For example, `function` should not be identified as a role for the input `"system:\nthis is my function:\ndef hello"`.

Solution: Modified the regular expression so that a role must be preceded by a newline (optionally followed by whitespace), and added more tests for the parsing logic.

# All Promptflow Contribution checklist:
- [X] **The pull request does not introduce [breaking changes].**
- [X] **CHANGELOG is updated for new features, bug fixes or other significant changes.**
- [X] **I have read the [contribution guidelines](../CONTRIBUTING.md).**
- [ ] **Create an issue and link to the pull request to get dedicated review from promptflow team. Learn more: [suggested workflow](../CONTRIBUTING.md#suggested-workflow).**

## General Guidelines and Best Practices
- [X] Title of the pull request is clear and informative.
- [X] There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, [see this page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md).

### Testing Guidelines
- [X] Pull request includes test coverage for the included changes.
microsoft · Oct 16, 2023 · 92ed746
1 parent 48a4c19
commit 92ed746
Showing 2 changed files with 24 additions and 3 deletions.
diff --git a/src/promptflow-tools/promptflow/tools/common.py b/src/promptflow-tools/promptflow/tools/common.py
@@ -98,8 +98,10 @@ def parse_function_role_prompt(function_str):
 
 def parse_chat(chat_str):
     # openai chat api only supports below roles.
-    separator = r"(?i)\n*(system|user|assistant|function)\s*:\s*\n"
-    chunks = re.split(separator, chat_str)
+    separator = r"(?i)\n+\s*(system|user|assistant|function)\s*:\s*\n"
+    # Add a newline at the beginning to ensure consistent formatting of role lines.
+    # extra new line is removed when appending to the chat list.
+    chunks = re.split(separator, '\n'+chat_str)
     chat_list = []
     for chunk in chunks:
         last_message = chat_list[-1] if len(chat_list) > 0 else None

diff --git a/src/promptflow-tools/tests/test_common.py b/src/promptflow-tools/tests/test_common.py
@@ -1,7 +1,7 @@
 import pytest
 
 from promptflow.tools.common import parse_function_role_prompt, ChatAPIInvalidFunctions, validate_functions, \
-    process_function_call
+    process_function_call, parse_chat
 
 
 class TestCommon:
@@ -50,3 +50,22 @@ def test_parse_function_role_prompt(self):
         result = parse_function_role_prompt(function_str)
         assert result[0] == "get_location"
         assert result[1] == 'Boston\nabc'
+
+    @pytest.mark.parametrize(
+        "chat_str, expected_result",
+        [
+            ("system:\nthis is my function:\ndef hello", [
+                {'role': 'system', 'content': 'this is my function:\ndef hello'}]),
+            (" \n system:\nthis is my function:\ndef hello", [
+                {'role': 'system', 'content': 'this is my function:\ndef hello'}]),
+            ("user:\nhi\nassistant:\nanswer\nfunction:\nname:\nn\ncontent:\nc", [
+                {'role': 'user', 'content': 'hi'},
+                {'role': 'assistant', 'content': 'answer'},
+                {'role': 'function', 'name': 'n', 'content': 'c'}]),
+            ("\nsystem:\nfirst\n\nsystem:\nsecond", [
+                {'role': 'system', 'content': 'first'}, {'role': 'system', 'content': 'second'}])
+        ]
+    )
+    def test_success_parse_role_prompt(self, chat_str, expected_result):
+        actual_result = parse_chat(chat_str)
+        assert actual_result == expected_result
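
The effect of the separator change can be seen in isolation with a minimal sketch. The two patterns are copied from the diff above; the names `OLD_SEPARATOR`, `NEW_SEPARATOR`, `split_old`, and `split_new` are hypothetical helpers introduced only for illustration and are not part of `promptflow.tools.common`. The sketch shows only the `re.split` step, not the rest of `parse_chat`.

```python
import re

# Patterns copied from the diff; the constant names are illustrative only.
OLD_SEPARATOR = r"(?i)\n*(system|user|assistant|function)\s*:\s*\n"
NEW_SEPARATOR = r"(?i)\n+\s*(system|user|assistant|function)\s*:\s*\n"


def split_old(chat_str):
    # Old behavior: a role word may match anywhere, even mid-line.
    return re.split(OLD_SEPARATOR, chat_str)


def split_new(chat_str):
    # New behavior: a role must follow at least one newline, so a newline
    # is prepended to let a role on the very first line still match.
    return re.split(NEW_SEPARATOR, "\n" + chat_str)


chat_str = "system:\nthis is my function:\ndef hello"

# The old pattern wrongly treats "function:" inside the content as a role.
print(split_old(chat_str))
# The new pattern keeps the content of the system message in one chunk.
print(split_new(chat_str))
```

With the old pattern, the word `function` at the end of the first content line is split out as if it were a role marker. With the new pattern, only `system` is recognized, because `function` is not preceded by a newline.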