Feat/gpt 4o search #3731

Open · wants to merge 4 commits into base: main
2 changes: 1 addition & 1 deletion .gitignore
@@ -218,4 +218,4 @@ eval/**/*.ipynb
eval/**/*.json
# be cautious that txt files are now ignored
**.txt
!sweep_chat/.env
sweep_chat/.env
119 changes: 63 additions & 56 deletions sweepai/chat/api.py
@@ -101,7 +101,6 @@ def search_codebase_endpoint(
Notice that the `query` parameter is a single, extremely detailed, specific natural language search question.

Here are other examples of good questions to ask:

How are GraphQL mutations constructed for updating a user's profile information, and what specific fields are being updated?
How do the current React components render the product carousel on the homepage, and what library is being used for the carousel functionality?
How do we currently implement the endpoint handler for processing incoming webhook events from Stripe in the backend API, and how are the events being validated and parsed?
@@ -117,32 +116,34 @@ def search_codebase_endpoint(

### Format

Use GitHub-styled markdown for your responses. You must respond with the following three distinct sections:

# 1. User Response
Use GitHub-styled markdown for your responses. You must respond with the following four distinct sections:

<user_response>
# 1. Summary and analysis
<analysis>
## Summary
First, list and summarize each NEW file from the last function output that is relevant to the user's question. You may not need to summarize all provided files.

## New information
Secondly, list all new information that was retrieved from the codebase that is relevant to the user's question, especially if it invalidates any previous beliefs or assumptions.
</analysis>

## Updated answer
# 2. Updated answer as a user response

<user_response>
Determine if you have sufficient information to answer the user's question. If not, determine the information you need to answer the question completely by making `search_codebase` tool calls.

If so, rewrite your previous response with the new information and any invalidated beliefs or assumptions. Make sure this answer is complete and helpful. Provide code examples, explanations, and excerpts wherever possible to keep your explanations concrete. When explaining how to add new code, always write out the new code. When suggesting code changes, write out all the code changes required in the unified diff format.
</user_response>

# 2. Self-Critique
# 3. Self-Critique

<self_critique>
Then, self-critique your answer and validate that you have completely answered the user's question. If the user's question is relatively broad, you are done.

Otherwise, if the user's question is specific, and asks to implement a feature or fix a bug, determine what additional information you need to answer the user's question. Specifically, validate that all interfaces are being used correctly based on the contents of the retrieved files -- if you cannot verify this, then you must find the relevant information such as the correct interface or schema to validate the usage. If you need to search the codebase for more information, such as for how a particular feature in the codebase works, use the `search_codebase` tool in the next section.
</self_critique>

# 3. Function Calls (Optional)
# 4. Function Calls (as needed)

Then, make each function call like so:
<function_calls>
@@ -153,29 +154,34 @@ def search_codebase_endpoint(

### Format

Use GitHub-styled markdown for your responses. You must respond with the following three distinct sections:
Use GitHub-styled markdown for your responses. You must respond with the following four distinct sections:

# 1. User Response
# 1. Summary and analysis
<analysis>
First, list and summarize each file from the codebase provided that is relevant to the user's question. You may not need to summarize all provided files, but only the relevant ones.
</analysis>

<user_response>
## Summary
First, list and summarize each file from the codebase provided that is relevant to the user's question. You may not need to summarize all provided files.
# 2. User Response

## Answer
<user_response>
Determine if you have sufficient information to answer the user's question. If not, determine the information you need to answer the question completely by making `search_codebase` tool calls.

If so, write a complete helpful response to the user's question in full detail. Make sure this answer is complete and helpful. Provide code examples, explanations and excerpts wherever possible to provide concrete explanations. When explaining how to add new code, always write out the new code. When suggesting code changes, write out all the code changes required in the unified diff format.
If so, write a detailed, helpful response to the user's question. Provide small code examples, explanations, and excerpts as needed to keep your explanations concrete. Break large changes into multiple steps. When suggesting code changes, write out each code change required in the unified diff format, providing a few surrounding lines for context.
</user_response>

# 2. Self-Critique
# 3. Self-Critique

<self_critique>
Then, self-critique your answer and validate that you have completely answered the user's question. If the user's answer is relatively broad, you are done.
Then, self-critique your answer and validate that you have completely answered the user's question.

Otherwise, if the user's question is specific, and asks to implement a feature or fix a bug, determine what additional information you need to answer the user's question. Specifically, validate that all interfaces are being used correctly based on the contents of the retrieved files -- if you cannot verify this, then you must find the relevant information such as the correct interface or schema to validate the usage. If you need to search the codebase for more information, such as for how a particular feature in the codebase works, use the `search_codebase` tool in the next section.
Then, determine what additional information you need to answer the user's question. Specifically, validate that:
1. All interfaces are being used correctly based on the contents of the retrieved files.
2. All usages of any changed function are updated accordingly.

If you cannot verify any of these, then you must find the relevant information such as the correct interface or schema to validate the usage. If you need to search the codebase for more information, such as for how a particular feature in the codebase works, use the `search_codebase` tool in the next section.
</self_critique>

# 3. Function Calls (Optional)
# 4. Function Calls (as needed)

Then, make each function call like so:
<function_calls>
@@ -216,7 +222,7 @@ def search_codebase_endpoint(

""" + example_tool_calls

system_message = """You are a helpful assistant that will answer a user's questions about a codebase to resolve their issue. You are provided with a list of relevant code snippets from the codebase that you can refer to. You can use this information to help the user solve their issue. You may also make function calls to retrieve additional information from the codebase.
system_message = """You are a helpful assistant that will answer a user's questions about a codebase to resolve their issue. You are provided with a list of relevant code snippets from the codebase that you can refer to. You can use this information to help the user solve their issue. You may also make function calls to retrieve additional information from the codebase. Your response must be concise and clear. Maximize readability. Provide code examples, explanations, and excerpts wherever possible to make your explanations concrete. Err on the side of shorter code examples and explanations.

In this environment, you have access to the following tools to assist in fulfilling the user request:

@@ -267,7 +273,7 @@ def chat_codebase(
raise ValueError("At least one message is required.")

# Stream
chat_gpt = ChatGPT.from_system_message_string(
chat_gpt: ChatGPT = ChatGPT.from_system_message_string(
prompt_string=system_message
)
snippets_message = relevant_snippets_message.format(
@@ -311,64 +317,65 @@ def stream_state(initial_user_message: str, snippets: list[Snippet], messages: l
for _ in range(5):
stream = chat_gpt.chat_anthropic(
content=user_message,
model="claude-3-opus-20240229",
model="gpt-4o",
stop_sequences=["</function_call>"],
use_openai=True,
stream=True
)

result_string = ""
user_response = ""
self_critique = ""
current_messages = []
for token in stream:
result_string += token
analysis = extract_xml_tag(result_string, "analysis", include_closing_tag=False) or ""
user_response = extract_xml_tag(result_string, "user_response", include_closing_tag=False) or ""
self_critique = extract_xml_tag(result_string, "self_critique", include_closing_tag=False)

current_messages = []

if self_critique:
yield [
*new_messages,
if analysis:
current_messages.append(
Message(
content=analysis,
role="function",
function_call={
"function_name": "analysis",
"function_parameters": {},
"is_complete": bool(user_response),
}
)
)

if user_response:
current_messages.append(
Message(
content=user_response,
role="assistant"
),
role="assistant",
)
)

if self_critique:
current_messages.append(
Message(
content=self_critique,
role="function",
function_call={
"function_name": "self_critique",
"function_parameters": {},
"is_complete": False,
}
),
]
else:
yield [
*new_messages,
Message(
content=user_response,
role="assistant"
)
]

new_messages.append(
Message(
content=user_response,
role="assistant",
)
)

if self_critique:
new_messages.append(
Message(
content=self_critique,
role="function",
function_call={
"function_name": "self_critique",
"function_parameters": {},
"is_complete": True,
}
)
)

yield [
*new_messages,
*current_messages
]

current_messages[-1].function_call["is_complete"] = True

new_messages.extend(current_messages)

yield new_messages
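
The streaming loop above re-parses the accumulated text for `<analysis>`, `<user_response>`, and `<self_critique>` tags on every token, so partially streamed sections can be surfaced immediately. A minimal sketch of that incremental-extraction pattern — the helper below is a hypothetical stand-in for sweepai's `extract_xml_tag`, whose real implementation may differ:

```python
def extract_xml_tag(text: str, tag: str, include_closing_tag: bool = True):
    """Return the contents of <tag>...</tag>; if the closing tag has not
    streamed in yet, return the partial body. None if the tag is absent."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    start = text.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)
    end = text.find(close_tag, start)
    if end == -1:
        return text[start:]  # tag still open: partial content
    if include_closing_tag:
        return text[start:end] + close_tag
    return text[start:end]

# Simulate a token stream and re-extract the sections on every token,
# mirroring the accumulation loop in stream_state.
result_string = ""
for token in ["<analysis>files: a", "pi.py</analysis><user_response>Use gp", "t-4o.</user_response>"]:
    result_string += token
    analysis = extract_xml_tag(result_string, "analysis", include_closing_tag=False) or ""
    user_response = extract_xml_tag(result_string, "user_response", include_closing_tag=False) or ""

print(analysis)        # files: api.py
print(user_response)   # Use gpt-4o.
```

Re-scanning the full buffer each token is O(n) per token, which is acceptable at chat-response lengths and keeps the parser stateless across tokens.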

66 changes: 63 additions & 3 deletions sweepai/core/chat.py
@@ -55,6 +55,7 @@
| Literal["gpt-4-0125-preview"]
| Literal["gpt-4-turbo-2024-04-09"]
| Literal["gpt-4-turbo"]
| Literal["gpt-4o"]
)

AnthropicModel = (
@@ -71,6 +72,7 @@
"gpt-4-1106-preview": 128000,
"gpt-4-0125-preview": 128000,
"gpt-4-turbo-2024-04-09": 128000,
"gpt-4o": 128000,
"claude-v1": 9000,
"claude-v1.3-100k": 100000,
"claude-instant-v1.3-100k": 100000,
@@ -419,7 +421,7 @@ def chat_anthropic(
if use_openai:
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
assert OPENAI_API_KEY
self.model = 'gpt-4-turbo'
self.model = 'gpt-4o'
else:
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")
assert ANTHROPIC_API_KEY
@@ -439,7 +441,7 @@ def chat_anthropic(
use_aws = True
hit_content_filtering = False
if stream:
def llm_stream():
def llm_stream_anthropic():
client = Anthropic(api_key=ANTHROPIC_API_KEY)
start_time = time.time()
message_dicts = [
@@ -498,7 +500,65 @@ def llm_stream():
logger.exception(e_)
raise e_
return
return llm_stream()
def llm_stream_openai():
Contributor review comment: The new `llm_stream_openai` function introduces complexity with threading and queue management, which needs thorough testing to ensure reliability.

client = OpenAI(api_key=OPENAI_API_KEY)
def get_next_token_openai(stream_: Iterator[str], token_queue: queue.Queue):
try:
for i, chunk in enumerate(stream_):
text = chunk.choices[0].delta.content
text = text if text else ""
token_queue.put((i, text))
except Exception as e_:
token_queue.put(e_)

start_time = time.time()
with client.chat.completions.create(
model=model,
messages=self.messages_dicts,
max_tokens=max_tokens,
temperature=temperature,
stop=stop_sequences,
stream=True,
) as stream_:
try:
if verbose:
print(f"Connected to {model}...")

token_queue = queue.Queue()
token_thread = threading.Thread(target=get_next_token_openai, args=(stream_, token_queue))
token_thread.daemon = True
token_thread.start()

token_timeout = 5 # Timeout threshold in seconds

while token_thread.is_alive():
try:
item = token_queue.get(timeout=token_timeout)

if item is None:
break

i, text = item

if verbose:
if i == 0:
print(f"Time to first token: {time.time() - start_time:.2f}s")
print(text, end="", flush=True)

yield text

except queue.Empty:
if not token_thread.is_alive():
break
raise TimeoutError(f"Time between tokens exceeded {token_timeout} seconds.")

except TimeoutError as te:
logger.exception(te)
raise te
except Exception as e_:
logger.exception(e_)
raise e_
return llm_stream_anthropic() if not use_openai else llm_stream_openai()
for i in range(NUM_ANTHROPIC_RETRIES):
try:
@file_cache(redis=True, ignore_contents=True) # must be in the inner scope because this entire function manages state
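
The reviewer's concern about the threading-and-queue watchdog in `llm_stream_openai` can be exercised in isolation before wiring it to a live API. Below is a self-contained sketch of the same pattern — a background reader thread feeding a queue, a consumer that times out if the gap between tokens grows too large, and a sentinel to signal end-of-stream. Names, the sentinel convention, and timeout values here are illustrative, not sweepai's actual implementation:

```python
import queue
import threading
import time


def watchdog_stream(tokens, token_timeout=5.0, delay=0.0):
    """Yield tokens fed by a background reader thread, raising TimeoutError
    if the gap between consecutive tokens exceeds token_timeout seconds."""
    token_queue: queue.Queue = queue.Queue()

    def reader():
        for i, text in enumerate(tokens):
            time.sleep(delay)  # stand-in for network latency per chunk
            token_queue.put((i, text))
        token_queue.put(None)  # sentinel: stream finished cleanly

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    while True:
        try:
            item = token_queue.get(timeout=token_timeout)
        except queue.Empty:
            if not thread.is_alive():
                break  # reader died without a sentinel; stop quietly
            raise TimeoutError(f"Time between tokens exceeded {token_timeout} seconds.")
        if item is None:
            break  # clean end of stream
        _, text = item
        yield text


out = "".join(watchdog_stream(["Hel", "lo, ", "world"]))
print(out)  # Hello, world
```

Putting a sentinel on the queue is one way to avoid the race in the PR's `while token_thread.is_alive()` loop, where the thread can finish between the liveness check and the blocking `get`; the consumer then never depends on thread state for the happy path.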