From 1a58d066f7fe9c35ca18194b07a76d7d6a5485e7 Mon Sep 17 00:00:00 2001
From: Reza Shamji
Date: Fri, 12 Sep 2025 18:02:27 -0400
Subject: [PATCH] Tidy GPT-OSS-20B update notes: same changes, cleaner PR

Same changes as the previous commit (so the message is unchanged in
substance), but a cleaner PR: the update is now summarized concisely so
participants can quickly read it and start using the new features,
rather than being pointed straight at the full tutorial. Also reordered
the Important Notes in tutorial_gptoss20b.md so the more important
bullet points come first.
---
 README.md                                   | 10 +++++++++-
 tutorials/gpt_oss_20b/tutorial_gptoss20b.md |  2 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index fa4752a..f27e8de 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,15 @@ A simple inference framework for the CURE-Bench bio-medical AI competition. This
 2025.08.08: **[Question&Answer page](QA.md)**: We have created a Q&A page to share all our responses to questions from participants, ensuring fair competition.
 
 2025.09.10: Added starterkit code and tutorials for running **GPT-OSS-20B**, OpenAI’s 20B open-weight reasoning model.
-
+Now includes:
+* **Config-driven options** — control reasoning depth, system identity, developer instructions, and external/internal tools (for Track 2), directly from your config.
+* **ToolUniverse integration (Track 2)** — easily plug in ToolUniverse tools with GPT-OSS-20B. A schema conversion step is still required, but we’ve provided an example converter in `test_gptoss20b.py` to make this straightforward.
+* **Structured reasoning trace logging** — Harmony-formatted reasoning steps and final answers are automatically saved in your submissions.
+* **Subset evaluation (`--subset-size`)** — run/debug on a smaller slice of *any* dataset before full submissions (works with all models, not just GPT-OSS-20B). Example:
+  ```bash
+  python run.py --config metadata_config_test.json --subset-size 20
+  ```
+
 ## Quick Start
 
 ### Installation Dependencies
diff --git a/tutorials/gpt_oss_20b/tutorial_gptoss20b.md b/tutorials/gpt_oss_20b/tutorial_gptoss20b.md
index c70398f..9cb525a 100644
--- a/tutorials/gpt_oss_20b/tutorial_gptoss20b.md
+++ b/tutorials/gpt_oss_20b/tutorial_gptoss20b.md
@@ -304,7 +304,6 @@ python run.py --model-name myuser/gpt-oss-20b-curebench-ft
 
 ---
 
 ## 10. Important Notes
-- **Tool interleaving:** Tools are not separate “modes.” GPT-OSS can **mix chain-of-thought, tool calls, and final answers in a single Harmony trace**. This means you may see `analysis → tool call → analysis → final` all in one response.
 - **Instruction hierarchy:** Harmony enforces a strict priority order: **System > Developer > User > Assistant > Tool**.
   - *System* messages always take precedence (e.g., competition rules).
@@ -313,6 +312,7 @@ python run.py --model-name myuser/gpt-oss-20b-curebench-ft
   - *Assistant* outputs may include both reasoning (`analysis`) and answers (`final`).
   - *Tool* calls are lowest priority and always embedded within the reasoning trace.
 - GPT-OSS models *require* the **Harmony format**. If you call `model.generate` directly without Harmony encoding, the model will not behave correctly. The wrapper provided here automatically handles Harmony encoding/decoding for you.
+- **Tool interleaving:** Tools are not separate “modes.” GPT-OSS can **mix chain-of-thought, tool calls, and final answers in a single Harmony trace**. This means you may see `analysis → tool call → analysis → final` all in one response.
 - **Safety note:** Reasoning traces are not filtered; they may hallucinate.
 - **Tool schemas:** ToolUniverse defines its ~215 biomedical APIs in its own JSON format. Before passing them to GPT-OSS-20B, you must **convert them into OpenAI-style `"type": "function"` schemas**.
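
For reference, the conversion the **Tool schemas** note describes can be quite small. Below is a minimal sketch of such a converter, assuming a ToolUniverse entry carries `name`, `description`, and a JSON-Schema-style `parameter` block; these field names and the example entry are illustrative only, and the authoritative version is the converter in `test_gptoss20b.py` together with the actual ToolUniverse JSON.

```python
# Sketch: wrap a ToolUniverse-style tool definition in an OpenAI-style
# {"type": "function"} schema. Field names ("name", "description",
# "parameter") are assumptions for illustration; check test_gptoss20b.py
# for the real structure used by the starter kit.

def convert_tool_schema(tool: dict) -> dict:
    """Convert one ToolUniverse-style entry to an OpenAI-style function schema."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # OpenAI-style schemas expect a JSON Schema object here.
            "parameters": tool.get("parameter", {
                "type": "object",
                "properties": {},
                "required": [],
            }),
        },
    }


if __name__ == "__main__":
    # Hypothetical ToolUniverse-style entry, for illustration only.
    example = {
        "name": "count_adverse_reactions_by_drug",
        "description": "Count adverse-event reactions reported for a drug.",
        "parameter": {
            "type": "object",
            "properties": {
                "drug_name": {"type": "string", "description": "Drug to query."},
            },
            "required": ["drug_name"],
        },
    }
    print(convert_tool_schema(example))
```

The key design point is the nesting: OpenAI-style schemas place the JSON Schema under `function.parameters`, which is the shape the GPT-OSS-20B wrapper expects when you hand it a tool list.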