mims-harvard · rezashamji · Sep 12, 2025
diff --git a/README.md b/README.md
@@ -8,7 +8,15 @@ A simple inference framework for the CURE-Bench bio-medical AI competition. This
  2025.08.08: **[Question&Answer page](QA.md)**: We have created a Q&A page to share all our responses to questions from participants, ensuring fair competition.
 
  2025.09.10: Added starterkit code and tutorials for running **GPT-OSS-20B**, OpenAI’s 20B open-weight reasoning model.
-
+  Now includes:
+  * **Config-driven options** — control reasoning depth, system identity, developer instructions, and external/internal tools (for Track 2), directly from your config.
+  * **ToolUniverse integration (Track 2)** — easily plug in ToolUniverse tools with GPT-OSS-20B. A schema conversion step is still required, but we’ve provided an example converter in `test_gptoss20b.py` to make this straightforward.
+  * **Structured reasoning trace logging** — Harmony-formatted reasoning steps and final answers are automatically saved in your submissions.
+  * **Subset evaluation (`--subset-size`)** — run/debug on a smaller slice of *any* dataset before full submissions (works with all models, not just GPT-OSS-20B). Example:
+    ```bash
+    python run.py --config metadata_config_test.json --subset-size 20
+    ```
+
 ## Quick Start
 
 ### Installation Dependencies

diff --git a/tutorials/gpt_oss_20b/tutorial_gptoss20b.md b/tutorials/gpt_oss_20b/tutorial_gptoss20b.md
@@ -304,7 +304,6 @@ python run.py --model-name myuser/gpt-oss-20b-curebench-ft
 ---
 ## 10. Important Notes
 
-- **Tool interleaving:** Tools are not separate “modes.” GPT-OSS can **mix chain-of-thought, tool calls, and final answers in a single Harmony trace**. This means you may see `analysis → tool call → analysis → final` all in one response.
 - **Instruction hierarchy:** Harmony enforces a strict priority order:  
   **System > Developer > User > Assistant > Tool**.  
   - *System* messages always take precedence (e.g., competition rules).  
@@ -313,6 +312,7 @@ python run.py --model-name myuser/gpt-oss-20b-curebench-ft
   - *Assistant* outputs may include both reasoning (`analysis`) and answers (`final`).  
   - *Tool* calls are lowest priority and always embedded within the reasoning trace.
 - GPT-OSS models *require* the **Harmony format**. If you call `model.generate` directly without Harmony encoding, the model will not behave correctly. The wrapper provided here automatically handles Harmony encoding/decoding for you.
+- **Tool interleaving:** Tools are not separate “modes.” GPT-OSS can **mix chain-of-thought, tool calls, and final answers in a single Harmony trace**. This means you may see `analysis → tool call → analysis → final` all in one response.
 - **Safety note:** Reasoning traces are not filtered; they may hallucinate.  
 - **Tool schemas:** ToolUniverse defines its ~215 biomedical APIs in its own JSON format.  
   Before passing them to GPT-OSS-20B, you must **convert them into OpenAI-style `"type": "function"` schemas**.
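
Review note on the **Tool schemas** bullet: the conversion step it describes might look roughly like the sketch below. This is not the converter shipped in `test_gptoss20b.py`. The input layout (a `name`, a `description`, and a `parameter` object whose properties may carry a per-argument `required: true` flag) is an assumption made for illustration, and `to_openai_function` and the example tool are hypothetical names.

```python
from typing import Any


def to_openai_function(tool: dict[str, Any]) -> dict[str, Any]:
    """Wrap one tool definition in an OpenAI-style function schema.

    Assumed input layout (not confirmed against ToolUniverse):
    {"name": ..., "description": ..., "parameter": {"properties": {...}}}
    """
    params = tool.get("parameter") or {}
    properties: dict[str, Any] = {}
    required: list[str] = []

    for arg_name, spec in (params.get("properties") or {}).items():
        spec = dict(spec)  # copy so the caller's definition is not mutated
        # Assumption: "required" may appear as a boolean on each property.
        # JSON Schema expects a list on the parent object instead.
        if spec.pop("required", False) is True:
            required.append(arg_name)
        properties[arg_name] = spec

    # Also honor a standard top-level "required" list if one is present.
    for arg_name in params.get("required") or []:
        if arg_name not in required:
            required.append(arg_name)

    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


# Hypothetical tool definition, used only to exercise the converter.
example_tool = {
    "name": "get_drug_warnings",
    "description": "Look up label warnings for a drug.",
    "parameter": {
        "type": "object",
        "properties": {
            "drug_name": {
                "type": "string",
                "description": "Generic drug name.",
                "required": True,
            },
            "limit": {"type": "integer", "description": "Maximum results."},
        },
    },
}

converted = to_openai_function(example_tool)
```

Here `converted["function"]["parameters"]["required"]` is `["drug_name"]`, and the per-property `required` flag is removed from the copied property spec. If the real ToolUniverse definitions use different field names, only the lookups at the top of the function would need to change.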