diff --git a/README.md b/README.md
index fa4752a..f27e8de 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,15 @@ A simple inference framework for the CURE-Bench bio-medical AI competition. This
 2025.08.08: **[Question&Answer page](QA.md)**: We have created a Q&A page to share all our responses to questions from participants, ensuring fair competition.
 2025.09.10: Added starterkit code and tutorials for running **GPT-OSS-20B**, OpenAI’s 20B open-weight reasoning model.
-
+ Now includes:
+ * **Config-driven options** — control reasoning depth, system identity, developer instructions, and external/internal tools (for Track 2), directly from your config.
+ * **ToolUniverse integration (Track 2)** — easily plug in ToolUniverse tools with GPT-OSS-20B. A schema conversion step is still required, but we’ve provided an example converter in `test_gptoss20b.py` to make this straightforward.
+ * **Structured reasoning trace logging** — Harmony-formatted reasoning steps and final answers are automatically saved in your submissions.
+ * **Subset evaluation (`--subset-size`)** — run/debug on a smaller slice of *any* dataset before full submissions (works with all models, not just GPT-OSS-20B). Example:
+ ```bash
+ python run.py --config metadata_config_test.json --subset-size 20
+ ```
+
 ## Quick Start
 ### Installation Dependencies
diff --git a/tutorials/gpt_oss_20b/tutorial_gptoss20b.md b/tutorials/gpt_oss_20b/tutorial_gptoss20b.md
index c70398f..9cb525a 100644
--- a/tutorials/gpt_oss_20b/tutorial_gptoss20b.md
+++ b/tutorials/gpt_oss_20b/tutorial_gptoss20b.md
@@ -304,7 +304,6 @@ python run.py --model-name myuser/gpt-oss-20b-curebench-ft
 ---
 ## 10. Important Notes
-- **Tool interleaving:** Tools are not separate “modes.” GPT-OSS can **mix chain-of-thought, tool calls, and final answers in a single Harmony trace**. This means you may see `analysis → tool call → analysis → final` all in one response.
 - **Instruction hierarchy:** Harmony enforces a strict priority order: **System > Developer > User > Assistant > Tool**.
   - *System* messages always take precedence (e.g., competition rules).
@@ -313,6 +312,7 @@ python run.py --model-name myuser/gpt-oss-20b-curebench-ft
   - *Assistant* outputs may include both reasoning (`analysis`) and answers (`final`).
   - *Tool* calls are lowest priority and always embedded within the reasoning trace.
 - GPT-OSS models *require* the **Harmony format**. If you call `model.generate` directly without Harmony encoding, the model will not behave correctly. The wrapper provided here automatically handles Harmony encoding/decoding for you.
+- **Tool interleaving:** Tools are not separate “modes.” GPT-OSS can **mix chain-of-thought, tool calls, and final answers in a single Harmony trace**. This means you may see `analysis → tool call → analysis → final` all in one response.
 - **Safety note:** Reasoning traces are not filtered; they may hallucinate.
 - **Tool schemas:** ToolUniverse defines its ~215 biomedical APIs in its own JSON format. Before passing them to GPT-OSS-20B, you must **convert them into OpenAI-style `"type": "function"` schemas**.
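
For the **Tool schemas** note above, the conversion step might look roughly like the sketch below. This is not the converter shipped in `test_gptoss20b.py`. The input layout (a `parameter` object whose properties each carry a boolean `required` flag), the function name `to_openai_function_schema`, and the example tool `get_drug_label` are assumptions made for illustration. The output uses the nested `{"type": "function", "function": {...}}` shape, which is also an assumption about which OpenAI-style variant the wrapper expects.

```python
from copy import deepcopy
from typing import Any


def to_openai_function_schema(tool: dict[str, Any]) -> dict[str, Any]:
    """Convert one tool definition into an OpenAI-style function schema.

    ASSUMPTION: the input looks like
        {"name": ..., "description": ..., "parameter": {"properties": {...}}}
    where each property may carry a boolean "required" flag. Check this
    against the actual ToolUniverse JSON before relying on it.
    """
    # Work on a copy so the caller's tool definition is never mutated.
    params = deepcopy(tool.get("parameter") or {})
    properties = params.get("properties") or {}

    # Keep any schema-level "required" list that is already present.
    required = list(params.get("required") or [])

    for prop_name, prop in properties.items():
        # Move per-property boolean flags into the schema-level list.
        if prop.pop("required", False) is True and prop_name not in required:
            required.append(prop_name)

    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


# Hypothetical tool definition, used only to exercise the converter.
example_tool = {
    "name": "get_drug_label",
    "description": "Look up label text for a drug name.",
    "parameter": {
        "type": "object",
        "properties": {
            "drug_name": {"type": "string", "description": "Drug name", "required": True},
            "limit": {"type": "integer", "description": "Max results", "required": False},
        },
    },
}

schema = to_openai_function_schema(example_tool)
print(schema["function"]["parameters"]["required"])  # ['drug_name']
```

Collecting the per-property flags into a single `required` list keeps the result close to plain JSON Schema, so the same output could be validated with a generic schema checker before it is passed to the model.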