10 changes: 9 additions & 1 deletion README.md
@@ -8,7 +8,15 @@ A simple inference framework for the CURE-Bench bio-medical AI competition. This
2025.08.08: **[Question&Answer page](QA.md)**: We have created a Q&A page to share all our responses to questions from participants, ensuring fair competition.

2025.09.10: Added starter-kit code and tutorials for running **GPT-OSS-20B**, OpenAI’s 20B open-weight reasoning model.

Now includes:
* **Config-driven options** — control reasoning depth, system identity, developer instructions, and external/internal tools (for Track 2), directly from your config.
* **ToolUniverse integration (Track 2)** — easily plug in ToolUniverse tools with GPT-OSS-20B. A schema conversion step is still required, but we’ve provided an example converter in `test_gptoss20b.py` to make this straightforward.
* **Structured reasoning trace logging** — Harmony-formatted reasoning steps and final answers are automatically saved in your submissions.
* **Subset evaluation (`--subset-size`)** — run/debug on a smaller slice of *any* dataset before full submissions (works with all models, not just GPT-OSS-20B). Example:
```bash
python run.py --config metadata_config_test.json --subset-size 20
```
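
How `run.py` applies `--subset-size` internally is not shown in this diff. As a minimal sketch, assuming the flag simply keeps the first N examples of the loaded dataset, the behaviour could look like the following (`take_subset` and `parse_args` are hypothetical helper names, not functions from the starter kit):

```python
import argparse


def take_subset(examples, subset_size=None):
    """Return the first `subset_size` examples, or all of them if it is None.

    Assumption: the subset is a simple head slice; the real run.py may
    sample differently.
    """
    examples = list(examples)
    if subset_size is None:
        return examples
    if subset_size < 0:
        raise ValueError("subset_size must be non-negative")
    return examples[:subset_size]


def parse_args(argv=None):
    """Parse the two flags used in the example command above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    parser.add_argument("--subset-size", type=int, default=None)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--config", "metadata_config_test.json", "--subset-size", "20"])
    dataset = [{"id": i} for i in range(100)]  # placeholder data for illustration
    print(len(take_subset(dataset, args.subset_size)))
```

A head slice keeps debug runs deterministic, which makes it easier to compare outputs between runs before committing to a full submission.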

## Quick Start

### Installation Dependencies
2 changes: 1 addition & 1 deletion tutorials/gpt_oss_20b/tutorial_gptoss20b.md
@@ -304,7 +304,6 @@ python run.py --model-name myuser/gpt-oss-20b-curebench-ft
---
## 10. Important Notes

- **Tool interleaving:** Tools are not separate “modes.” GPT-OSS can **mix chain-of-thought, tool calls, and final answers in a single Harmony trace**. This means you may see `analysis → tool call → analysis → final` all in one response.
- **Instruction hierarchy:** Harmony enforces a strict priority order:
**System > Developer > User > Assistant > Tool**.
- *System* messages always take precedence (e.g., competition rules).
@@ -313,6 +312,7 @@ python run.py --model-name myuser/gpt-oss-20b-curebench-ft
- *Assistant* outputs may include both reasoning (`analysis`) and answers (`final`).
- *Tool* calls are lowest priority and always embedded within the reasoning trace.
- GPT-OSS models *require* the **Harmony format**. If you call `model.generate` directly without Harmony encoding, the model will not behave correctly. The wrapper provided here automatically handles Harmony encoding/decoding for you.
- **Tool interleaving:** Tools are not separate “modes.” GPT-OSS can **mix chain-of-thought, tool calls, and final answers in a single Harmony trace**. This means you may see `analysis → tool call → analysis → final` all in one response.
- **Safety note:** Reasoning traces are not filtered and may contain hallucinated content.
- **Tool schemas:** ToolUniverse defines its ~215 biomedical APIs in its own JSON format.
Before passing them to GPT-OSS-20B, you must **convert them into OpenAI-style `"type": "function"` schemas**.
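
The conversion described above can be sketched roughly as follows. This is not the converter shipped in `test_gptoss20b.py`; it is an illustration that assumes each ToolUniverse entry has `name`, `description`, and a `parameter` object whose properties may carry a per-property boolean `required` flag. The function name `to_openai_function` and the sample tool are hypothetical.

```python
import copy


def to_openai_function(tool):
    """Wrap one ToolUniverse-style tool dict in an OpenAI-style function schema.

    Assumed input layout (not confirmed by this diff):
      {"name": ..., "description": ..., "parameter": {"type": "object",
       "properties": {arg: {"type": ..., "required": bool, ...}}}}
    """
    params = copy.deepcopy(tool.get("parameter") or {})
    properties = params.get("properties", {})

    # Start from a list-style "required" if one is already present.
    existing = params.get("required")
    required = list(existing) if isinstance(existing, list) else []

    # Move per-property boolean flags into the top-level "required" list,
    # which is where JSON Schema (and the OpenAI format) expects them.
    for arg_name, spec in properties.items():
        if isinstance(spec, dict) and isinstance(spec.get("required"), bool):
            if spec.pop("required") and arg_name not in required:
                required.append(arg_name)

    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


# Hypothetical tool entry, for illustration only.
example_tool = {
    "name": "lookup_drug_label",
    "description": "Fetch label information for a drug by name.",
    "parameter": {
        "type": "object",
        "properties": {
            "drug_name": {"type": "string", "description": "Drug name.", "required": True},
            "limit": {"type": "integer", "description": "Max results.", "required": False},
        },
    },
}

converted = to_openai_function(example_tool)
```

The input dict is deep-copied so the original ToolUniverse definitions stay untouched and can still be used to dispatch the actual tool call.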