Commit 7f6651e

Merge pull request #30 from scicode-bench/zilinghan/doc-update
add o3 with different reasoning efforts
2 parents 18f78d8 + 4debb5c commit 7f6651e

2 files changed, +6 -4 lines changed

.gitignore

+1 -1

@@ -9,7 +9,7 @@ logs/**
 **/logs/**
 **/tmp/**
 integration/**
-
+test.sh
 # -------
 
 # Created by https://www.toptal.com/developers/gitignore/api/python

README.md

+5 -3

@@ -33,9 +33,11 @@ SciCode sources challenging and realistic research-level coding problems across
 
 | Models | Main Problem Resolve Rate | <span style="color:grey">Subproblem</span> |
 |--------------------------|-------------------------------------|-------------------------------------|
-| 🥇 OpenAI o3-mini | <div align="center">**9.2**</div> | <div align="center" style="color:grey">33.0</div> |
-| 🥈 OpenAI o1-preview | <div align="center">**7.7**</div> | <div align="center" style="color:grey">28.5</div> |
-| 🥉 Deepseek-R1 | <div align="center">**4.6**</div> | <div align="center" style="color:grey">28.5</div> |
+| 🥇 OpenAI o3-mini-low | <div align="center">**10.8**</div> | <div align="center" style="color:grey">33.3</div> |
+| 🥈 OpenAI o3-mini-high | <div align="center">**9.2**</div> | <div align="center" style="color:grey">34.4</div> |
+| 🥉 OpenAI o3-mini-medium | <div align="center">**9.2**</div> | <div align="center" style="color:grey">33.0</div> |
+| OpenAI o1-preview | <div align="center">**7.7**</div> | <div align="center" style="color:grey">28.5</div> |
+| Deepseek-R1 | <div align="center">**4.6**</div> | <div align="center" style="color:grey">28.5</div> |
 | Claude3.5-Sonnet | <div align="center">**4.6**</div> | <div align="center" style="color:grey">26.0</div> |
 | Claude3.5-Sonnet (new) | <div align="center">**4.6**</div> | <div align="center" style="color:grey">25.3</div> |
 | Deepseek-v3 | <div align="center">**3.1**</div> | <div align="center" style="color:grey">23.7</div> |
