Can I ask about the discrepancy in the public set results for GLM?

On https://labs.scale.com/leaderboard/swe_bench_pro_public

it states that glm-4.6 scores about 9.67%.

<img width="536" height="104" alt="Image" src="https://github.com/user-attachments/assets/58e14219-d9ca-4db3-b347-9e26d1e6088f" />

But when I check the trajectories via https://docent.transluce.org/dashboard/032fb63d-4992-4bfc-911d-3b7dafcb931f

1. There is no glm-4.6, only `glm-4.5-10222025`.
2. `glm-4.5-10222025` shows 259 resolved instances out of 731 public instances.

<img width="857" height="401" alt="Image" src="https://github.com/user-attachments/assets/a8d3a7f4-bd06-454f-b0f0-327a450744eb" />

259/731 is about 35.4%, which is definitely a lot higher than the 9.67% on the leaderboard.
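For reference, here is the arithmetic behind that comparison as a quick sketch. The `resolve_rate` helper and variable names are mine, not from either site, and the implied count assumes the leaderboard score uses the same 731-instance denominator as the dashboard, which I have not confirmed.

```python
def resolve_rate(resolved: int, total: int) -> float:
    """Return the resolved fraction as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * resolved / total


# Counts shown on the Docent dashboard for glm-4.5-10222025.
dashboard_rate = resolve_rate(259, 731)

# Score shown on the leaderboard for glm-4.6.
leaderboard_rate = 9.67

# ASSUMPTION: the leaderboard also scores against 731 public instances.
implied_resolved = round(leaderboard_rate / 100 * 731)

print(f"dashboard:   {dashboard_rate:.2f}%")
print(f"leaderboard: {leaderboard_rate:.2f}% (~{implied_resolved} of 731)")
```

If that assumption holds, the leaderboard number corresponds to far fewer resolved instances than the 259 the dashboard reports.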

What's the issue?

can i ask abt the discrepancy of the public set results for GLM? #89
