Pull requests: openai/evals
Add support for new models (gpt-4o, o1-preview and o1-mini)
#1558
opened Sep 15, 2024 by
sakher
Bugfixing completion stats break with new reasoning tokens release
#1555
opened Sep 13, 2024 by
lucapericlp
Fix a bug in examples/mmlu.ipynb when using gpt-4o or gpt-4o-mini
#1551
opened Aug 25, 2024 by
RobinWitch
13 tasks done
Fix the is_chat_model function to work with gpt-4o
#1550
opened Aug 22, 2024 by
LoryPack
3 tasks done
Added Icelandic QA evaluation data from news texts
#1548
opened Aug 20, 2024 by
thorunna
12 of 13 tasks
Added Icelandic QA evaluation data from Wikipedia
#1547
opened Aug 20, 2024 by
thorunna
12 of 13 tasks
Updating make-me-say to be compatible with Solvers
#1546
opened Aug 18, 2024 by
lennart-finke
1 task done
Fix Information exposure alert through an exception #1543
#1545
opened Aug 8, 2024 by
arpitjain099
13 tasks done
Fix Unit Test Failures in OpenAI, Anthropic, and Google Gemini Resolvers
#1537
opened Jun 24, 2024 by
sakher
Update README: Add Langtrace as an Eval vendor
#1531
opened May 21, 2024 by
karthikscale3
5 of 13 tasks
Added Quran Eval & Simple Fact Model-Graded Definition
#1511
opened Apr 1, 2024 by
sakher
13 tasks done
Add Classification Rule Articulation Eval
#1510
opened Mar 30, 2024 by
danesherbs
13 tasks done
Fix specifying API arguments from the CLI
#1505
opened Mar 27, 2024 by
LoryPack
6 tasks done
[Evals] Add eval for Dhivehi diacritical marks
#1495
opened Mar 16, 2024 by
aanaseer
11 of 12 tasks
Adding Indian Women Menstrual Health Chatbot Eval
#1430
opened Dec 11, 2023 by
cranberrydeveloper
13 tasks done