
Feat/medexqa judges #67

Draft
mnishant2 wants to merge 9 commits into MedARC-AI:main from mnishant2:feat/medexqa-judges

Conversation

@mnishant2
Contributor

This draft PR adds the new LLM judges for evaluating explanations. I have updated the README and added comments where needed. TL;DR: run `vf-eval medexqa` with `use_judges=False` and `-s` to save/cache outputs for the different specialties, then run `tools/judge_rescore` with `factscore` or `g-eval` to get LLM judge scores in both settings.
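For reference, the two-step workflow above might look like the following shell sketch. The exact syntax for passing `use_judges=False` and for selecting the judge scheme is an assumption on my part; check the README in this PR for the actual invocation:

```shell
# Step 1: run the MedExQA eval with judges disabled; -s saves/caches
# the model outputs per specialty. (The env-arg JSON below is an
# assumed way of passing use_judges=False.)
vf-eval medexqa -a '{"use_judges": false}' -s

# Step 2: rescore the cached outputs with an LLM judge, using either
# the factscore or g-eval scheme (the --judge flag name is assumed).
python tools/judge_rescore.py --judge factscore
python tools/judge_rescore.py --judge g-eval
```

Splitting generation (step 1) from judging (step 2) means the cached outputs can be rescored with both judge schemes without re-running the model.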

