chakravarthik27/fix medcalc bench dataset path #3921
base: main
Conversation
Hi @yifanmai, @MiguelAFH. MedCalc-Bench v1.0 is returning a 404 error here: https://huggingface.co/datasets/ncbi/MedCalc-Bench-v1.0. Could you please review this PR? Thanks.
The link you sent (https://huggingface.co/datasets/ncbi/MedCalc-Bench-v1.0) does not return a 404. Could you clarify what you meant by this? As for upgrading to MedCalc-Bench-v2.0, I am OK with this, but it should be a new, separate run spec function in order to maintain backward compatibility. Users running evals with the existing MedCalc-Bench should not see any changes.
Hi @yifanmai, a few weeks ago I encountered a 404 error and saw the dataset for version 2.0. Since then, there have also been updates for versions 1.1 and 1.2. As you suggested, I will work on creating new run specifications for medcalc_bench_v1.1 and medcalc_bench_v1.2. Thanks and regards.
Great, thanks for the update.
Hi, I was going to raise an issue suggesting a move to the new MedCalc dataset, but it seems someone else got here before me. I have made a few more changes to the MedCalc-Bench dataset since v1.2, and you can find the newest dataset here: https://github.com/nikhilk7153/MedCalc-Bench-Verified. All future updates will be made in this new repo. MedCalc-Bench Verified is an updated version of v1.2; you can find the changes from v1.2 to the verified version in the release notes: https://github.com/nikhilk7153/MedCalc-Bench-Verified/releases/tag/MedCalc-Bench-Verified
Hi @yifanmai, can you please review this PR? Thanks.
```diff
 class MATHScenario(Scenario):
-    """
+    r"""
```
Revert unrelated change.
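For context on the raw-string change above: in a normal Python string literal, backslash escapes are processed, which silently corrupts LaTeX macros that often appear in the MATH docstring. A minimal illustration:

```python
# "\t" inside a normal string literal is interpreted as a tab character,
# so a LaTeX macro such as \theta is silently corrupted.
plain = "\theta"   # tab + "heta" (5 characters)
raw = r"\theta"    # literal backslash + "theta" (6 characters)

print(len(plain), len(raw))  # 5 6
assert plain != raw
assert raw == "\\theta"
```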
```python
@run_spec_function("medcalc_bench_v1_0")
```
There seems to be a lot of duplicated code here. I would suggest having the version number as a parameter, and then doing something like the following. This reduces the duplicated code, and also preserves backwards compatibility.
```python
@run_spec_function("medcalc_bench")
def get_medcalc_bench_spec(version: Optional[str] = None) -> RunSpec:
    scenario_args = {} if version is None else {"version": version}
    scenario_spec = ScenarioSpec(
        class_name="helm.benchmark.scenarios.medcalc_bench_scenario.MedCalcBenchScenario",
        args=scenario_args,
    )
    # ...
    run_spec_name = "medcalc_bench" if version is None else f"medcalc_bench:version={version}"
    return RunSpec(
        name=run_spec_name,
        scenario_spec=scenario_spec,
        adapter_spec=adapter_spec,
        metric_specs=metric_specs,
        groups=["medcalc_bench"],
    )
```

```python
    """
    MedCalc-Bench v1.0 is an updated version of the MedCalc-Bench dataset designed to
    evaluate LLMs' capabilities in medical calculations. This version serves as a baseline
    for assessing the performance of language models in computing clinically relevant values
    from patient notes.
    """
```
If you like, you can merge the version-level changelog into the class docstring for MedCalcBenchScenario.
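To make the backward-compatibility point concrete, here is a self-contained sketch of the naming behavior the parameterized run spec relies on. The `RunSpec` dataclass and `make_run_spec_name` helper below are simplified stand-ins for illustration, not HELM's actual classes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RunSpec:
    # Minimal stand-in for helm.benchmark.run_spec.RunSpec
    name: str

def make_run_spec_name(base: str, version: Optional[str] = None) -> str:
    # No version: keep the legacy name, so existing evals see no change.
    # With a version: encode it as a run-spec argument.
    return base if version is None else f"{base}:version={version}"

legacy = RunSpec(name=make_run_spec_name("medcalc_bench"))
pinned = RunSpec(name=make_run_spec_name("medcalc_bench", "v2.0"))
print(legacy.name)  # medcalc_bench
print(pinned.name)  # medcalc_bench:version=v2.0
```

Callers that use the bare `medcalc_bench` run spec keep getting the original dataset, while new callers can pin a version explicitly.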
Will there be any future plans to update MedHELM, @yifanmai? Would it be possible to use MedCalc-Bench Verified instead of the v1.0 that is currently being used? We have fixed a number of annotation and ground-truth label issues (in approximately 1/3 of the dataset), so re-running would give a more accurate picture of the landscape.
I think this is more of a question for @MiguelAFH; the official evals and results are maintained by them, so it depends on whether there is funding and bandwidth available for this.
This pull request updates the MedCalc-Bench scenario to use the latest dataset version and adds a basic test for the scenario. The main changes focus on keeping the dataset reference current and improving test coverage.
MedCalc-Bench scenario updates:
Updated the dataset reference in `medcalc_bench_scenario.py` from `MedCalc-Bench-v1.0` to `MedCalc-Bench-v2.0`.
Testing improvements:
Added `test_medcalc_bench_scenario.py` with a pytest-based test that verifies the scenario loads instances and that the first instance is from the "test" split.
Documentation formatting:
Updated the docstring of the `MATHScenario` class to use a raw string for improved formatting.
@yifanmai @MiguelAFH
Could you please review this PR?
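For reference, the pytest-based check described in the summary could look roughly like the following sketch. The `Instance` dataclass and the stubbed instance list stand in for HELM's real `Instance` type and the output of `MedCalcBenchScenario.get_instances`; the names here are illustrative, not verified against the repo:

```python
from dataclasses import dataclass
from typing import List

TEST_SPLIT = "test"

@dataclass
class Instance:
    # Simplified stand-in for HELM's Instance type
    text: str
    split: str

def check_instances(instances: List[Instance]) -> None:
    # The PR's test verifies two things: the scenario produced at least
    # one instance, and the first instance belongs to the "test" split.
    assert len(instances) > 0, "scenario returned no instances"
    assert instances[0].split == TEST_SPLIT

# Stub data in place of the scenario's real get_instances output
check_instances([Instance("Patient note: ...", TEST_SPLIT)])
print("ok")
```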