Integrate Psychometric-Based Question Validity Tools into HELM (Issue #3645) #3669
Conversation
| help="EXPERIMENTAL: Full class name of the Summarizer class to use. If unset, uses the default Summarizer.", | ||
| ) | ||
| parser.add_argument( | ||
| "--validity-check", |
I would prefer this to be `--psychometric-validity-check`, because "validity" is a vague concept: it could mean data completeness validation, data schema validation, or other kinds of validation.
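For example, the renamed flag might look like this (a minimal sketch; the `action` and `help` text here are assumptions, not the PR's actual code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--psychometric-validity-check",
    action="store_true",
    default=False,
    help="EXPERIMENTAL: Load pre-computed psychometric validity metrics "
    "and write them into the run display JSON.",
)

args = parser.parse_args(["--psychometric-validity-check"])
assert args.psychometric_validity_check is True
```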
```diff
 def write_run_display_json(self, skip_completed: bool) -> None:
     def process(run: Run) -> None:
-        write_run_display_json(run.run_path, run.run_spec, self.schema, skip_completed)
+        write_run_display_json(run.run_path, run.run_spec, self.schema, self.validity_check, skip_completed)
```
`self.validity_check` should be the last argument, after `skip_completed`.
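One way to apply that (a sketch, not the committed code) is to append the flag after `skip_completed` and pass it by keyword, so the existing positional call sites stay unambiguous:

```python
write_run_display_json(
    run.run_path,
    run.run_spec,
    self.schema,
    skip_completed,
    validity_check=self.validity_check,  # new flag goes last
)
```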
```python
        verbose: bool,
        num_threads: int,
        allow_unknown_models: bool,
        validity_check: bool,
```
Change this to `psychometrics_validity_check`, or something else that identifies the paper.

Also, give it a default value of `False` to fix these type errors:

```
src/helm/benchmark/presentation/torr_robustness_summarizer.py:36: error: Missing positional argument "validity_check" in call to "__init__" of "Summarizer" [call-arg]
src/helm/benchmark/presentation/test_summarize.py:13: error: Missing positional argument "validity_check" in call to "Summarizer" [call-arg]
src/helm/benchmark/presentation/test_summarize.py:31: error: Missing positional argument "validity_check" in call to "Summarizer" [call-arg]
```
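A hedged sketch of the suggested change; the parameter list is abbreviated to the arguments shown in this hunk, and the real `Summarizer.__init__` takes more:

```python
class Summarizer:
    def __init__(
        self,
        verbose: bool,
        num_threads: int,
        allow_unknown_models: bool,
        psychometrics_validity_check: bool = False,  # default False keeps old call sites type-checking
    ) -> None:
        self.verbose = verbose
        self.num_threads = num_threads
        self.allow_unknown_models = allow_unknown_models
        self.psychometrics_validity_check = psychometrics_validity_check
```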
```diff
 @htrack(None)
-def write_run_display_json(run_path: str, run_spec: RunSpec, schema: Schema, skip_completed: bool) -> None:
+def write_run_display_json(
+    run_path: str, run_spec: RunSpec, schema: Schema, skip_completed: bool, validity_check: bool = False
```
Same here: change `validity_check` to `psychometrics_validity_check`, or something else that identifies the paper.
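Applying both suggestions (the rename, plus keeping the flag last with a `False` default) would give a signature roughly like this sketch; the `...` body is a placeholder, not the PR's code:

```python
@htrack(None)
def write_run_display_json(
    run_path: str,
    run_spec: RunSpec,
    schema: Schema,
    skip_completed: bool,
    psychometrics_validity_check: bool = False,  # renamed from validity_check
) -> None:
    ...
```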
This fixes #3645.

This pull request is still causing the type checker to fail. If you'd like to merge, please resolve the type checking issues and update this pull request.
Hi, it's been a month since the last update; are you still working on this? |
We add a new boolean flag, `--validity-check`, to `helm-summarize`. When it is enabled, we load the four pre-calculated validity metric values from Hugging Face and write them into `display_predictions.json`, which lets the HELM website display them. The script that calculates those four validity metrics is in `scripts/validity_check.py`.
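A rough illustration of that flow; the dataset name, split, and column names below are hypothetical placeholders, and the actual loading and merging logic lives in this PR and in `scripts/validity_check.py`:

```python
import json
import os

from datasets import load_dataset  # Hugging Face `datasets` library


def attach_validity_metrics(run_path: str, dataset_name: str) -> None:
    """Merge pre-computed validity metric values into a run's display predictions.

    `dataset_name` and the metric/column names are hypothetical; the PR loads
    four pre-calculated validity metrics from Hugging Face.
    """
    rows = load_dataset(dataset_name, split="train")
    # Index metric values by instance id for fast lookup.
    metrics_by_instance = {row["instance_id"]: row for row in rows}

    predictions_path = os.path.join(run_path, "display_predictions.json")
    with open(predictions_path) as f:
        predictions = json.load(f)

    for prediction in predictions:
        metrics = metrics_by_instance.get(prediction["instance_id"])
        if metrics is not None:
            # Hypothetical metric names standing in for the four validity metrics.
            prediction.setdefault("stats", {}).update(
                {
                    "difficulty": metrics["difficulty"],
                    "discrimination": metrics["discrimination"],
                }
            )

    with open(predictions_path, "w") as f:
        json.dump(predictions, f, indent=2)
```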