Integrate Psychometric-Based Question Validity Tools into HELM (Issue #3645) #3669
Conversation
| help="EXPERIMENTAL: Full class name of the Summarizer class to use. If unset, uses the default Summarizer.", | ||
| ) | ||
| parser.add_argument( | ||
| "--validity-check", |
I would prefer this to be `--psychometric-validity-check`, because "validity" is a vague concept: it could mean data completeness validation, data schema validation, or other kinds of validation.
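For example, the renamed flag might look like this (a minimal sketch; the `action` and `help` text here are assumptions, not the PR's actual code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--psychometric-validity-check",
    action="store_true",
    default=False,
    help="EXPERIMENTAL: Load pre-computed psychometric validity metrics "
    "and write them into the run display JSON.",
)

args = parser.parse_args(["--psychometric-validity-check"])
assert args.psychometric_validity_check is True
```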
```diff
 def write_run_display_json(self, skip_completed: bool) -> None:
     def process(run: Run) -> None:
-        write_run_display_json(run.run_path, run.run_spec, self.schema, skip_completed)
+        write_run_display_json(run.run_path, run.run_spec, self.schema, self.validity_check, skip_completed)
```
`self.validity_check` should be the last argument, after `skip_completed`.
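One way to apply that (a sketch, not the committed code) is to append the flag after `skip_completed` and pass it by keyword, so the existing positional call sites stay unambiguous:

```python
write_run_display_json(
    run.run_path,
    run.run_spec,
    self.schema,
    skip_completed,
    validity_check=self.validity_check,  # new flag goes last
)
```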
```python
        verbose: bool,
        num_threads: int,
        allow_unknown_models: bool,
        validity_check: bool,
```
Change this to `psychometrics_validity_check`, or something else that identifies the paper.

Also, give it a default value of `False` to fix these type errors:

```
src/helm/benchmark/presentation/torr_robustness_summarizer.py:36: error: Missing positional argument "validity_check" in call to "__init__" of "Summarizer" [call-arg]
src/helm/benchmark/presentation/test_summarize.py:13: error: Missing positional argument "validity_check" in call to "Summarizer" [call-arg]
src/helm/benchmark/presentation/test_summarize.py:31: error: Missing positional argument "validity_check" in call to "Summarizer" [call-arg]
```
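A hedged sketch of the suggested change; the parameter list is abbreviated to the arguments shown in this hunk, and the real `Summarizer.__init__` takes more:

```python
class Summarizer:
    def __init__(
        self,
        verbose: bool,
        num_threads: int,
        allow_unknown_models: bool,
        psychometrics_validity_check: bool = False,  # default False keeps old call sites type-checking
    ) -> None:
        self.verbose = verbose
        self.num_threads = num_threads
        self.allow_unknown_models = allow_unknown_models
        self.psychometrics_validity_check = psychometrics_validity_check
```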
```diff
 @htrack(None)
-def write_run_display_json(run_path: str, run_spec: RunSpec, schema: Schema, skip_completed: bool) -> None:
+def write_run_display_json(
+    run_path: str, run_spec: RunSpec, schema: Schema, skip_completed: bool, validity_check: bool = False
```
Same here: change `validity_check` to `psychometrics_validity_check`, or something else that identifies the paper.
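Applying both suggestions (the rename, plus keeping the flag last with a `False` default) would give a signature roughly like this sketch; the `...` body is a placeholder, not the PR's code:

```python
@htrack(None)
def write_run_display_json(
    run_path: str,
    run_spec: RunSpec,
    schema: Schema,
    skip_completed: bool,
    psychometrics_validity_check: bool = False,  # renamed from validity_check
) -> None:
    ...
```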
This fixes #3645.

This pull request is still causing the type checker to fail. If you'd like to merge, please resolve the type checking issues and update this pull request.
Hi, it's been a month since the last update; are you still working on this? |
We add a new boolean flag, `--validity-check`, to `helm-summarize`. When it is enabled, we load the four pre-calculated validity metric values from Hugging Face and write them into `display_predictions.json`, which lets the HELM website display them. The script that calculates those four validity metrics is in `scripts/validity_check.py`.
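A rough illustration of that flow; the dataset name, split, and column names below are hypothetical placeholders, and the actual loading and merging logic lives in this PR and in `scripts/validity_check.py`:

```python
import json
import os

from datasets import load_dataset  # Hugging Face `datasets` library


def attach_validity_metrics(run_path: str, dataset_name: str) -> None:
    """Merge pre-computed validity metric values into a run's display predictions.

    `dataset_name` and the metric/column names are hypothetical; the PR loads
    four pre-calculated validity metrics from Hugging Face.
    """
    rows = load_dataset(dataset_name, split="train")
    # Index metric values by instance id for fast lookup.
    metrics_by_instance = {row["instance_id"]: row for row in rows}

    predictions_path = os.path.join(run_path, "display_predictions.json")
    with open(predictions_path) as f:
        predictions = json.load(f)

    for prediction in predictions:
        metrics = metrics_by_instance.get(prediction["instance_id"])
        if metrics is not None:
            # Hypothetical metric names standing in for the four validity metrics.
            prediction.setdefault("stats", {}).update(
                {
                    "difficulty": metrics["difficulty"],
                    "discrimination": metrics["discrimination"],
                }
            )

    with open(predictions_path, "w") as f:
        json.dump(predictions, f, indent=2)
```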