Explicit plugins #3916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open

Erotemic wants to merge 14 commits into stanford-crfm:main from Erotemic:explicit_plugins
+550 −14
Commits (all by Erotemic):

- e4f4126 Add argument to specify explicit plugins
- f635cd0 Add argument to specify explicit plugins
- 51a1588 simplify utils
- 34f77a1 Improve plugin entry point tests
- 25c3dae Refine docs
- d5c19ed Document plugin types and extend plugin tests
- de2c23f fix typo
- 02ad568 indent the test code
- d6b47d0 Rework importing custom module docs
- 1f4fdf9 doc tweaks
- 2577388 fix 3rd party newline style, no import alias
- 1d54c36 Sentence split
- daf3841 Update docs based on review
- 9538b93 Update file path wording
# Importing Custom Modules

HELM is a modular framework with a plug-in architecture. You can write your own implementation for run specs, clients, tokenizers, scenarios, or metrics and use them in HELM with HELM installed as a library, without needing to modify HELM itself.

In this document, a **plugin** means **user-provided Python code that extends HELM**. Practically, a plugin is a Python *module* that either:

- defines classes that HELM can load by fully-qualified name (e.g., `my_pkg.my_metric.CustomMetric`), and/or
- registers run specs when the module is imported (via a decorator).

This guide explains:

- how Python importability affects HELM
- how HELM discovers custom code
- your options for loading plugin modules
- a complete end-to-end example using Python entry points (recommended)

---

## Making your code importable (Python basics)

HELM will only be able to use custom code that can be imported by Python. In this guide, there are two main ways to make your code importable:

1. **Install it as a Python package** (optionally in editable mode).
2. **Add a directory to `PYTHONPATH`** so Python searches it for modules.

### Add the current working directory to PYTHONPATH

If the custom code lives in a Python module under the current working directory, you may need to modify `PYTHONPATH` to make that module importable.

This is required because Python does not add the current working directory to the module search path when running console entry points such as `helm-run`. See [Python's documentation](https://docs.python.org/3/library/sys_path_init.html) for more details.

For example, suppose you implemented a custom `Client` subclass named `MyClient` in the `my_client.py` file under your current working directory, and you have a `ClientSpec` specifying the `class_name` as `my_client.MyClient`.

To make your file importable by Python, you have to add `.` to your `PYTHONPATH` so that Python will search your current working directory for your custom Python modules.

In Bash, you can do this by running `export PYTHONPATH=".:$PYTHONPATH"` before running `helm-run`, or by prefixing `helm-run` with `PYTHONPATH=".:$PYTHONPATH"`.
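As a self-contained illustration (hypothetical `/tmp` paths and a throwaway `my_client.py`, unrelated to HELM), the demo below mimics the situation: a script that lives outside the current directory, like an installed console entry point, can only import a module from the current directory once `.` is on `PYTHONPATH`:

```shell
# A module in the current working directory...
mkdir -p /tmp/ppdemo/cwd /tmp/ppdemo/bin
printf 'GREETING = "hello"\n' > /tmp/ppdemo/cwd/my_client.py
# ...and a script that lives elsewhere, like the installed helm-run script does:
printf 'import my_client\nprint(my_client.GREETING)\n' > /tmp/ppdemo/bin/runner.py
cd /tmp/ppdemo/cwd

# Without PYTHONPATH the import fails: Python searches the script's own
# directory, not the current working directory.
python3 /tmp/ppdemo/bin/runner.py 2>/dev/null || echo "import failed"

# With "." on PYTHONPATH the import succeeds and prints "hello".
PYTHONPATH=".:$PYTHONPATH" python3 /tmp/ppdemo/bin/runner.py
```

Note that `python3 -c "..."` would not show the difference, because `-c` does add the current directory to the search path; only script and entry-point invocations omit it.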

### Put your custom code in a Python package

If your custom code is located in a Python package, you can simply install the package (optionally in editable mode) and it will automatically be importable by Python. Be sure to install your package into the same Python environment as HELM.

### Write a Python wrapper script

If you use a Python wrapper script that calls `helm.benchmark.run.run_benchmark()` instead of `helm-run`, Python automatically adds the directory containing that script to the module search path. If your custom code lives in a Python module under that directory, it will automatically be importable. See [Python's documentation](https://docs.python.org/3/library/sys_path_init.html) for more details.

---

## How HELM finds your code

Custom extensions generally work in one of two ways:

1. **ObjectSpec-backed classes (loaded by class name).**
   Clients, tokenizers, scenarios, and metrics are defined as classes. HELM loads them by:
   - importing the module portion of your fully qualified name, and then
   - looking up the class name you specify in the relevant `ObjectSpec` (e.g., `ScenarioSpec`, `MetricSpec`, `ClientSpec`, `TokenizerSpec`).

   **Key idea:** these modules only need to be *importable* by Python. They do not need to be imported ahead of time.

2. **Run specs (registered by decorator).**
   Run specs are registered *at import time* via `@helm.benchmark.run_spec.run_spec_function(...)` and are discoverable by name when you invoke `helm-run`.

   **Key idea:** the module containing the run spec function must be imported so the registration code runs. Only modules that define run spec functions need to be imported for discovery.
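The class-by-name mechanism can be sketched in a few lines of plain Python. This is an illustration of the general idea, not HELM's actual `ObjectSpec` code, and `collections.OrderedDict` stands in for a class like `my_pkg.my_metric.CustomMetric`:

```python
import importlib


def load_class(full_name: str) -> type:
    # Split "pkg.module.ClassName" into the module path and the class name,
    # import the module, then look the class up as an attribute.
    module_name, _, class_name = full_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Works for any importable class; here a stdlib class stands in for
# a hypothetical "my_pkg.my_metric.CustomMetric".
cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

This is why ObjectSpec-backed classes only need to be importable: the import happens on demand when the spec is resolved.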

---

## Ways to load plugin modules

The approaches below are mostly about ensuring that modules containing run spec functions get imported (so their run specs register). They may also be useful for importing other code early (for example, to fail fast on syntax errors), but only run specs *require* this.

### 1) Python entry points (recommended for reusable plugins)

If your custom code is an installable Python package, declare a `helm` entry-point group in your `pyproject.toml`:

```toml
[project.entry-points.helm]
my_plugin = "my_package.helm_plugin"
```

When your package is installed (e.g., as a wheel or with `pip install -e .`), `helm-run` can automatically import the entry point module.

With this method, `project.entry-points.helm` only needs to include modules that contain run spec functions (and any other modules you explicitly want imported up front).

### 2) Explicit imports via `--plugins` (best for quick experiments)

You can explicitly tell `helm-run` what to import. Each `--plugins` argument can be either an importable module name or a filesystem path to a `.py` file.

Importable module names (the modules must already be importable, e.g., installed or on `PYTHONPATH`):

```bash
helm-run --plugins my_package.helm_plugin_a my_package.helm_plugin_b ...
```

Filesystem paths (loaded from the given `.py` files):

```bash
helm-run --plugins /path/to/local_plugin_a.py /path/to/local_plugin_b.py ...
```

**How file paths work:** HELM loads plugin paths via [`ubelt.import_module_from_path`](https://ubelt.readthedocs.io/en/latest/auto/ubelt.util_import.html#ubelt.util_import.import_module_from_path). Single-file plugins (a standalone `.py`) are the simplest and most reliable. If your plugin lives in a package, it is usually best to pass the **module name** and make sure the package is installed or on `PYTHONPATH`. Path-based loading can also work for packaged plugins, as long as the package is an explicit package (i.e., package directories include `__init__.py`, including parent packages).
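For a standalone `.py` file, path-based loading boils down to the standard-library pattern below. This is a simplified sketch of what a helper like `ubelt.import_module_from_path` does in the single-file case, using a throwaway temp file rather than a real plugin:

```python
import importlib.util
import pathlib
import tempfile

# Write a tiny single-file "plugin", then import it by filesystem path.
plugin_path = pathlib.Path(tempfile.mkdtemp()) / "local_plugin_a.py"
plugin_path.write_text("PLUGIN_NAME = 'local_plugin_a'\n")

spec = importlib.util.spec_from_file_location(plugin_path.stem, plugin_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)  # top-level code (e.g. registrations) runs here

print(module.PLUGIN_NAME)  # local_plugin_a
```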

### 3) Namespace packages under `helm.benchmark.run_specs` (legacy name-based method)

HELM automatically discovers run specs placed in the `helm.benchmark.run_specs` namespace (via [`pkgutil.iter_modules`](https://docs.python.org/3/library/pkgutil.html#pkgutil.iter_modules)).

You can ship a separate package that contributes modules to this namespace (for example, `helm/benchmark/run_specs/my_run_spec.py`) and registers additional run spec functions when imported.

This method requires that your package is importable (typically by installing it, or by ensuring it is on `PYTHONPATH`).
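`pkgutil.iter_modules` is easy to experiment with directly. This toy example (temporary directory, hypothetical module names) shows the kind of scan HELM performs over the namespace directories:

```python
import pathlib
import pkgutil
import tempfile

# Simulate a namespace directory containing run-spec modules and list
# what pkgutil.iter_modules discovers in it.
ns_dir = pathlib.Path(tempfile.mkdtemp())
(ns_dir / "my_run_spec.py").write_text("")
(ns_dir / "another_run_spec.py").write_text("")

found = sorted(m.name for m in pkgutil.iter_modules([str(ns_dir)]))
print(found)  # ['another_run_spec', 'my_run_spec']
```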

### 4) A Python wrapper script (when you don't want to use `helm-run`)

Instead of using the `helm-run` entry point, you can write your own Python wrapper script that calls `helm.benchmark.run.run_benchmark()`.

When you run `python your_script.py`, Python automatically adds the script's directory to the module search path. This implicitly changes import behavior in the same way as adding that directory to `PYTHONPATH`.
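You can verify this behavior without HELM at all. The snippet below (throwaway files and hypothetical names) runs a wrapper-style script as a subprocess and shows that it can import a sibling module with no `PYTHONPATH` changes:

```python
import pathlib
import subprocess
import sys
import tempfile

# A module and a wrapper-style script placed side by side in a temp dir.
d = pathlib.Path(tempfile.mkdtemp())
(d / "my_client.py").write_text("GREETING = 'hello'\n")
(d / "run_helm.py").write_text("import my_client\nprint(my_client.GREETING)\n")

# Running "python run_helm.py" prepends the script's directory to sys.path,
# so the sibling import succeeds from any working directory.
result = subprocess.run(
    [sys.executable, str(d / "run_helm.py")], capture_output=True, text=True
)
print(result.stdout.strip())  # hello
```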

---

## Example plugin (entry points + run spec + ObjectSpec classes)

This compact example shows both mechanisms:

- a run spec registered via `@helm.benchmark.run_spec.run_spec_function(...)`
- a scenario and metric referenced via `class_name=...` in `ScenarioSpec`/`MetricSpec`

We use the entry point approach because it is the most robust for repeated runs.

Note: To keep this tutorial small, we put the run spec and the classes in one file. In larger projects you may prefer to keep your run specs in a dedicated module (the only part that must be imported up front) and keep scenarios/metrics in separate modules.

### Prerequisites

- A compatible Python (this example uses 3.11)
- [`uv`](https://docs.astral.sh/uv/) installed

### Step 1 - Initialize a packaged project

From the directory where you want the plugin project:

```bash
uv init --package my_example_helm_module --python=3.11
cd my_example_helm_module
```

### Step 2 - Create your plugin module

Create a module for your plugin code:

```bash
mkdir -p src/my_example_helm_module
touch src/my_example_helm_module/my_submodule_plugin_code.py
```

Your directory should look like:

```text
my_example_helm_module
├── pyproject.toml
├── README.md
└── src
    └── my_example_helm_module
        ├── __init__.py
        └── my_submodule_plugin_code.py
```

Write the following into `src/my_example_helm_module/my_submodule_plugin_code.py`:

```python
from typing import List, Optional

from helm.benchmark.run_spec import RunSpec, run_spec_function
from helm.benchmark.adaptation.adapter_spec import AdapterSpec
from helm.benchmark.metrics.metric import Metric, MetricSpec
from helm.benchmark.metrics.statistic import Stat
from helm.benchmark.metrics.metric_service import MetricService
from helm.benchmark.adaptation.request_state import RequestState
from helm.benchmark.scenarios.scenario import Scenario, ScenarioSpec, ScenarioMetadata, Instance
from helm.benchmark.metrics.evaluate_reference_metrics import compute_reference_metrics
from helm.benchmark.scenarios.scenario import TRAIN_SPLIT, TEST_SPLIT, CORRECT_TAG
from helm.benchmark.scenarios.scenario import Input, Output, Reference


class CustomScenario(Scenario):
    name = "custom_scenario"
    description = "A tiny scenario used for testing."
    tags = ["custom"]

    def get_instances(self, output_path: str) -> List[Instance]:
        # We include 5 TRAIN_SPLIT instances because the generation adapter
        # uses a few-shot train instances pool by default. If you return 0
        # train instances, you'll see: "only 0 training instances, wanted 5".
        examples = [
            # (question, answer, split)
            ("1+1=?", "2", TRAIN_SPLIT),
            ("2+2=?", "4", TRAIN_SPLIT),
            ("3+3=?", "6", TRAIN_SPLIT),
            ("4+4=?", "8", TRAIN_SPLIT),
            ("5+5=?", "10", TRAIN_SPLIT),
            ("6+6=?", "12", TEST_SPLIT),
            ("7+7=?", "14", TEST_SPLIT),
        ]

        instances: List[Instance] = []
        train_i = 0
        test_i = 0

        for q, a, split in examples:
            if split == TRAIN_SPLIT:
                train_i += 1
                instance_id = f"train-{train_i:03d}"
            else:
                test_i += 1
                instance_id = f"test-{test_i:03d}"

            instances.append(
                Instance(
                    id=instance_id,
                    input=Input(text=f"Q: {q}\nA:"),
                    references=[Reference(output=Output(text=a), tags=[CORRECT_TAG])],
                    split=split,
                )
            )

        return instances

    def get_metadata(self) -> ScenarioMetadata:
        return ScenarioMetadata(name=self.name, main_metric="custom_metric", main_split="test")


class CustomMetric(Metric):
    """A simple, extensible metric.

    To keep the example compact, we just call HELM's reference-metric helper.
    """

    def __init__(self, names: Optional[List[str]] = None):
        self.names = names or ["exact_match"]

    def evaluate_generation(
        self,
        adapter_spec: AdapterSpec,
        request_state: RequestState,
        metric_service: MetricService,
        eval_cache_path: str,
    ) -> List[Stat]:
        return compute_reference_metrics(
            names=self.names,
            adapter_spec=adapter_spec,
            request_state=request_state,
            metric_service=metric_service,
        )


@run_spec_function("my_custom_run_spec")
def build_custom_run_spec() -> RunSpec:
    return RunSpec(
        name="my_custom_run_spec",
        scenario_spec=ScenarioSpec(class_name="my_example_helm_module.my_submodule_plugin_code.CustomScenario"),
        adapter_spec=AdapterSpec(method="generation"),
        metric_specs=[MetricSpec(class_name="my_example_helm_module.my_submodule_plugin_code.CustomMetric")],
    )
```

Two things to notice:

- The run spec is registered by the decorator when the module is imported.
- The scenario and metric are referenced via fully qualified `class_name` strings.

### Step 3 - Register the plugin via entry points

Edit `pyproject.toml` and add:

```toml
[project.entry-points.helm]
my_helm_plugin = "my_example_helm_module.my_submodule_plugin_code"
```

Then install your package in editable mode:

```bash
uv pip install -e .
```

### Step 4 - Run with your custom run spec

Now `helm-run` should discover your plugin through the entry point:

```bash
helm-run --run-entries my_custom_run_spec:model=openai/gpt2 --suite tutorial --max-eval-instances 10
```
Review comment (nit): The first sentence is awkward. I would suggest replacing it with something like: "Instead of using the `helm-run` entry point, you can instead write your own Python wrapper script that calls..."