diff --git a/docs/developer_adding_new_models.md b/docs/developer_adding_new_models.md index b91e8e027d..e0e5a58cb7 100644 --- a/docs/developer_adding_new_models.md +++ b/docs/developer_adding_new_models.md @@ -22,7 +22,7 @@ In the second case, if you want to add a model to HELM, you can directly do it i * Include any link justifying the metadata used in `ModelMetadata` such as the release data, number of parameters, capabilities and so on (you should not infer anything). * Check that you are respecting the format used in those files (`ModelMetadata` should be named as `/` and the `ModelDeployment` should be named as `/`, for example `ModelMetadata`: `openai/gpt2` and `ModelDeployment`: `huggingface/gpt2`). Add the appropriate comments and so on. * Run `helm-run --run-entries "mmlu:subject=anatomy,model_deployment=" --suite v1 --max-eval-instances 10` and make sure that everything works. Include the logs from the terminal in your PR. -* Not create unnecessary objects (`Client` `TokenizerCOnfig`, `WindowService`) and if you have to create one of these objects, document in your PR why you had to. Make them general enough so that they could be re-used by other models (especially the `Client`). +* Not create unnecessary objects (`Client` `TokenizerConfig`, `WindowService`) and if you have to create one of these objects, document in your PR why you had to. Make them general enough so that they could be re-used by other models (especially the `Client`). ## Example diff --git a/docs/importing_custom_modules.md b/docs/importing_custom_modules.md index c790269b44..15823793b7 100644 --- a/docs/importing_custom_modules.md +++ b/docs/importing_custom_modules.md @@ -1,36 +1,244 @@ # Importing Custom Modules -HELM is a modular framework with a plug-in architecture. You can write your own implementation for a client, tokenizer, scenario or metric and use them in HELM with HELM installed as a library, without needing to modify HELM itself. +HELM is a modular framework with a plug-in architecture. You can write your own implementation for run specs, clients, tokenizers, scenarios, metrics, annotators, perturbations, and window services and use them in HELM with HELM installed as a library, without needing to modify HELM itself. -The main way for you to use your code in HELM is to write a custom Python class that is a subclass of `Client`, `Tokenizer`, `Scenario` or `Metric` in a Python module. You can then specify a `ClientSpec`, `TokenizerSpec`, `ScenarioSpec` or `MetricSpec` (which are all classes of `ObjectSpec`) where the `class_name` is the name of your custom Python class. +This guide explains: -## Plugin-style registration +- how HELM discovers custom code +- your options for loading plugins +- a complete end-to-end example using Python entry points (recommended) +- when you may need to set `PYTHONPATH` -Extensions must register themselves at import time, and HELM supports four ways to accomplish this: +--- -1. **Python entry points (recommended).** If your custom code is organized as an installable Python package, you can declare a `helm` entry-point group in your `pyproject.toml`: +## How HELM finds your code - ```toml - [project.entry-points.helm] - my_plugin = "my_package.helm_plugin" - ``` +Custom extensions generally work in one of two ways: - This will allow `helm-run` to automatically import your plugin to make it available at runtime. 
Installing your package as a wheel (or in developer mode via `pip install -e .`), ensures helm can always discover your plugin without explicit modification of `PYTHONPATH`. +1. **Run specs (registered by decorator).** + Run specs are registered *at import time* via `@run_spec_function(...)` and are discoverable by name when you invoke `helm-run`. -2. **Namespace packages under the `helm` module.** HELM automatically discovers run specs placed in the `helm.benchmark.run_specs` namespace (via [`pkgutil.iter_modules`](https://docs.python.org/3/library/pkgutil.html#pkgutil.iter_modules)). You can ship a separate package that contributes modules to this namespace (for example, `helm/benchmark/run_specs/my_run_spec.py`) and registers additional run spec functions when imported. In this case your module must be available in the `PYTHONPATH` as described below. +2. **ObjectSpec-backed classes (loaded by class name).** + Scenarios, metrics, clients, tokenizers, annotators, perturbations, and window services are defined as classes. HELM loads them by: + - importing the module portion of your fully qualified name, and then + - looking up the class name you specify in the relevant `ObjectSpec` (e.g., `ScenarioSpec`, `MetricSpec`, `ClientSpec`, `TokenizerSpec`, `AnnotatorSpec`, `PerturbationSpec`, `WindowServiceSpec`). -3. **Explicit imports via `--plugins`.** This option explicitly tells `helm-run` which module contains your plugin code. You can pass either importable module names or filesystem paths to Python files: +**Key idea:** your module must be importable by Python, and (for run specs) it must be imported so registration code runs. - ```bash - helm-run --plugins my_package.helm_plugin /path/to/local_plugin.py ... - ``` +--- - HELM resolves module names with `importlib.import_module` and file paths with `ubelt.import_module_from_path`, so you can load quick experiments without packaging them. Paths are interpreted literally; module names still need to be importable (for example, by adjusting `PYTHONPATH` as described below). +## Ways to load plugins -4. **Write a Python wrapper script**. There is no need to use the `helm-run` entry point, you can instead write a Python wrapper script that calls `helm.benchmark.run.run_benchmark()`. Python will automatically add the directory containing that script to the Python module search path. If your custom classes live in a Python module under that directory, they will automatically be importable by Python. See [Python's documentation](https://docs.python.org/3/library/sys_path_init.html) for more details. +HELM supports four common approaches. Pick the one that matches how "production" vs. "experimental" your plugin is. + +### 1) Python entry points (recommended for reusable plugins) + +If your custom code is an installable Python package, declare a `helm` entry-point group in your `pyproject.toml`: + +```toml +[project.entry-points.helm] +my_plugin = "my_package.helm_plugin" +``` + +When your package is installed (e.g., as a wheel or with `pip install -e .`), `helm-run` can automatically import the entry point module, making your run specs and classes available without manually tweaking `PYTHONPATH`. + +### 2) Explicit imports via `--plugins` (best for quick experiments) + +You can explicitly tell `helm-run` what to import, using either an importable module name or a filesystem path to a `.py` file: + +```bash +helm-run --plugins my_package.helm_plugin /path/to/local_plugin.py ... 
+``` + +- **Module names** must already be importable (e.g., installed or on `PYTHONPATH`). +- **File paths** are loaded directly, which is convenient for one-off local experiments. + +### 3) Namespace packages under `helm.benchmark.run_specs` (legacy name-based method) + +HELM automatically discovers run specs placed in the `helm.benchmark.run_specs` namespace (via [`pkgutil.iter_modules`](https://docs.python.org/3/library/pkgutil.html#pkgutil.iter_modules)). You can ship a separate package that contributes modules to this namespace (for example, `helm/benchmark/run_specs/my_run_spec.py`) and registers additional run spec functions when imported. In this case your module must be available in the `PYTHONPATH` as described below. + +### 4) A Python wrapper script (when you don't want to use `helm-run`) + +There is no need to use the `helm-run` entry point, you can instead write a Python wrapper script that calls `helm.benchmark.run.run_benchmark()`. Python will automatically add the directory containing that script to the Python module search path. If your custom classes live in a Python module under that directory, they will automatically be importable by Python. See [Python's documentation](https://docs.python.org/3/library/sys_path_init.html) for more details. For example, suppose you implemented a custom `Client` subclass named `MyClient` in the `my_client.py` file under your current working directory, and you have a `ClientSpec` specifying the `class_name` as `my_client.MyClient`. Suppose you added a script called `run_helm.py` that calls `helm.benchmark.run.run_benchmark()` directly. When run using `python run_helm.py`, HELM will be able to import your modules without any additional changes. +When you run `python your_script.py`, Python automatically adds the script's directory to the module search path, so modules under that directory are importable without extra `PYTHONPATH` changes. + +--- + +## Example plugin (entry points + run spec + ObjectSpec classes) + +This compact example shows both registration styles in a single module: + +- a **run spec** registered via `@run_spec_function(...)` +- a **scenario** and **metric** referenced via `class_name=...` in `ScenarioSpec`/`MetricSpec` + +We'll use the entry point approach because it's the most robust for repeated runs. 
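+
+The walkthrough below registers a scenario and a metric. Other ObjectSpec-backed components (clients, tokenizers, annotators, and so on) plug in the same way: implement the class, then reference it by its fully qualified `class_name`. For reference, here is a minimal tokenizer sketch that mirrors the interface exercised in HELM's own plugin tests; the file and class names (`my_tokenizer.py`, `MyTokenizer`) are illustrative:
+
+```python
+# my_tokenizer.py -- a stub tokenizer sketch, mirroring HELM's plugin tests.
+from helm.common.tokenization_request import (
+    TokenizationRequest,
+    TokenizationRequestResult,
+    DecodeRequest,
+    DecodeRequestResult,
+    TokenizationToken,
+)
+from helm.tokenizers.tokenizer import Tokenizer
+
+
+class MyTokenizer(Tokenizer):
+    """A stub tokenizer that treats the whole text as a single token."""
+
+    def tokenize(self, request: TokenizationRequest) -> TokenizationRequestResult:
+        return TokenizationRequestResult(
+            success=True,
+            cached=False,
+            text=request.text,
+            tokens=[TokenizationToken(value=request.text)],
+        )
+
+    def decode(self, request: DecodeRequest) -> DecodeRequestResult:
+        return DecodeRequestResult(success=True, cached=False, text="".join(map(str, request.tokens)))
+```
+
+A `TokenizerSpec` whose `class_name` is `my_tokenizer.MyTokenizer` would load it the same way the scenario and metric below are loaded.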
+ +### Prerequisites + +- A compatible Python (this example uses 3.11) +- [`uv`](https://docs.astral.sh/uv/) installed + +### Step 1 - Initialize a packaged project + +From the directory where you want the plugin project: + +```bash +uv init --package my_example_helm_module --python=3.11 +cd my_example_helm_module +``` + +### Step 2 - Create your plugin module + +Create a module for your plugin code: + +```bash +mkdir -p src/my_example_helm_module +touch src/my_example_helm_module/my_submodule_plugin_code.py +``` + +Your directory should look like: + +```text +my_example_helm_module +├── pyproject.toml +├── README.md +└── src + └── my_example_helm_module + ├── __init__.py + └── my_submodule_plugin_code.py +``` + +Paste the following into `src/my_example_helm_module/my_submodule_plugin_code.py`: + +```python +from typing import List, Optional + +from helm.benchmark.run_spec import RunSpec, run_spec_function +from helm.benchmark.adaptation.adapter_spec import AdapterSpec +from helm.benchmark.metrics.metric import Metric, MetricSpec +from helm.benchmark.metrics.statistic import Stat +from helm.benchmark.metrics.metric_service import MetricService +from helm.benchmark.adaptation.request_state import RequestState +from helm.benchmark.scenarios.scenario import Scenario, ScenarioSpec, ScenarioMetadata, Instance +from helm.benchmark.metrics.evaluate_reference_metrics import compute_reference_metrics +from helm.benchmark.scenarios.scenario import TRAIN_SPLIT, TEST_SPLIT, CORRECT_TAG +from helm.benchmark.scenarios.scenario import Input, Output, Reference + + +class CustomScenario(Scenario): + name = "custom_scenario" + description = "A tiny scenario used for testing." + tags = ["custom"] + + def get_instances(self, output_path: str) -> List[Instance]: + # We include 5 TRAIN_SPLIT instances because the generation adapter + # uses a few-shot train instances pool by default. If you return 0 + # train instances, you'll see: "only 0 training instances, wanted 5". + examples = [ + # (question, answer, split) + ("1+1=?", "2", TRAIN_SPLIT), + ("2+2=?", "4", TRAIN_SPLIT), + ("3+3=?", "6", TRAIN_SPLIT), + ("4+4=?", "8", TRAIN_SPLIT), + ("5+5=?", "10", TRAIN_SPLIT), + ("6+6=?", "12", TEST_SPLIT), + ("7+7=?", "14", TEST_SPLIT), + ] + + instances: List[Instance] = [] + train_i = 0 + test_i = 0 + + for q, a, split in examples: + if split == TRAIN_SPLIT: + train_i += 1 + instance_id = f"train-{train_i:03d}" + else: + test_i += 1 + instance_id = f"test-{test_i:03d}" + + instances.append( + Instance( + id=instance_id, + input=Input(text=f"Q: {q}\nA:"), + references=[Reference(output=Output(text=a), tags=[CORRECT_TAG])], + split=split, + ) + ) + + return instances + + def get_metadata(self) -> ScenarioMetadata: + return ScenarioMetadata(name=self.name, main_metric="custom_metric", main_split="test") + + +class CustomMetric(Metric): + """A simple, extensible metric. + + To keep the example compact, we just call HELM's reference-metric helper. 
+ """ + + def __init__(self, names: Optional[List[str]] = None): + self.names = names or ["exact_match"] + + def evaluate_generation( + self, + adapter_spec: AdapterSpec, + request_state: RequestState, + metric_service: MetricService, + eval_cache_path: str, + ) -> List[Stat]: + return compute_reference_metrics( + names=self.names, + adapter_spec=adapter_spec, + request_state=request_state, + metric_service=metric_service, + ) + + +@run_spec_function("my_custom_run_spec") +def build_custom_run_spec() -> RunSpec: + return RunSpec( + name="my_custom_run_spec", + scenario_spec=ScenarioSpec(class_name="my_example_helm_module.my_submodule_plugin_code.CustomScenario"), + adapter_spec=AdapterSpec(method="generation"), + metric_specs=[MetricSpec(class_name="my_example_helm_module.my_submodule_plugin_code.CustomMetric")], + ) +``` + +Two things to notice: + +- The run spec is registered by the decorator **when the module is imported**. +- The scenario and metric are referenced via fully qualified `class_name` strings. + +### Step 3 - Register the plugin via entry points + +Edit `pyproject.toml` and add: + +```toml +[project.entry-points.helm] +my_helm_plugin = "my_example_helm_module.my_submodule_plugin_code" +``` + +Then install your package in editable mode: + +```bash +uv pip install -e . +``` + +### Step 4 - Run with your custom plugin + +Now `helm-run` should discover your plugin through the entry point: + +```bash +helm-run --run-entries my_custom_run_spec:model=openai/gpt2 --suite tutorial --max-eval-instances 10 +``` + +--- + ## Adding the current working directory to PYTHONPATH @@ -38,10 +246,10 @@ HELM will only be able to use custom classes that can be imported by Python. Dep If the custom classes live in a Python module under the current working directory, you should modify `PYTHONPATH` to make that Python module importable. -This is required because - in some environment - Python does not add the current working directory to the Python module search path running when using command line comments / Python entry points such as `helm-run`. See [Python's documentation](https://docs.python.org/3/library/sys_path_init.html) for more details. +This is required because - in some environments - Python does not add the current working directory to the Python module search path when running command line commands / Python entry points such as `helm-run`. See [Python's documentation](https://docs.python.org/3/library/sys_path_init.html) for more details. For example, suppose you implemented a custom `Client` subclass named `MyClient` in the `my_client.py` file under your current working directory, and you have a `ClientSpec` specifying the `class_name` as `my_client.MyClient`. To make your file importable by Python, you have to add `.` to your `PYTHONPATH` so that Python will search in your current working directory for your custom Python modules. -In Bash, you can do this by running `export PYTHONPATH=".:$PYTHONPATH"` before running `helm-run`, or by prefixing `helm-run` with `PYTHONPATH=".:$PYTHONPATH `. +In Bash, you can do this by running `export PYTHONPATH=".:$PYTHONPATH"` before running `helm-run`, or by prefixing `helm-run` with `PYTHONPATH=".:$PYTHONPATH"`. 
diff --git a/src/helm/benchmark/test_plugins.py b/src/helm/benchmark/test_plugins.py index 07faab8b3e..65c8885e92 100644 --- a/src/helm/benchmark/test_plugins.py +++ b/src/helm/benchmark/test_plugins.py @@ -2,7 +2,7 @@ import importlib.metadata import logging import sys - +from textwrap import dedent from helm.benchmark.run import import_user_plugins, load_entry_point_plugins @@ -34,16 +34,18 @@ def test_load_entry_point_plugins_handles_failures(tmp_path, monkeypatch, caplog dist_info = plugin_dir / "entrypoint-0.0.0.dist-info" dist_info.mkdir() - (dist_info / "METADATA").write_text("""\ -Metadata-Version: 2.1 -Name: entrypoint -Version: 0.0.0 -""") - (dist_info / "entry_points.txt").write_text("""\ -[helm_test] -good = good_plugin:FLAG -bad = bad_plugin:FLAG -""") + (dist_info / "METADATA").write_text(dedent( + """ + Metadata-Version: 2.1 + Name: entrypoint + Version: 0.0.0 + """)) + (dist_info / "entry_points.txt").write_text(dedent( + """ + [helm_test] + good = good_plugin:FLAG + bad = bad_plugin:FLAG + """)) monkeypatch.syspath_prepend(str(plugin_dir)) importlib.invalidate_caches() @@ -64,24 +66,23 @@ def test_import_user_plugins_supports_namespace_packages(tmp_path, monkeypatch): (plugin_root / "helm" / "benchmark" / "__init__.py").write_text("") (run_specs_dir / "__init__.py").write_text("") - (run_specs_dir / "custom.py").write_text( + (run_specs_dir / "custom.py").write_text(dedent( """ -from helm.benchmark.adaptation.adapter_spec import AdapterSpec -from helm.benchmark.metrics.metric import MetricSpec -from helm.benchmark.run_spec import RunSpec, run_spec_function -from helm.benchmark.scenarios.scenario import ScenarioSpec - - -@run_spec_function("custom_namespace_run") -def build_run_spec(): - return RunSpec( - name="custom_namespace_run", - scenario_spec=ScenarioSpec(class_name="helm.benchmark.scenarios.scenario.Scenario"), - adapter_spec=AdapterSpec(model="dummy"), - metric_specs=[MetricSpec(class_name="helm.benchmark.metrics.metric.Metric")], - ) -""" - ) + from helm.benchmark.adaptation.adapter_spec import AdapterSpec + from helm.benchmark.metrics.metric import MetricSpec + from helm.benchmark.run_spec import RunSpec, run_spec_function + from helm.benchmark.scenarios.scenario import ScenarioSpec + + + @run_spec_function("custom_namespace_run") + def build_run_spec(): + return RunSpec( + name="custom_namespace_run", + scenario_spec=ScenarioSpec(class_name="helm.benchmark.scenarios.scenario.Scenario"), + adapter_spec=AdapterSpec(model="dummy"), + metric_specs=[MetricSpec(class_name="helm.benchmark.metrics.metric.Metric")], + ) + """)) import helm import helm.benchmark @@ -103,3 +104,108 @@ def build_run_spec(): assert get_run_spec_function("custom_namespace_run") is not None + +def test_import_user_plugins_supports_object_spec_plugins(tmp_path, monkeypatch): + module_name = "custom_component_plugin" + module_file = tmp_path / f"{module_name}.py" + module_file.write_text(dedent( + """ + from typing import List + + from helm.benchmark.adaptation.adapter_spec import AdapterSpec + from helm.benchmark.adaptation.request_state import RequestState + from helm.benchmark.metrics.metric import Metric, MetricSpec + from helm.benchmark.metrics.metric_service import MetricService + from helm.benchmark.metrics.statistic import Stat + from helm.benchmark.run_spec import RunSpec, run_spec_function + from helm.benchmark.scenarios.scenario import Scenario, ScenarioMetadata, ScenarioSpec, Instance + from helm.clients.client import Client + from helm.common.request import Request, RequestResult + 
from helm.common.tokenization_request import ( + TokenizationRequest, + TokenizationRequestResult, + DecodeRequest, + DecodeRequestResult, + TokenizationToken, + ) + from helm.tokenizers.tokenizer import Tokenizer + + + @run_spec_function("custom_plugin_run_spec") + def build_run_spec() -> RunSpec: + return RunSpec( + name="custom_plugin_run_spec", + scenario_spec=ScenarioSpec(class_name="custom_component_plugin.CustomScenario"), + adapter_spec=AdapterSpec(model="dummy"), + metric_specs=[MetricSpec(class_name="custom_component_plugin.CustomMetric")], + ) + + + class CustomScenario(Scenario): + name = "custom_plugin_scenario" + description = "A custom scenario for plugin tests." + tags = ["custom"] + + def get_instances(self, output_path: str) -> List[Instance]: + return [] + + def get_metadata(self) -> ScenarioMetadata: + return ScenarioMetadata(name=self.name, main_metric="custom_metric", main_split="test") + + + class CustomMetric(Metric): + def evaluate_generation( + self, + adapter_spec: AdapterSpec, + request_state: RequestState, + metric_service: MetricService, + eval_cache_path: str, + ) -> List[Stat]: + return [] + + + class CustomClient(Client): + def make_request(self, request: Request) -> RequestResult: + return RequestResult(success=True, cached=False, embedding=[], completions=[]) + + + class CustomTokenizer(Tokenizer): + def tokenize(self, request: TokenizationRequest) -> TokenizationRequestResult: + return TokenizationRequestResult( + success=True, + cached=False, + text=request.text, + tokens=[TokenizationToken(value=request.text)], + ) + + def decode(self, request: DecodeRequest) -> DecodeRequestResult: + return DecodeRequestResult(success=True, cached=False, text="".join(map(str, request.tokens))) + """)) + + monkeypatch.syspath_prepend(tmp_path) + + if module_name in sys.modules: + importlib.invalidate_caches() + del sys.modules[module_name] + + import_user_plugins([module_name]) + + from helm.benchmark.metrics.metric import Metric, MetricSpec, create_metric + from helm.benchmark.model_deployment_registry import ClientSpec + from helm.benchmark.scenarios.scenario import Scenario, ScenarioSpec, create_scenario + from helm.benchmark.run_spec import get_run_spec_function + from helm.benchmark.tokenizer_config_registry import TokenizerSpec + from helm.clients.client import Client + from helm.common.object_spec import create_object + from helm.tokenizers.tokenizer import Tokenizer + + scenario = create_scenario(ScenarioSpec(class_name=f"{module_name}.CustomScenario")) + metric = create_metric(MetricSpec(class_name=f"{module_name}.CustomMetric")) + client = create_object(ClientSpec(class_name=f"{module_name}.CustomClient")) + tokenizer = create_object(TokenizerSpec(class_name=f"{module_name}.CustomTokenizer")) + + assert isinstance(scenario, Scenario) + assert isinstance(metric, Metric) + assert isinstance(client, Client) + assert isinstance(tokenizer, Tokenizer) + assert get_run_spec_function("custom_plugin_run_spec") is not None