Skip to content

Conversation

@Erotemic
Copy link
Contributor

@Erotemic Erotemic commented Nov 7, 2025

This partially addresses the issue I had in #3915. The main missing piece is documentation updates that shows the user how they can define a new run spec independent of modifying HELM source.

The main idea is add an argument --plugin which can be a list of module names or paths to python modules. The new behavior is that these specified modules are imported at the start of the run_benchmarking function, so custom run specs will be found.

Importing modules based on their importable python name is fairly straight forward, and sometimes useful, however, I would almost always prefer to specify a path to a plugin file so it is explicit what custom logic I'm using. Unfortunately the stdlib doesn't have a mechanism to import a module from its path, but I've been maintaining one in my ubelt library for years, and I simply ported the logic of that into helm/common/import_utils.py. (The way it works is by temporarily adding the module's directory to the PYTHONPATH and then using normal import mechanisms).

If maintainers like the basic idea then I'll go ahead and update the documentation and tests in this PR as well.

@Erotemic
Copy link
Contributor Author

Erotemic commented Jan 8, 2026

I used codex to:

  1. refactor this to remove the vendered code and just depend on ubelt (this significantly simplifies the diff)
  2. add tests that the different ways to register plugins are respected
  3. add documentation for how to accomplish this

Codex task: https://chatgpt.com/codex/tasks/task_b_69334998af5c832db233d6de61a5e75d

@Erotemic
Copy link
Contributor Author

@yifanmai I think this PR is done / ready for review. On a high level this PR:

  • Introduces a standardized entry-point style plugin (similar to what pytest uses)
  • Introduces an explicit --plugin command line argument for easy experimentation. (Adds ubelt dependency for path support, so very minimal new code)
  • Adds unit tests for each of these new features
  • Overhauls docs/importing_custom_modules.md which now contains a step-by-step example.

I also fixed a typo in docs/developer_adding_new_models.md.

Copy link
Collaborator

@yifanmai yifanmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes; this is really great! I think the new approach is much more user-friendly, and the tutorial is very helpful for explaining how to set things up. I have some high level feedback that I'd like to discuss first:

  • Generally, the new documentation tends to conflate making things importable (e.g. scenarios, metrics) vs importing things (e.g. run spec functions). The documentation seems to assume that the user already knows how importability works when discussing how to import run spec functions. You could consider moving the PYTHONPATH section earlier in the documentation, and give a more complete explanation of how importability works in Python (using PYTHONPATH or installed package, which are the two methods we use here). You could also reuse some text from the old doc.
    • The tutorial puts the scenario and the run spec function in the same file. I think it would be better to clarify that only the run spec function needs to be imported.
    • With method 1, it should be clearer that project.entry-points.helm only needs to include modules with run spec functions.
    • With method 2, you should specify how the importable module path is determined when using a file path with the argument.
    • Methods 3 requires using the PYTHONPATH method, and method 4 implicitly changes PYTHONPATH; this should be made clearer.
  • Relatedly, I think there are a few plausible alternatives for naming. I thought that --plugin was vague, and also the documentation never defines what a plugin is. Alternatives: --import-modules is more explicit in explaining what the argument does. --run-spec-module is more explicit about what the user will use the argument for.

What do you think?


### 4) A Python wrapper script (when you don't want to use `helm-run`)

There is no need to use the `helm-run` entry point, you can instead write a Python wrapper script that calls `helm.benchmark.run.run_benchmark()`. Python will automatically add the directory containing that script to the Python module search path. If your custom classes live in a Python module under that directory, they will automatically be importable by Python. See [Python's documentation](https://docs.python.org/3/library/sys_path_init.html) for more details.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "There is no need to use the helm-run entry point, you" -> "There is no need to use the helm-run entry point. You"

from typing import List, Optional


import ubelt as ub
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: single newline before and after third-party import:

from typing import List, Optional

import ubelt as ub

from helm.benchmark import model_metadata_registry


If you are using a Python wrapper script that calls `helm.benchmark.run.run_benchmark()` instead of using `helm-run`, Python will automatically add the directory containing that script to the Python module search path. If your custom classes live in a Python module under that directory, they will automatically be importable by Python. See [Python's documentation](https://docs.python.org/3/library/sys_path_init.html) for more details.
```bash
helm-run --plugins my_package.helm_plugin /path/to/local_plugin.py ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this example confusing. It misleads the user into thinking that the flag requires a module path and a file path, and then maps the file to the module, which is not what it does. I think you should use separate examples for the two different syntaxes i.e.

helm-run --plugins my_package.helm_plugin_a my_package.helm_plugin_b...

and

helm-run --plugins /path/to/local_plugin_a.py /path/to/local_plugin_b.py ...

@Erotemic
Copy link
Contributor Author

Erotemic commented Jan 16, 2026

Thanks for the fixes; this is really great! I think the new approach is much more user-friendly, and the tutorial is very helpful for explaining how to set things up. I have some high level feedback that I'd like to discuss first:

  • Generally, the new documentation tends to conflate making things importable (e.g. scenarios, metrics) vs importing things (e.g. run spec functions).

Yeah, this was a confusion I had when learning how to use HELM. It seems like there are two different plugin systems, and halfway through writing the new docs I realized that it is only the run spec that needs to be pre-imported, and new classes work as long as their qualified names are somehow importable. If I was writing a system I likely would have chosen one or the other: everything is registered via decorator (or metaclass inheritance) or, everything can be specified by fully qualified importable name. I think I'm shoehorning the run-spec into this documentation because I want a single doc to have an answer to the question: "how do I get HELM to see my code?". Maybe there is a more subtle distinction that I'm not appreciating, but it felt like I could lump both of these under the umbrella term "plugin", and give the user a conceptual grounding point before going into the differences between run specs and custom modules.

You could consider moving the PYTHONPATH section earlier in the documentation, ... reuse some text from the old doc.

That makes sense.

  • The tutorial puts the scenario and the run spec function in the same file. I think it would be better to clarify that only the run spec function needs to be imported.

Makes sense. I think there is some value in pre-importing the custom classes. It gives you an early error if you have a syntax problem. But it still could be clarified.

  • With method 1, it should be clearer that project.entry-points.helm only needs to include modules with run spec functions.
  • With method 2, you should specify how the importable module path is determined when using a file path with the argument.
  • Methods 3 requires using the PYTHONPATH method, and method 4 implicitly changes PYTHONPATH; this should be made clearer.

These all make sense.

  • Relatedly, I think there are a few plausible alternatives for naming. I thought that --plugin was vague, and also the documentation never defines what a plugin is. Alternatives: --import-modules is more explicit in explaining what the argument does. --run-spec-module is more explicit about what the user will use the argument for.

What do you think?

I want to argue for keeping plugin.

I could define a "plugin" as any user-specified code that is explicitly or implicitly registered with HELM. (implicit meaning that its qualified path resolves to code that HELM can use). I think plugin is the conceptually the better term here, even though you're right it only does matter for run specs. However, I could envision a future where custom modules could be registered explicitly with decorators (if we wanted to move towards a unified way of handling user code) and there is also a world where run specs could be implicitly specified via qualified paths to the function (I use this method in my pipeline tool kwdagger).

I think plugin has a connotation of: this is my code, this is how I point to it. And even though it's not necessary for the classes, it does pre-ensure they are defined and have a fully qualified name. From your suggestions I would pick --import-modules, but the issue I have with that is it doesn't communicate why you want to import these modules. What is the purpose? To me plugin communicates that you will import something as well as the purpose of bringing external code with new features into the application.

If you're not convinced, I'll change the name, but that's my take on it.

EDIT: GPT Suggested the name --plugin-modules if you like that better. It also suggested adding an alias with --import-modules, but I'm not sure about that idea.

@Erotemic
Copy link
Contributor Author

@yifanmai I believe I've addressed the core concerns.

  • Kept most language from the original docs
  • Moved the importability and PYTHONPATH section earlier.
  • Split classes (importable by name) vs run specs (must be imported).
  • Clarified entry points should list run spec modules.
  • Split --plugins examples into module-name vs file-path, and gave a note import via file path.

@yifanmai
Copy link
Collaborator

Looks good, thanks! Could you run the linter (doc)? Install the dev extra and then run ./pre-commit.sh.


### 4) A Python wrapper script (when you don't want to use `helm-run`)

There is no need to use the `helm-run` entry point. You can instead write a Python wrapper script that calls `helm.benchmark.run.run_benchmark()`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The first sentence is awkward. I would suggest replacing it with something like: "Instead of using the helm-run entry point, you can instead write your own Python wrapper script that calls..."

@yifanmai
Copy link
Collaborator

Will merge this and fix lint errors later, for expediency.

@yifanmai yifanmai merged commit f40beac into stanford-crfm:main Jan 22, 2026
3 of 6 checks passed
@Erotemic
Copy link
Contributor Author

@yifanmai I was just about to fix the issues. Are you taking care of that or do you want me to still run it with a new PR?

@yifanmai
Copy link
Collaborator

@Erotemic no problem, I will handle the issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants