Contributor

@Erotemic Erotemic commented Dec 5, 2025

I've been experimenting with coding agents, and something that can help is adding a top-level AGENTS.md file (https://agents.md), which gives the agent high-level context about the repo so it can more quickly plan how to handle the user's request.

To create this AGENTS.md file, I used GPT codex and prompted it to do a deep dive into the repo and write the resulting file. I manually checked the result and made some small modifications.

Codex output:

Summary

  • add a repository-level AGENTS guide with environment, structure, testing, and extension notes for future agents

Codex Task

# AGENT Instructions

## Development environment
- Use **Python 3.10+**. Create and activate a virtual environment via `uv`, `virtualenv`, Conda, or `pyenv` before installing dependencies.
Collaborator

FYI: I would recommend that new developers use uv, but I haven't gotten around to updating the documentation on ReadTheDocs about this yet.

Contributor Author

Agreed. This is mostly about helping an AI agent get into the right development environment, although AGENTS.md files do contain a lot of good info for human developers too. It probably makes sense to ditch anything non-uv at this point.

## Operational notes
- Default local config path is `./prod_env/`; override with `--local-path` when running commands. Ensure required provider credentials are configured before executing model-dependent runs.
- Many scenarios/model clients download datasets or call external APIs; prefer running without `-m models`/`-m scenarios` in CI to avoid costs and failures. Use markers deliberately to target specific expensive suites.
- Static leaderboard assets reside under `src/helm/benchmark/static` and `static_build`; React frontend is an alternative UI and not the deployed default.
Collaborator

FYI: The React frontend is the default UI; static_build is a compiled version provided for the convenience of Python users who do not have Node installed. There might still be a few references in the code and documentation to it being an "alternative", because there used to be a separate legacy default frontend that has since been deleted.

Contributor Author

This is helpful to know. An issue with agents right now is that they will home in on incorrect documentation and assume it is true (although if you tell them something is incorrect, they are typically good at respecting that).

@yifanmai
Collaborator

While this is interesting, I'm not interested in merging this because I am currently the sole main maintainer, and I am not making significant use of agents in this codebase. Additionally, I am planning to transition HELM to maintenance mode later this year (I have not publicly announced this yet), after which there should be few large changes to the codebase.

Relatedly, if you're interested in forking HELM to support your use cases, I would be open to that and happy to chat more.

@yifanmai yifanmai closed this Jan 22, 2026
@Erotemic
Contributor Author

I recommend playing around with agents - maybe not here, but just in general. They're at the point where they are useful a decent percentage of the time. They still need a lot of hand-holding, but their ability to find the right context in a repo is pretty nice. I'll often spin up a few to do busywork tasks while I focus on something else; when I come back, some of them have failed, but a few have returned a decent result. Having an AGENTS.md file can make them significantly more efficient, as it gives them high-level context. For big repos like this one, it makes a big difference.

Understood about stepping back from active development. It's a lot of work, and I'm grateful for the time you've spent helping me get up to speed with this repo. The code here does a lot, and that can be a double-edged sword. Forking might be an option, but I'm also hesitant to pick up maintainership of another large repo (I already take care of 30+ packages, a few of which need active care). Still, it may be the best path forward for the MAGNET project. I'll let you know.


2 participants