Pin v2 dependency versions for reproducibility #14
conorbronsdon wants to merge 1 commit into rungalileo:main
The v2 requirements.txt had no version constraints at all, meaning results could vary depending on whatever versions happen to install. Add minimum version pins (>=) based on current latest releases to ensure consistent behavior while still allowing compatible updates. Also normalizes package names to use hyphens per PyPI convention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
galileo>=1.49.0
langchain>=1.2.0
langchain-anthropic>=1.3.0
langchain-baseten>=0.1.9
langchain-openai>=1.1.0
langchain-mistralai>=1.1.0
langchain-google-genai>=4.2.0
langchain-together>=0.3.0
langchain-fireworks>=1.1.0
langchain-writer>=0.3.5
langchain-deepseek>=1.0.0
langchain-aws>=1.4.0
v2 requirements use minimum-bound specifiers (e.g. galileo>=1.49.0) instead of exact pins, so pip can install newer/unbounded versions and installs remain non-deterministic; the PR's reproducibility goal is not met. Can we pin each dependency with == or add a lockfile/constraints file?
Finding type: Logical Bugs | Severity: 🔴 High
Prompt for AI Agents:
In v2/requirements.txt around lines 1-19, the dependencies are currently specified with
minimum bounds (e.g., >=) which does not provide reproducible installs. Replace each >=
specifier with an exact pin (==) using the resolved versions from a reproducible lock
step (for example run pip freeze in a controlled environment or use pip-tools to compile
a constraints file) so each line becomes package==<exact_version>. Alternatively, if you
prefer a constraints file, create v2/constraints.txt with exact versions and update
v2/requirements.txt to reference the constraints file (or document how to install using
--constraint), and ensure there are no duplicate or inconsistent package names and that
formatting remains one package per line. Return the updated v2/requirements.txt and the
generated lock/constraints file.
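A minimal sketch of the constraints-file option described above, assuming the exact versions have already been resolved in a controlled environment. The `resolved` mapping, its version numbers, and the `to_constraints` helper are hypothetical and only illustrate the shape of the output; in practice the versions would come from `pip freeze` or pip-tools.

```python
import re

# Matches lines such as "langchain>=1.2.0"; only the >= form used in this PR is handled.
MIN_BOUND = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*>=\s*([0-9][0-9A-Za-z.]*)\s*$")


def to_constraints(requirements_text, resolved):
    """Return constraints-file text with one exact pin per requirement line."""
    lines = []
    for raw in requirements_text.splitlines():
        match = MIN_BOUND.match(raw)
        if match is None:
            continue  # skip blanks, comments, and anything not of the form name>=version
        name = match.group(1)
        if name not in resolved:
            raise KeyError(f"no resolved version recorded for {name}")
        lines.append(f"{name}=={resolved[name]}")
    return "\n".join(lines) + "\n"


requirements = "galileo>=1.49.0\nlangchain>=1.2.0\n"
# Hypothetical resolved versions, standing in for the output of a real lock step.
resolved = {"galileo": "1.49.2", "langchain": "1.2.3"}
constraints = to_constraints(requirements, resolved)
print(constraints)
```

The resulting text could be saved as v2/constraints.txt and applied with `pip install -r v2/requirements.txt --constraint v2/constraints.txt`, which leaves the minimum bounds in requirements.txt untouched.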
Intentionally using >= rather than == here. The previous requirements had zero version constraints, so >= is already a big improvement. Exact pins in requirements.txt cause unnecessary dependency conflicts for contributors and downstream users. If the project adopts uv or pip-tools later, a lockfile can layer on top of these minimum bounds for CI reproducibility without constraining end users.
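The layering described here can be checked mechanically: every exact pin in a future lockfile should sit at or above the minimum bound in requirements.txt. The sketch below assumes plain dotted numeric versions (no pre-release or local segments) of at most four components, and the locked versions shown are hypothetical.

```python
def parse_version(text, width=4):
    """Turn a dotted release such as "1.49.0" into a zero-padded tuple like (1, 49, 0, 0)."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def violations(minimums, locked):
    """List packages whose locked version is missing or below its minimum bound."""
    problems = []
    for name, floor in minimums.items():
        pinned = locked.get(name)
        if pinned is None:
            problems.append(f"{name}: not in lockfile")
        elif parse_version(pinned) < parse_version(floor):
            problems.append(f"{name}: {pinned} < {floor}")
    return problems


# Minimum bounds taken from this PR's requirements.txt.
minimums = {"galileo": "1.49.0", "langchain-openai": "1.1.0"}
# Hypothetical lockfile contents used only for illustration.
locked_ok = {"galileo": "1.50.1", "langchain-openai": "1.1.0"}
locked_bad = {"galileo": "1.48.9", "langchain-openai": "1.1.0"}

print(violations(minimums, locked_ok))
print(violations(minimums, locked_bad))
```

A check like this could run in CI next to the lockfile, so end users keep the flexible bounds while CI installs stay fixed.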
Understood on preferring minimum bounds with >= here to reduce conflicts and rely on a future lockfile for reproducibility; I’ll save this preference to memory after the PR is merged.
No activity for 30 days; this PR will be closed in 5 days unless updated.
User description
Summary
- Add minimum version pins (>=) to all packages in v2/requirements.txt
- Normalize package names to use hyphens (langchain_anthropic → langchain-anthropic)
Context
The v2 requirements had zero version constraints, meaning evaluation results could vary depending on whatever package versions happen to install at the time. This is a reproducibility risk for a benchmarking tool. Minimum pins ensure a known-good baseline while still allowing compatible updates.
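On the package-name change in the summary: PEP 503 defines a normalization under which runs of `-`, `_`, and `.` are equivalent and case is ignored, so the underscore and hyphen spellings refer to the same project. A short sketch of that rule (the `normalize` helper name is ours, not part of the PR):

```python
import re


def normalize(name):
    """Normalize a project name following the rule given in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize("langchain_anthropic"))
# Both spellings collapse to the same normalized name.
print(normalize("langchain_anthropic") == normalize("langchain-anthropic"))
```

The rename is therefore a consistency fix and should not change which distribution gets installed.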
Test plan
pip install -r v2/requirements.txt resolves cleanly in a fresh virtualenv
🤖 Generated with Claude Code
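The test plan could be scripted roughly as follows, using only the standard library. This is a sketch, not the project's tooling: the helper names are hypothetical, and `check_fresh_install` needs network access because it really runs pip.

```python
import os
import subprocess
import tempfile
import venv


def venv_python(env_dir):
    """Path of the interpreter inside an environment created by the venv module."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def install_command(env_dir, requirements_path):
    """Build the pip invocation from the test plan for the given environment."""
    return [venv_python(env_dir), "-m", "pip", "install", "-r", requirements_path]


def check_fresh_install(requirements_path):
    """Create a throwaway virtualenv and return pip's exit code (0 means it resolved)."""
    with tempfile.TemporaryDirectory() as env_dir:
        venv.create(env_dir, with_pip=True)
        return subprocess.run(install_command(env_dir, requirements_path)).returncode
```

Calling `check_fresh_install("v2/requirements.txt")` from the repository root would return 0 when the requirements resolve cleanly.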
Generated description
Below is a concise technical summary of the changes proposed in this PR:
Pin the v2 benchmarking dependency list to minimum tested versions to keep evaluations reproducible with the core evaluation and install tooling. Normalize the package names in v2/requirements.txt to PyPI’s hyphenated conventions while keeping installs flexible above each baseline.