Harbor

terminal-bench Public

A benchmark for LLMs on complicated tasks in the terminal

Python 1.9k 501

harbor Public

Harbor is a framework for running agent evaluations and creating and using RL environments.

Python 1.4k 878

terminal-bench-2 Public

Shell 156 58

terminal-bench-3 Public

🚧 Accepting Task Submissions 🚧

Python 108 113

terminal-bench-science Public

Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal

Python 53 33

awesome-harbor Public

A curated list of awesome Harbor ecosystem projects
