Awesome Harbor

A curated list of awesome projects in the Harbor ecosystem.

Evaluation Benchmarks

terminal-bench-2 - Measures agent ability to complete tasks in a terminal
terminal-bench-pro - Extension of terminal-bench by Alibaba
skillsbench - Measures agent ability to use skills
otel-bench - Measures agent ability to instrument code with OpenTelemetry across multiple languages
CompileBench - Measures agent ability to build a working binary from source
harbor-datasets - Popular benchmarks (e.g. SWE-bench Verified) ported to run in Harbor
RuneBench - Measures agent ability to play RuneScape and complete tasks via a TypeScript SDK
legacy-bench - Evaluates agents on maintaining, debugging, and modernizing legacy code in COBOL, Java 7, Fortran, C, and Assembly
SWE-Atlas - Evaluates agents on professional SWE tasks including codebase comprehension and test writing

Training Datasets

SWE-gen-Java - 1000 JVM tasks generated from 16 open-source GitHub repos using SWE-gen
SWE-gen-JS - 1000 JS/TS tasks generated from 30 open-source GitHub repos using SWE-gen
SWE-gen-Rust - 1000 Rust SWE tasks generated using SWE-gen
SWE-gen-Go - 1000 Go SWE tasks generated using SWE-gen
SWE-gen-Cpp - 1000 C++ SWE tasks generated using SWE-gen
Nemotron-Terminal-Synthetic-Tasks - Synthetic terminal tasks by NVIDIA
seta-env - Scaling Environments for Terminal Agents: fully automated Harbor task synthesis and verification

Training & RL

OpenThoughts-Agent - Generating Harbor tasks, distilling trajectories with SFT, and training with SkyRL
endless-terminals - Procedurally generates terminal-use tasks and trains terminal agents with SkyRL
Ares - Framework for online RL training of LLM agents, built on Harbor and SkyRL
SkyRL Harbor Integration - Guide for RL training of agents with SkyRL and Harbor

Tools

harbor-bot - GitHub bot automating QA on Harbor tasks
Benchmark Template - Template for building benchmarks on Harbor with automated QA in CI
SWE-gen - Convert GitHub PRs into Harbor tasks
Oddish - Eval scheduler for running Harbor tasks with provider-aware queuing and automatic retries
TerminalBenchTaskGenerator - Desktop app for chat-driven authoring of Harbor benchmark tasks
AutoAgent - Autonomous agent harness engineering driven by benchmark scores
Meta-Harness - Autonomous improvement of harness code using previous iterations and Harbor evaluations

Contributing

Contributions welcome! Open a PR to add a project you have created or love using.
