LiveWeb Arena

Real-time web evaluation framework for LLM browser agents.

Quick Start

# Install
pip install -e .
playwright install chromium

# Configure
cp .env.example .env
# Edit .env and set API_KEY

# Run evaluation
python eval.py --seed 42 --verbose

Usage

# Basic evaluation
python eval.py --seed 42

# Specific template
python eval.py --templates weather/current_weather --seed 42

# Multi-task evaluation
python eval.py --num-tasks 3 --seed 42

# Deterministic task by ID
python eval.py --task-id 100001 --seed 42

# View all templates
python eval.py --show-registry

Options

Option	Description	Default
`--seed`	Random seed	random
`--task-id`	Deterministic task ID	-
`--num-tasks`	Number of sub-tasks (1-4)	1
`--templates`	Template(s) to use	random
`--model`	LLM model	`zai-org/GLM-4.7-TEE`
`--base-url`	API URL	`https://llm.chutes.ai/v1`
`--timeout`	Timeout (seconds)	3600
`--verbose`	Verbose output	false
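
To run the same evaluation over several seeds, the flags above can be assembled from a small script. This is a minimal sketch, not part of the project: `build_eval_command` and `run_seeds` are hypothetical helpers, and the sketch assumes `eval.py` is in the current working directory and that `--verbose` is a value-less flag, as in the Quick Start command.

```python
import subprocess
import sys
from typing import List, Optional


def build_eval_command(
    seed: int,
    templates: Optional[str] = None,
    num_tasks: int = 1,
    verbose: bool = False,
) -> List[str]:
    """Build an argv list for eval.py using only flags from the Options table."""
    if not 1 <= num_tasks <= 4:
        # The Options table documents a range of 1-4 sub-tasks.
        raise ValueError("--num-tasks must be between 1 and 4")
    cmd = [sys.executable, "eval.py", "--seed", str(seed), "--num-tasks", str(num_tasks)]
    if templates is not None:
        cmd += ["--templates", templates]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_seeds(seeds: List[int], **kwargs) -> None:
    """Run one evaluation per seed; raises CalledProcessError on the first failure."""
    for seed in seeds:
        subprocess.run(build_eval_command(seed, **kwargs), check=True)
```

For example, `run_seeds([41, 42, 43], templates="weather/current_weather")` would launch three runs of the same template, one per seed.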

Templates

Weather (wttr.in) - 6 templates

location_name, time_of_day, multi_day, current_weather, astronomy, weather_comparison

Stooq (stooq.com) - 7 templates

stooq_price, stooq_comparison, stooq_ranking, stooq_sector_analysis, stooq_currency, stooq_volatility, stooq_range_position

CoinGecko (coingecko.com) - 8 templates

coingecko_price, coingecko_volume, coingecko_comparison, coingecko_rank, coingecko_top_movers, coingecko_supply, coingecko_ath, coingecko_performance

Taostats (taostats.io) - 10 templates

taostats_subnet_info, taostats_comparison, taostats_analysis, taostats_ranking, taostats_price_change, taostats_threshold, taostats_multi_condition, taostats_delta, taostats_range_count, taostats_percentage

Hybrid (cross-source) - 3 templates

hybrid_top_performer, hybrid_ranking, hybrid_conditional_branch

Environment Variables

Variable	Description
`API_KEY`	LLM API key (required)
`COINGECKO_API_KEY`	CoinGecko Pro API key (optional)
`TAOSTATS_API_KEY`	Taostats API key (optional)
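
A quick pre-flight check can catch a missing `API_KEY` before a long run starts. The sketch below is illustrative only: it assumes `.env` contains plain `KEY=VALUE` lines, and the helper names (`parse_env_text`, `missing_required`, `check_env_file`) are hypothetical, not part of the project.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED = ("API_KEY",)
OPTIONAL = ("COINGECKO_API_KEY", "TAOSTATS_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(values: Dict[str, str]) -> List[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED if not values.get(name)]


def check_env_file(path: str = ".env") -> List[str]:
    """Read an env file (assumed KEY=VALUE format) and list missing required names."""
    return missing_required(parse_env_text(Path(path).read_text()))
```

An empty list from `check_env_file()` means every required variable is set; the names in `OPTIONAL` are never reported as missing.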

Output

Results are saved to eval/<timestamp>.json:

{
  "score": 1.0,
  "success": true,
  "extra": {
    "seed": 42,
    "answer_details": [...],
    "conversation": [...]
  }
}
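
The result file can be inspected with a few lines of Python. This is a sketch under the assumption that files follow the shape shown above (`score`, `success`, and `extra.seed`) and are written to an `eval/` directory relative to the working directory; `latest_result`, `load_result`, and `summarize` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def latest_result(results_dir: str = "eval") -> Optional[Path]:
    """Return the most recently modified .json file in results_dir, if any."""
    files = sorted(Path(results_dir).glob("*.json"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def load_result(path: Path) -> Dict[str, Any]:
    """Load one result file as a dictionary."""
    return json.loads(path.read_text())


def summarize(result: Dict[str, Any]) -> str:
    """Format the top-level fields; absent keys are shown as None."""
    extra = result.get("extra", {})
    return (
        f"seed={extra.get('seed')} "
        f"score={result.get('score')} "
        f"success={result.get('success')}"
    )
```

With the sample output above, `summarize` yields `seed=42 score=1.0 success=True`.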

License

MIT
