
autoresearch-anycloud

Run Karpathy's autoresearch on your Mac or any cloud GPU — one command, no infrastructure knowledge needed.

Platform    GPU (default)      Cost       Upstream match (H100 80GB)         Status
──────────  ─────────────────  ─────────  ─────────────────────────────────  ───────────
Mac         Apple Silicon MPS  Free       —                                  Verified
AWS         A10G 24GB          $1.01/hr   p5.48xlarge (8x H100, ~$98/hr)     Verified
GCP         L4 24GB            ~$0.72/hr  a3-highgpu-8g (8x H100, ~$98/hr)   Verified
Azure       A10 24GB           ~$3.20/hr  NC40ads_H100_v5 (1x H100, ~$7/hr)  Coming soon
Oracle OCI  A10 24GB           $0.50/hr   BM.GPU.H100.8 (8x H100, ~$44/hr)   Coming soon

Demos

Mac (Apple Silicon MPS) — Free

Mac demo

AWS (A10G 24GB) — $0.13/experiment

AWS demo

GCP (L4 24GB) — $0.12/experiment

GCP demo

Why This Tool?

The autoresearch ecosystem has tools for Mac/MPS, MLX, and cloud orchestrators like SkyPilot. But no single tool lets a researcher go from research intent to results across any platform without infrastructure knowledge. autoresearch-anycloud fills that gap:

  • Minimal infrastructure setup — provide credentials and a research config; the tool handles infrastructure setup, running autoresearch, and teardown
  • No babysitting — experiments have timeouts, a budget watchdog aborts if spend exceeds your limit, and a try/finally guarantees the cloud VM is torn down whether the run succeeds, fails, or hangs
  • Cost tracking and budget enforcement — no other tool in the ecosystem tracks spend or enforces budgets
  • Unified logging and result collection — provision, run, collect, teardown in one command with one log
  • True multi-platform — same CLI, same workflow, any hardware
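
The teardown guarantee above boils down to a try/finally around the run. A minimal sketch (the function and method names here are illustrative, not the tool's actual API):

```python
def run_with_teardown(provider, config):
    """Lifecycle sketch: the VM is always destroyed, whether the
    experiments succeed, fail, or hang past their timeout.
    `provision`, `run_experiments`, and `teardown` are hypothetical
    names standing in for the tool's internals."""
    instance = provider.provision(config)
    try:
        return provider.run_experiments(instance, config)
    finally:
        # Runs even if run_experiments raises or times out.
        provider.teardown(instance)
```

Because the teardown call sits in `finally`, an exception anywhere in the run still propagates to the caller, but only after the instance is gone.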

Get Running (Mac — 2 minutes)

  1. Install uv (if you don't have it): curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Set your LLM API key (add to ~/.zshrc so it persists):
export ANTHROPIC_API_KEY=sk-ant-...   # if using Claude
export OPENAI_API_KEY=sk-...          # if using OpenAI
  3. Run:
git clone https://github.com/abcdedf/autoresearch-anycloud.git
cd autoresearch-anycloud
uv sync
autoresearch-anycloud init mac
autoresearch-anycloud run

That's it. Training starts immediately on your Mac. No cloud account. No configuration.

Monitor Your Run

Everything streams to your terminal AND to a log file. Open a second terminal to watch:

tail -f logs/run_latest.log

What you'll see:

[12:01:00] Platform:    Mac
[12:01:00] Experiments: 1 (~5 min)
[12:01:00] Est. cost:   $0.018 (GPU: $0.00, API: $0.018)
[12:01:05] [setup] Installing workspace dependencies...
[12:02:30] [prepare] Downloading data and training tokenizer...
[12:03:45] ── Experiment 1/1 ──
[12:03:45] [warmup 1/10] First 10 steps are warmup, training starts after...
[12:04:10] [training 30/60s] step 00050 | loss 3.812 | remaining: 30s
[12:04:50] Evaluating val_bpb...
[12:05:10] val_bpb: 2.124254
[12:05:10]   Cost: $0.02 / $5.00 (GPU: $0.00, API: $0.02)
══════════════════════════════════════════════════
RUN SUMMARY
══════════════════════════════════════════════════
Experiments:  1
Total time:   208s (3.5 min)

 Exp     val_bpb      Time    Status
 ────  ──────────  ────────  ────────
   1    2.124254      208s        ok

Best val_bpb: 2.124254 (experiment 1)
GPU compute:  $0.00 (0.00/hr)
LLM API:      $0.02 (4,000 in + 2,000 out tokens, claude-sonnet)
Total cost:   $0.02 / $5.00 budget
══════════════════════════════════════════════════

Results are saved to ./results/<timestamp>/train.py.

Get Results

ls results/

Each run saves the final train.py (with all improvements the AI made) to a timestamped folder.

Configure Your Research

Edit the included research.yaml:

research:
  topic: "Improve training loss on TinyShakespeare"
  program: "program.md"
  max_experiments: 1       # set very low for quick demo. Upstream default: 100

budget:
  max_cost_usd: 5.00       # Cloud + API combined. Auto-stops if exceeded. For overnight cloud runs: 10-50

The platform is set when you run init — no need to specify it here.

Included defaults are set very low for a quick demo (~5 min). For real research:

  • max_experiments: 100 — upstream default, runs overnight (12 experiments per hour, 5 min each)
  • budget: 10–50 — overnight cloud runs cost $5–25 depending on provider
  • Training time per experiment is 60s (upstream default: 300s).
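
A back-of-envelope for those numbers (hypothetical helper; real costs vary by provider and region):

```python
def estimate_run(max_experiments: int, minutes_per_exp: float,
                 gpu_hourly_usd: float) -> dict:
    """Rough run-length and GPU-cost estimate, using the ~5 min per
    experiment figure from the text above."""
    hours = max_experiments * minutes_per_exp / 60
    return {"hours": hours, "gpu_cost_usd": round(hours * gpu_hourly_usd, 2)}

# 100 experiments at ~5 min each on an A10G (~$1.01/hr):
print(estimate_run(100, 5, 1.01))  # ≈ 8.3 hours, $8.42 of GPU time
```

LLM API cost comes on top of this, which is why the suggested budget range is wider than the GPU figure alone.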

Run on Cloud GPUs

When you're ready for faster GPUs, run init with a cloud platform and provide credentials. The tool handles everything else — launching the VM, installing dependencies, running training, collecting results, and shutting down the VM.

GPU quota required: All cloud providers limit GPU access by default (quota = 0). Your first run will likely fail with a quota error. Request GPU access before your first run — it's free to apply but approval can take hours to days for new accounts. See GPU Quota below for links.

Note on GPU compatibility: Upstream autoresearch uses FlashAttention 3 and bfloat16, which require Ampere architecture (compute capability 8.0) or newer. This means T4 (Turing, CC 7.5) and V100 (Volta, CC 7.0) GPUs will not work. Compatible cloud GPUs include A10/A10G (Ampere), A100, and H100. Additionally, upstream hardcodes batch sizes for H100 80GB GPUs — this tool patches batch sizes via sed for smaller-VRAM GPUs. We've submitted a PR to upstream to make these values configurable via environment variables.

GPU       Architecture  Compute Capability  Compatible
────────  ────────────  ──────────────────  ────────────────────────
T4        Turing        7.5                 No — no FA3, no bfloat16
V100      Volta         7.0                 No — no FA3, no bfloat16
L4        Ada Lovelace  8.9                 Yes
A10/A10G  Ampere        8.6                 Yes
A100      Ampere        8.0                 Yes
H100      Hopper        9.0                 Yes
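
The compatibility rule in the table is just a compute-capability floor, which you can check yourself. A sketch (on a CUDA machine you would feed in real values from `torch.cuda.get_device_capability()`):

```python
def supports_autoresearch(major: int, minor: int) -> bool:
    """Compute capability 8.0+ (Ampere or newer) is required for
    FlashAttention 3 and bfloat16, per the table above."""
    return (major, minor) >= (8, 0)

# On a CUDA machine:
#   import torch
#   major, minor = torch.cuda.get_device_capability()
assert supports_autoresearch(8, 6)       # A10/A10G: yes
assert not supports_autoresearch(7, 5)   # T4: no
```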

AWS (no CLI install needed)

  1. Create an AWS access key and download the CSV
  2. Move the CSV to ~/.aws/credentials/ (create the folder if needed: mkdir -p ~/.aws/credentials)
  3. Run:
autoresearch-anycloud init aws
autoresearch-anycloud run

init aws auto-detects credentials from ~/.aws/credentials/ and verifies them.

A GPU VM launches automatically, trains, collects results, and shuts down. Estimated cloud cost: $0.13 for 1 experiment on an A10G GPU.

GCP (no CLI install needed)

  1. Create a GCP project (or select an existing one from the dropdown at the top of any GCP console page)
  2. Create a service account with Compute Admin access:
    • Go to IAM → Service Accounts
    • Click + Create Service Account
    • Enter a name (e.g. autoresearch) → click Create and Continue
    • Click Select a role → type Compute Admin → select it → click Continue → click Done
    • (If you skipped the role: go to IAM → Grant Access → paste the service account email → add the Compute Admin role → Save)
  3. Download a JSON key for the service account:
    • You're back on the Service Accounts list. Click the service account you just created
    • Go to the Keys tab → click Add Key → Create new key
    • Select JSON → click Create — the key file downloads automatically (you can only download it once)
  4. Move the JSON to ~/.config/gcloud/:
mkdir -p ~/.config/gcloud
mv ~/Downloads/*.json ~/.config/gcloud/
  5. Run:
autoresearch-anycloud init gcp
autoresearch-anycloud run

init gcp auto-detects credentials from ~/.config/gcloud/ and verifies them.
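
Auto-detection presumably just scans that folder for a service-account key. A minimal sketch of the idea (hypothetical helper, not the tool's actual code):

```python
import json
from pathlib import Path

def find_service_account(folder: Path):
    """Return (path, data) for the first *.json file in `folder` that
    looks like a GCP service-account key, else None."""
    for path in sorted(folder.glob("*.json")):
        try:
            data = json.loads(path.read_text())
        except ValueError:
            continue  # not valid JSON; skip
        if "client_email" in data and "private_key" in data:
            return path, data
    return None
```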

A GPU VM launches automatically, trains, collects results, and shuts down. Estimated cloud cost: $0.12 for 1 experiment on an L4 GPU (on-demand ~$0.72/hr).

Azure

The quickest way is with the Azure CLI (one command creates everything). You can also gather the credentials manually from the Azure Portal — see below.

Option A: Azure CLI (recommended)

  1. Install Azure CLI: brew install azure-cli
  2. Sign in: az login (opens browser)
  3. Create a service principal with Contributor access (replace <subscription-id> with yours from az account show):
az ad sp create-for-rbac --name autoresearch --role Contributor \
  --scopes /subscriptions/<subscription-id>

This outputs appId, password, tenant. Map them into a JSON file.

Option B: Azure Portal (no CLI needed)

  1. Go to App registrations → New registration → name it autoresearch → Register
  2. From the app's Overview page, note the Application (client) ID and Directory (tenant) ID
  3. Click Add a certificate or secret → New client secret → copy the Value (shown only once)
  4. Go to Subscriptions → click your subscription → copy the Subscription ID
  5. Still in the subscription → Access control (IAM) → Add role assignment → Contributor → select your app → Review + assign

Both options: Save credentials at ~/.azure/service-principal.json (mkdir -p ~/.azure):

{
  "tenant_id": "<tenant>",
  "client_id": "<appId>",
  "client_secret": "<password>",
  "subscription_id": "<subscription-id>"
}
Then run:
autoresearch-anycloud init azure
autoresearch-anycloud run

init azure auto-detects credentials from ~/.azure/service-principal.json or environment variables and verifies them.

A GPU VM launches automatically, trains, collects results, and shuts down. Estimated cloud cost: $0.53 for 1 experiment on an A10 GPU (on-demand ~$3.20/hr).

Oracle OCI (no CLI install needed)

You'll create a config file with 6 values. Gather them first, then paste into the template at the end.

Field        Where to find it
───────────  ────────────────────────────────────────────────────────────────
user         Profile icon (top-right) → My profile → OCID under your username
fingerprint  Generated automatically in the API key step below
tenancy      Profile icon → Tenancy: <name> → OCID on that page
region       From your Console URL: cloud.oracle.com/?region=us-phoenix-1 → copy the value after region=
key_file     The private key you download in the API key step below
compartment  Identity & Security → Compartments → OCID of the compartment to use

Generate an API signing key: My profile → API keys (under Resources in the left sidebar) → Add API key → Generate API key pair → Download private key (skip the public key — the *.public.pem file is not needed) → click Add. The fingerprint is shown on the confirmation page.

Save the key and config:

mkdir -p ~/.oci
mv ~/Downloads/<your-downloaded-key>.pem ~/.oci/oci_api_key.pem
chmod 600 ~/.oci/oci_api_key.pem

Create ~/.oci/config with your values:

[DEFAULT]
user=ocid1.user.oc1..xxxxx
fingerprint=aa:bb:cc:...
tenancy=ocid1.tenancy.oc1..xxxxx
region=us-ashburn-1
key_file=~/.oci/oci_api_key.pem
compartment=ocid1.compartment.oc1..xxxxx

Run:

autoresearch-anycloud init oci
autoresearch-anycloud run

init oci auto-detects credentials from ~/.oci/config and verifies them. The compartment OCID can also be provided via the OCI_COMPARTMENT_ID environment variable instead of adding it to the config file.
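
Since ~/.oci/config is standard INI, the verification step could be as simple as this sketch (hypothetical helper, not the tool's actual code):

```python
import configparser
import os

def load_oci_config(path: str = "~/.oci/config") -> dict:
    """Read the [DEFAULT] profile and check the required fields.
    `compartment` may fall back to the OCI_COMPARTMENT_ID environment
    variable, as noted above."""
    cfg = configparser.ConfigParser()
    cfg.read(os.path.expanduser(path))
    values = dict(cfg["DEFAULT"])
    values.setdefault("compartment", os.environ.get("OCI_COMPARTMENT_ID", ""))
    required = {"user", "fingerprint", "tenancy", "region", "key_file"}
    missing = required - values.keys()
    if missing:
        raise ValueError(f"missing OCI config fields: {sorted(missing)}")
    return values
```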

Estimated cloud cost: $0.08 for 1 experiment on an A10 GPU (on-demand $0.50/hr).

Troubleshooting:

  • NotAuthorizedOrNotFound on launch — missing IAM policy, GPU shape not available in your region, or GPU quota is 0. Fix: (1) check Limits → search VM.GPU.A10 → verify the limit is > 0 in your region; (2) ensure your user/group has a policy like Allow group <group> to manage compute-instances in compartment <compartment>; (3) try a different region — GPU shapes are not available in all regions.
  • fingerprint malformed — the fingerprint in ~/.oci/config has the wrong format. Copy the exact fingerprint from My profile → API keys. Format: aa:bb:cc:dd:... (colon-separated hex).
  • key_file permissions error — the private key file is too open. Run chmod 600 ~/.oci/oci_api_key.pem.
  • Preemptible not supported — GPU shapes do not support preemptible instances on OCI. This tool uses on-demand instances automatically; if you see this error, update to the latest version.

Switching Platforms

Once you've initialized multiple platforms, switch with init:

autoresearch-anycloud init gcp
autoresearch-anycloud run

Or use --platform for a one-off override without changing the active platform:

autoresearch-anycloud run --platform aws

Cost Tracking

Cloud cost and API cost are tracked and reported separately:

  • Cloud cost: estimated from public on-demand pricing × elapsed time. Sources: AWS/GCP/Azure/OCI pricing pages.
  • API cost: estimated from token usage per experiment × published per-token rates. Each experiment sends train.py + git log + program.md to the LLM (4,000 input tokens) and gets back modified code (2,000 output tokens). If you have a subscription (Claude Pro, ChatGPT Plus) or free credits, your actual API cost may be $0.

After each experiment:

  Cloud cost: $0.08  |  API cost (est): $0.04  |  Budget: $5.00

Run summary:

Cloud cost:   $0.13 (on-demand rate: $1.01/hr)
API cost:     $0.02 estimated (4,000 in + 2,000 out tokens × claude-sonnet pay-per-token rate)
              Note: API cost may be $0 if you have a subscription or free credits
Total (est):  $0.15 / $5.00 budget

If combined cost hits your budget, the run stops automatically and results are collected.
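
The two estimates combine roughly as follows. The per-token rates here are illustrative defaults, not the tool's actual pricing table:

```python
def estimate_costs(elapsed_s: float, hourly_rate: float,
                   in_tokens: int, out_tokens: int,
                   in_rate_per_m: float = 3.00,
                   out_rate_per_m: float = 15.00) -> dict:
    """Cloud cost = on-demand rate x elapsed time; API cost = token
    counts x per-million-token rates (assumed values)."""
    cloud = hourly_rate * elapsed_s / 3600
    api = in_tokens / 1e6 * in_rate_per_m + out_tokens / 1e6 * out_rate_per_m
    return {"cloud_usd": round(cloud, 2), "api_usd": round(api, 2),
            "total_usd": round(cloud + api, 2)}

# e.g. 460s on an A10G at $1.01/hr with 4,000 in + 2,000 out tokens:
print(estimate_costs(460, 1.01, 4000, 2000))
# → {'cloud_usd': 0.13, 'api_usd': 0.04, 'total_usd': 0.17}
```

The budget check is then a plain comparison of the running total against max_cost_usd after each experiment.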

GPU Quota

Cloud providers set GPU quota to 0 by default. Your first cloud run will fail with a quota error until you request access. Request GPU quota before your first run — it's free to apply but approval takes hours to days for new accounts:

Provider  Where to request
────────  ───────────────────────────────────────────────
AWS       Service Quotas → EC2 → search "G and VT"
GCP       Quotas → search "NVIDIA L4"
Azure     Quotas → search "NVadsA10v5", request 36 cores
OCI       Limits → search your GPU shape

Security

  • Credentials are never stored in the project — they live in standard locations on your machine (~/.aws/, ~/.config/gcloud/, ~/.azure/, ~/.oci/, etc.)
  • API keys reach cloud VMs via SSH environment variables at runtime, never written to disk
  • VMs are destroyed after each run — no lingering resources
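
Runtime-only key delivery might look like the following sketch: the key is interpolated into the remote command's environment, so it exists only in the remote process, never in a file on the VM (argv shown for illustration; not the tool's actual command):

```python
import shlex

def ssh_run_command(host: str, key_path: str, api_key: str, cmd: str) -> list:
    """Build an ssh argv that sets the API key in the remote process
    environment only. shlex.quote guards against shell injection."""
    remote = f"ANTHROPIC_API_KEY={shlex.quote(api_key)} {cmd}"
    return ["ssh", "-i", key_path, host, remote]

argv = ssh_run_command("ubuntu@203.0.113.5", "~/.ssh/ar.pem",
                       "sk-ant-example", "autoresearch-anycloud run")
```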

FAQ

Why not just use SkyPilot? SkyPilot is great for general GPU workloads. This is purpose-built for autoresearch — it handles upstream-specific patches (batch sizes, GPU tuning), cost estimation tuned to short experiment runs, and the full experiment lifecycle. Much simpler to set up for this specific use case.

Why not Terraform? These are ephemeral VMs that live for 10-20 minutes. Native SDK calls (boto3, google-cloud-compute) are faster to provision, easier to tear down, and don't require the user to install anything beyond pip packages. This follows the SkyPilot/Ray approach for ephemeral workloads.

Does it work with other training scripts? Not currently — it's specifically for Karpathy's autoresearch. The architecture (provider modules, orchestrator, cost engine) could be generalized, but that's not the goal today.

Why does it need Ampere+ GPUs? Upstream autoresearch uses FlashAttention 3 and bfloat16, which require compute capability 8.0+. T4 and V100 don't work. This is an upstream constraint — we pick the right GPU so you don't have to figure this out.

What's the catch? You need cloud credentials set up (AWS keys, GCP service account, etc.). The tool doesn't provision cloud accounts — just VMs. GPU quota on some providers (Azure, OCI) requires a manual request that can take days.

Contributing

We aim to support AWS, GCP, Azure, and Oracle OCI. Want to add another cloud provider (Lambda Labs, CoreWeave, Paperspace, etc.)? Contributions are welcome:

  • Add a provider: create a single Python file under autoresearch_ac/providers/ that implements three functions:

    def provision(config: dict, log=None) -> dict:
        """Launch a GPU instance. Return a dict with at least:
           instance_id, public_ip, region, key_path
           (plus any IDs needed for teardown)."""
    
    def teardown(instance_info: dict, log=None):
        """Terminate the instance and clean up all resources.
           instance_info is the dict returned by provision()."""
    
    def estimate_cost(config: dict) -> dict:
        """Return {"hourly_rate_usd": float,
                   "estimated_hours": float,
                   "estimated_cost_usd": float}."""

    Then add the provider to the elif chain in orchestrator.py and cli.py. See aws.py or gcp.py for working examples.

  • Request a provider: open an issue describing the platform and we'll prioritize it.

License

MIT
