
autoresearch-anycloud

Run Karpathy's autoresearch on your Mac or any cloud GPU — one command, no infrastructure knowledge needed.

Platform    GPU (default)      Cost       Upstream match (H100 80GB)         Status
──────────  ─────────────────  ─────────  ─────────────────────────────────  ───────────
Mac         Apple Silicon MPS  Free       —                                  Verified
AWS         A10G 24GB          $1.01/hr   p5.48xlarge (8x H100, ~$98/hr)     Verified
GCP         L4 24GB            ~$0.72/hr  a3-highgpu-8g (8x H100, ~$98/hr)   Verified
Azure       A10 24GB           ~$3.20/hr  NC40ads_H100_v5 (1x H100, ~$7/hr)  Coming soon
Oracle OCI  A10 24GB           $0.50/hr   BM.GPU.H100.8 (8x H100, ~$44/hr)   Coming soon

Demos

Mac (Apple Silicon MPS) — Free

Mac demo

AWS (A10G 24GB) — $0.13/experiment

AWS demo

GCP (L4 24GB) — $0.12/experiment

GCP demo

Why This Tool?

The autoresearch ecosystem has tools for Mac/MPS, MLX, and cloud orchestrators like SkyPilot. But no single tool lets a researcher go from research intent to results across any platform without infrastructure knowledge. autoresearch-anycloud fills that gap:

  • Minimal infrastructure setup — provide credentials and a research config; the tool handles infrastructure setup, running autoresearch, and teardown
  • No babysitting — experiments have timeouts, a budget watchdog aborts if spend exceeds your limit, and a try/finally guarantees the cloud VM is torn down whether the run succeeds, fails, or hangs
  • Cost tracking and budget enforcement — no other tool in the ecosystem tracks spend or enforces budgets
  • Unified logging and result collection — provision, run, collect, teardown in one command with one log
  • True multi-platform — same CLI, same workflow, any hardware
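
The teardown guarantee above boils down to a try/finally around the run. A minimal sketch (the function and method names here are illustrative, not the tool's actual API):

```python
def run_with_teardown(provider, config):
    """Lifecycle sketch: the VM is always destroyed, whether the
    experiments succeed, fail, or hang past their timeout.
    `provision`, `run_experiments`, and `teardown` are hypothetical
    names standing in for the tool's internals."""
    instance = provider.provision(config)
    try:
        return provider.run_experiments(instance, config)
    finally:
        # Runs even if run_experiments raises or times out.
        provider.teardown(instance)
```

Because the teardown call sits in `finally`, an exception anywhere in the run still propagates to the caller, but only after the instance is gone.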

Get Running (Mac — 2 minutes)

  1. Install uv (if you don't have it): curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Set your LLM API key (add to ~/.zshrc so it persists):
export ANTHROPIC_API_KEY=sk-ant-...   # if using Claude
export OPENAI_API_KEY=sk-...          # if using OpenAI
  3. Run:
git clone https://github.com/abcdedf/autoresearch-anycloud.git
cd autoresearch-anycloud
uv sync
autoresearch-anycloud init mac
autoresearch-anycloud run

That's it. Training starts immediately on your Mac. No cloud account. No configuration.

Monitor Your Run

Everything streams to your terminal AND to a log file. Open a second terminal to watch:

tail -f logs/run_latest.log

What you'll see:

[12:01:00] Platform:    Mac
[12:01:00] Experiments: 1 (~5 min)
[12:01:00] Est. cost:   $0.018 (GPU: $0.00, API: $0.018)
[12:01:05] [setup] Installing workspace dependencies...
[12:02:30] [prepare] Downloading data and training tokenizer...
[12:03:45] ── Experiment 1/1 ──
[12:03:45] [warmup 1/10] First 10 steps are warmup, training starts after...
[12:04:10] [training 30/60s] step 00050 | loss 3.812 | remaining: 30s
[12:04:50] Evaluating val_bpb...
[12:05:10] val_bpb: 2.124254
[12:05:10]   Cost: $0.02 / $5.00 (GPU: $0.00, API: $0.02)
══════════════════════════════════════════════════
RUN SUMMARY
══════════════════════════════════════════════════
Experiments:  1
Total time:   208s (3.5 min)

 Exp     val_bpb      Time    Status
 ────  ──────────  ────────  ────────
   1    2.124254      208s        ok

Best val_bpb: 2.124254 (experiment 1)
GPU compute:  $0.00 (0.00/hr)
LLM API:      $0.02 (4,000 in + 2,000 out tokens, claude-sonnet)
Total cost:   $0.02 / $5.00 budget
══════════════════════════════════════════════════

Results are saved to ./results/<timestamp>/train.py.

Get Results

ls results/

Each run saves the final train.py (with all improvements the AI made) to a timestamped folder.

Configure Your Research

Edit the included research.yaml:

research:
  topic: "Improve training loss on TinyShakespeare"
  program: "program.md"
  max_experiments: 1       # set very low for quick demo. Upstream default: 100

budget:
  max_cost_usd: 5.00       # Cloud + API combined. Auto-stops if exceeded. For overnight cloud runs: 10-50

The platform is set when you run init — no need to specify it here.

Included defaults are set very low for a quick demo (~5 min). For real research:

  • max_experiments: 100 — upstream default, runs overnight (12 experiments per hour, 5 min each)
  • budget: 10–50 — overnight cloud runs cost $5–25 depending on provider
  • Training time per experiment is 60s (upstream default: 300s).
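
A back-of-envelope for those numbers (hypothetical helper; real costs vary by provider and region):

```python
def estimate_run(max_experiments: int, minutes_per_exp: float,
                 gpu_hourly_usd: float) -> dict:
    """Rough run-length and GPU-cost estimate, using the ~5 min per
    experiment figure from the text above."""
    hours = max_experiments * minutes_per_exp / 60
    return {"hours": hours, "gpu_cost_usd": round(hours * gpu_hourly_usd, 2)}

# 100 experiments at ~5 min each on an A10G (~$1.01/hr):
print(estimate_run(100, 5, 1.01))  # ≈ 8.3 hours, $8.42 of GPU time
```

LLM API cost comes on top of this, which is why the suggested budget range is wider than the GPU figure alone.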

Run on Cloud GPUs

When you're ready for faster GPUs, run init with a cloud platform and provide credentials. The tool handles everything else — launching the VM, installing dependencies, running training, collecting results, and shutting down the VM.

GPU quota required: All cloud providers limit GPU access by default (quota = 0). Your first run will likely fail with a quota error. Request GPU access before your first run — it's free to apply but approval can take hours to days for new accounts. See GPU Quota below for links.

Note on GPU compatibility: Upstream autoresearch uses FlashAttention 3 and bfloat16, which require Ampere architecture (compute capability 8.0) or newer. This means T4 (Turing, CC 7.5) and V100 (Volta, CC 7.0) GPUs will not work. Compatible cloud GPUs include A10/A10G (Ampere), A100, and H100. Additionally, upstream hardcodes batch sizes for H100 80GB GPUs — this tool patches batch sizes via sed for smaller-VRAM GPUs. We've submitted a PR to upstream to make these values configurable via environment variables.

GPU       Architecture  Compute Capability  Compatible
────────  ────────────  ──────────────────  ────────────────────────
T4        Turing        7.5                 No — no FA3, no bfloat16
V100      Volta         7.0                 No — no FA3, no bfloat16
L4        Ada Lovelace  8.9                 Yes
A10/A10G  Ampere        8.6                 Yes
A100      Ampere        8.0                 Yes
H100      Hopper        9.0                 Yes
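
The compatibility rule in the table is just a compute-capability floor, which you can check yourself. A sketch (on a CUDA machine you would feed in real values from `torch.cuda.get_device_capability()`):

```python
def supports_autoresearch(major: int, minor: int) -> bool:
    """Compute capability 8.0+ (Ampere or newer) is required for
    FlashAttention 3 and bfloat16, per the table above."""
    return (major, minor) >= (8, 0)

# On a CUDA machine:
#   import torch
#   major, minor = torch.cuda.get_device_capability()
assert supports_autoresearch(8, 6)       # A10/A10G: yes
assert not supports_autoresearch(7, 5)   # T4: no
```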

AWS (no CLI install needed)

  1. Create an AWS access key and download the CSV
  2. Move the CSV to ~/.aws/credentials/ (create the folder if needed: mkdir -p ~/.aws/credentials)
  3. Run:
autoresearch-anycloud init aws
autoresearch-anycloud run

init aws auto-detects credentials from ~/.aws/credentials/ and verifies them.

A GPU VM launches automatically, trains, collects results, and shuts down. Estimated cloud cost: $0.13 for 1 experiment on an A10G GPU.

GCP (no CLI install needed)

  1. Create a GCP project (or select an existing one from the dropdown at the top of any GCP console page)
  2. Create a service account with Compute Admin access:
    • Go to IAM → Service Accounts
    • Click + Create Service Account
    • Enter a name (e.g. autoresearch) → click Create and Continue
    • Click Select a role → type Compute Admin → select it → click Continue → click Done
    • (If you skipped the role: go to IAM → Grant Access → paste the service account email → add the Compute Admin role → Save)
  3. Download a JSON key for the service account:
    • You're back on the Service Accounts list. Click the service account you just created
    • Go to the Keys tab → click Add Key → Create new key
    • Select JSON → click Create — the key file downloads automatically (you can only download it once)
  4. Move the JSON to ~/.config/gcloud/:
mkdir -p ~/.config/gcloud
mv ~/Downloads/*.json ~/.config/gcloud/
  5. Run:
autoresearch-anycloud init gcp
autoresearch-anycloud run

init gcp auto-detects credentials from ~/.config/gcloud/ and verifies them.
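
Auto-detection presumably just scans that folder for a service-account key. A minimal sketch of the idea (hypothetical helper, not the tool's actual code):

```python
import json
from pathlib import Path

def find_service_account(folder: Path):
    """Return (path, data) for the first *.json file in `folder` that
    looks like a GCP service-account key, else None."""
    for path in sorted(folder.glob("*.json")):
        try:
            data = json.loads(path.read_text())
        except ValueError:
            continue  # not valid JSON; skip
        if "client_email" in data and "private_key" in data:
            return path, data
    return None
```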

A GPU VM launches automatically, trains, collects results, and shuts down. Estimated cloud cost: $0.12 for 1 experiment on an L4 GPU (on-demand ~$0.72/hr).

Azure

The quickest way is with the Azure CLI (one command creates everything). You can also gather the credentials manually from the Azure Portal — see below.

Option A: Azure CLI (recommended)

  1. Install Azure CLI: brew install azure-cli
  2. Sign in: az login (opens browser)
  3. Create a service principal with Contributor access (replace <subscription-id> with yours from az account show):
az ad sp create-for-rbac --name autoresearch --role Contributor \
  --scopes /subscriptions/<subscription-id>

This outputs appId, password, tenant. Map them into a JSON file.

Option B: Azure Portal (no CLI needed)

  1. Go to App registrations → New registration → name it autoresearch → Register
  2. From the app's Overview page, note the Application (client) ID and Directory (tenant) ID
  3. Click Add a certificate or secret → New client secret → copy the Value (shown only once)
  4. Go to Subscriptions → click your subscription → copy the Subscription ID
  5. Still in the subscription → Access control (IAM) → Add role assignment → Contributor → select your app → Review + assign

Both options: Save credentials at ~/.azure/service-principal.json (mkdir -p ~/.azure):

{
  "tenant_id": "<tenant>",
  "client_id": "<appId>",
  "client_secret": "<password>",
  "subscription_id": "<subscription-id>"
}
Then run:
autoresearch-anycloud init azure
autoresearch-anycloud run

init azure auto-detects credentials from ~/.azure/service-principal.json or environment variables and verifies them.

A GPU VM launches automatically, trains, collects results, and shuts down. Estimated cloud cost: $0.53 for 1 experiment on an A10 GPU (on-demand ~$3.20/hr).

Oracle OCI (no CLI install needed)

You'll create a config file with 6 values. Gather them first, then paste into the template at the end.

Field        Where to find it
───────────  ────────────────────────────────────────────────────────────────
user         Profile icon (top-right) → My profile → OCID under your username
fingerprint  Generated automatically in the API key step below
tenancy      Profile icon → Tenancy: <name> → OCID on that page
region       From your Console URL: cloud.oracle.com/?region=us-phoenix-1 → copy the value after region=
key_file     The private key you download in the API key step below
compartment  Identity & Security → Compartments → OCID of the compartment to use

Generate an API signing key: My profile → API keys (under Resources in the left sidebar) → Add API key → Generate API key pair → Download private key (skip the public key — the *.public.pem file is not needed) → click Add. The fingerprint is shown on the confirmation page.

Save the key and config:

mkdir -p ~/.oci
mv ~/Downloads/<your-downloaded-key>.pem ~/.oci/oci_api_key.pem
chmod 600 ~/.oci/oci_api_key.pem

Create ~/.oci/config with your values:

[DEFAULT]
user=ocid1.user.oc1..xxxxx
fingerprint=aa:bb:cc:...
tenancy=ocid1.tenancy.oc1..xxxxx
region=us-ashburn-1
key_file=~/.oci/oci_api_key.pem
compartment=ocid1.compartment.oc1..xxxxx

Run:

autoresearch-anycloud init oci
autoresearch-anycloud run

init oci auto-detects credentials from ~/.oci/config and verifies them. The compartment OCID can also be provided via the OCI_COMPARTMENT_ID environment variable instead of adding it to the config file.
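
Since ~/.oci/config is standard INI, the verification step could be as simple as this sketch (hypothetical helper, not the tool's actual code):

```python
import configparser
import os

def load_oci_config(path: str = "~/.oci/config") -> dict:
    """Read the [DEFAULT] profile and check the required fields.
    `compartment` may fall back to the OCI_COMPARTMENT_ID environment
    variable, as noted above."""
    cfg = configparser.ConfigParser()
    cfg.read(os.path.expanduser(path))
    values = dict(cfg["DEFAULT"])
    values.setdefault("compartment", os.environ.get("OCI_COMPARTMENT_ID", ""))
    required = {"user", "fingerprint", "tenancy", "region", "key_file"}
    missing = required - values.keys()
    if missing:
        raise ValueError(f"missing OCI config fields: {sorted(missing)}")
    return values
```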

Estimated cloud cost: $0.08 for 1 experiment on an A10 GPU (on-demand $0.50/hr).

Troubleshooting:

  • NotAuthorizedOrNotFound on launch — missing IAM policy, GPU shape not available in your region, or GPU quota is 0. Fix: (1) check Limits → search VM.GPU.A10 → verify the limit is > 0 in your region; (2) ensure your user/group has a policy like Allow group <group> to manage compute-instances in compartment <compartment>; (3) try a different region — GPU shapes are not available in all regions.
  • fingerprint malformed — the fingerprint in ~/.oci/config has the wrong format. Copy the exact fingerprint from My profile → API keys. Format: aa:bb:cc:dd:... (colon-separated hex).
  • key_file permissions error — the private key file is too open. Run chmod 600 ~/.oci/oci_api_key.pem.
  • Preemptible not supported — GPU shapes do not support preemptible instances on OCI. This tool uses on-demand instances automatically; if you see this error, update to the latest version.

Switching Platforms

Once you've initialized multiple platforms, switch with init:

autoresearch-anycloud init gcp
autoresearch-anycloud run

Or use --platform for a one-off override without changing the active platform:

autoresearch-anycloud run --platform aws

Cost Tracking

Cloud cost and API cost are tracked and reported separately:

  • Cloud cost: estimated from public on-demand pricing × elapsed time. Sources: AWS/GCP/Azure/OCI pricing pages.
  • API cost: estimated from token usage per experiment × published per-token rates. Each experiment sends train.py + git log + program.md to the LLM (4,000 input tokens) and gets back modified code (2,000 output tokens). If you have a subscription (Claude Pro, ChatGPT Plus) or free credits, your actual API cost may be $0.

After each experiment:

  Cloud cost: $0.08  |  API cost (est): $0.04  |  Budget: $5.00

Run summary:

Cloud cost:   $0.13 (on-demand rate: $1.01/hr)
API cost:     $0.02 estimated (4,000 in + 2,000 out tokens × claude-sonnet pay-per-token rate)
              Note: API cost may be $0 if you have a subscription or free credits
Total (est):  $0.15 / $5.00 budget

If combined cost hits your budget, the run stops automatically and results are collected.
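
The two estimates combine roughly as follows. The per-token rates here are illustrative defaults, not the tool's actual pricing table:

```python
def estimate_costs(elapsed_s: float, hourly_rate: float,
                   in_tokens: int, out_tokens: int,
                   in_rate_per_m: float = 3.00,
                   out_rate_per_m: float = 15.00) -> dict:
    """Cloud cost = on-demand rate x elapsed time; API cost = token
    counts x per-million-token rates (assumed values)."""
    cloud = hourly_rate * elapsed_s / 3600
    api = in_tokens / 1e6 * in_rate_per_m + out_tokens / 1e6 * out_rate_per_m
    return {"cloud_usd": round(cloud, 2), "api_usd": round(api, 2),
            "total_usd": round(cloud + api, 2)}

# e.g. 460s on an A10G at $1.01/hr with 4,000 in + 2,000 out tokens:
print(estimate_costs(460, 1.01, 4000, 2000))
# → {'cloud_usd': 0.13, 'api_usd': 0.04, 'total_usd': 0.17}
```

The budget check is then a plain comparison of the running total against max_cost_usd after each experiment.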

GPU Quota

Cloud providers set GPU quota to 0 by default. Your first cloud run will fail with a quota error until you request access. Request GPU quota before your first run — it's free to apply but approval takes hours to days for new accounts:

Provider  Where to request
────────  ───────────────────────────────────────────────
AWS       Service Quotas → EC2 → search "G and VT"
GCP       Quotas → search "NVIDIA L4"
Azure     Quotas → search "NVadsA10v5", request 36 cores
OCI       Limits → search your GPU shape

Security

  • Credentials are never stored in the project — they live in standard locations on your machine (~/.aws/, ~/.config/gcloud/, ~/.azure/, ~/.oci/, etc.)
  • API keys reach cloud VMs via SSH environment variables at runtime, never written to disk
  • VMs are destroyed after each run — no lingering resources
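
Runtime-only key delivery might look like the following sketch: the key is interpolated into the remote command's environment, so it exists only in the remote process, never in a file on the VM (argv shown for illustration; not the tool's actual command):

```python
import shlex

def ssh_run_command(host: str, key_path: str, api_key: str, cmd: str) -> list:
    """Build an ssh argv that sets the API key in the remote process
    environment only. shlex.quote guards against shell injection."""
    remote = f"ANTHROPIC_API_KEY={shlex.quote(api_key)} {cmd}"
    return ["ssh", "-i", key_path, host, remote]

argv = ssh_run_command("ubuntu@203.0.113.5", "~/.ssh/ar.pem",
                       "sk-ant-example", "autoresearch-anycloud run")
```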

FAQ

Why not just use SkyPilot? SkyPilot is great for general GPU workloads. This is purpose-built for autoresearch — it handles upstream-specific patches (batch sizes, GPU tuning), cost estimation tuned to short experiment runs, and the full experiment lifecycle. Much simpler to set up for this specific use case.

Why not Terraform? These are ephemeral VMs that live for 10-20 minutes. Native SDK calls (boto3, google-cloud-compute) are faster to provision, easier to tear down, and don't require the user to install anything beyond pip packages. This follows the SkyPilot/Ray approach for ephemeral workloads.

Does it work with other training scripts? Not currently — it's specifically for Karpathy's autoresearch. The architecture (provider modules, orchestrator, cost engine) could be generalized, but that's not the goal today.

Why does it need Ampere+ GPUs? Upstream autoresearch uses FlashAttention 3 and bfloat16, which require compute capability 8.0+. T4 and V100 don't work. This is an upstream constraint — we pick the right GPU so you don't have to figure this out.

What's the catch? You need cloud credentials set up (AWS keys, GCP service account, etc.). The tool doesn't provision cloud accounts — just VMs. GPU quota on some providers (Azure, OCI) requires a manual request that can take days.

Contributing

We aim to support AWS, GCP, Azure, and Oracle OCI. Want to add another cloud provider (Lambda Labs, CoreWeave, Paperspace, etc.)? Contributions are welcome:

  • Add a provider: create a single Python file under autoresearch_ac/providers/ that implements three functions:

    def provision(config: dict, log=None) -> dict:
        """Launch a GPU instance. Return a dict with at least:
           instance_id, public_ip, region, key_path
           (plus any IDs needed for teardown)."""
    
    def teardown(instance_info: dict, log=None):
        """Terminate the instance and clean up all resources.
           instance_info is the dict returned by provision()."""
    
    def estimate_cost(config: dict) -> dict:
        """Return {"hourly_rate_usd": float,
                   "estimated_hours": float,
                   "estimated_cost_usd": float}."""

    Then add the provider to the elif chain in orchestrator.py and cli.py. See aws.py or gcp.py for working examples.

  • Request a provider: open an issue describing the platform and we'll prioritize it.

License

MIT
