Run Karpathy's autoresearch on your Mac or any cloud GPU — one command, no infrastructure knowledge needed.
| Platform | GPU (default) | Cost | Upstream match (H100 80GB) | Status |
|---|---|---|---|---|
| Mac | Apple Silicon MPS | Free | — | Verified |
| AWS | A10G 24GB | $1.01/hr | p5.48xlarge (8x H100, ~$98/hr) | Verified |
| GCP | L4 24GB | ~$0.72/hr | a3-highgpu-8g (8x H100, ~$98/hr) | Verified |
| Azure | A10 24GB | ~$3.20/hr | NC40ads_H100_v5 (1x H100, ~$7/hr) | Coming soon |
| Oracle OCI | A10 24GB | $0.50/hr | BM.GPU.H100.8 (8x H100, ~$44/hr) | Coming soon |
The autoresearch ecosystem has tools for Mac/MPS, MLX, and cloud orchestrators like SkyPilot. But no single tool lets a researcher go from research intent to results across any platform without infrastructure knowledge. autoresearch-anycloud fills that gap:
- Minimal setup — provide credentials and a research config; the tool provisions the infrastructure, runs autoresearch, and tears everything down
- No babysitting — experiments have timeouts, a budget watchdog aborts if spend exceeds your limit, and try/finally guarantees the cloud VM is torn down whether the run succeeds, fails, or hangs.
- Cost tracking and budget enforcement — no tool in the ecosystem tracks spend or enforces budgets
- Unified logging and result collection — provision, run, collect, teardown in one command with one log
- True multi-platform — same CLI, same workflow, any hardware
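The "no babysitting" guarantee above is, at its core, a try/finally around the experiment loop plus a spend check between experiments. A minimal sketch of the idea, with illustrative function names rather than the tool's actual internals:

```python
import time

def run_with_teardown(provision, run_experiment, teardown,
                      max_cost_usd, hourly_rate_usd, n_experiments):
    """Provision, run, and ALWAYS tear down, even on failure or timeout.
    All callables here are illustrative stand-ins, not the real API."""
    instance = provision()
    start = time.time()
    results = []
    try:
        for i in range(n_experiments):
            spent = (time.time() - start) / 3600 * hourly_rate_usd
            if spent >= max_cost_usd:      # budget watchdog
                break
            results.append(run_experiment(instance, i))
    finally:
        teardown(instance)                 # guaranteed: success, failure, or hang-timeout
    return results
```

The finally block is what makes "the VM is torn down whether the run succeeds, fails, or hangs" hold even when an experiment raises.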
- Install uv (if you don't have it):

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- Set your LLM API key (add to `~/.zshrc` so it persists):

```shell
export ANTHROPIC_API_KEY=sk-ant-...   # if using Claude
export OPENAI_API_KEY=sk-...          # if using OpenAI
```

- Run:
```shell
git clone https://github.com/abcdedf/autoresearch-anycloud.git
cd autoresearch-anycloud
uv sync
autoresearch-anycloud init mac
autoresearch-anycloud run
```

That's it. Training starts immediately on your Mac. No cloud account. No configuration.
Everything streams to your terminal AND to a log file. Open a second terminal to watch:
```shell
tail -f logs/run_latest.log
```

What you'll see:
```
[12:01:00] Platform: Mac
[12:01:00] Experiments: 1 (~5 min)
[12:01:00] Est. cost: $0.018 (GPU: $0.00, API: $0.018)
[12:01:05] [setup] Installing workspace dependencies...
[12:02:30] [prepare] Downloading data and training tokenizer...
[12:03:45] ── Experiment 1/1 ──
[12:03:45] [warmup 1/10] First 10 steps are warmup, training starts after...
[12:04:10] [training 30/60s] step 00050 | loss 3.812 | remaining: 30s
[12:04:50] Evaluating val_bpb...
[12:05:10] val_bpb: 2.124254
[12:05:10] Cost: $0.02 / $5.00 (GPU: $0.00, API: $0.02)
══════════════════════════════════════════════════
                   RUN SUMMARY
══════════════════════════════════════════════════
Experiments: 1
Total time:  208s (3.5 min)

Exp   val_bpb     Time      Status
────  ──────────  ────────  ────────
1     2.124254    208s      ok

Best val_bpb: 2.124254 (experiment 1)
GPU compute:  $0.00 (0.00/hr)
LLM API:      $0.02 (4,000 in + 2,000 out tokens, claude-sonnet)
Total cost:   $0.02 / $5.00 budget
══════════════════════════════════════════════════
```
Results are saved to `./results/<timestamp>/train.py`:

```shell
ls results/
```

Each run saves the final `train.py` (with all improvements the AI made) to a timestamped folder.
Edit the included research.yaml:
```yaml
research:
  topic: "Improve training loss on TinyShakespeare"
  program: "program.md"
  max_experiments: 1     # set very low for quick demo. Upstream default: 100
budget:
  max_cost_usd: 5.00     # Cloud + API combined. Auto-stops if exceeded. For overnight cloud runs: 10-50
```

The platform is set when you run `init` — no need to specify it here.
Included defaults are set very low for a quick demo (~5 min). For real research:
- `max_experiments: 100` — upstream default, runs overnight (about 12 experiments per hour at ~5 min each)
- `budget: 10–50` — overnight cloud runs cost $5–25 depending on provider
- Training time per experiment is 60s (upstream default: 300s)
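You can sanity-check that arithmetic yourself. A back-of-envelope helper, assuming the A10G's $1.01/hr on-demand rate and ~5 minutes per experiment (GPU time only; API cost is extra):

```python
def overnight_estimate(n_experiments=100, minutes_per_exp=5.0,
                       gpu_hourly_usd=1.01):
    """Rough GPU cost for a full overnight run.
    Defaults are the upstream experiment count and the AWS A10G rate."""
    hours = n_experiments * minutes_per_exp / 60.0
    return {"hours": round(hours, 1),
            "gpu_cost_usd": round(hours * gpu_hourly_usd, 2)}
```

100 experiments at 5 minutes each is about 8.3 hours, so roughly $8.40 on an A10G — squarely inside the $5–25 range quoted above.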
When you're ready for faster GPUs, run init with a cloud platform and provide credentials. The tool handles everything else — launching the VM, installing dependencies, running training, collecting results, and shutting down the VM.
GPU quota required: All cloud providers limit GPU access by default (quota = 0). Your first run will likely fail with a quota error. Request GPU access before your first run — it's free to apply but approval can take hours to days for new accounts. See GPU Quota below for links.
Note on GPU compatibility: Upstream autoresearch uses FlashAttention 3 and bfloat16, which require Ampere architecture (compute capability 8.0) or newer. This means T4 (Turing, CC 7.5) and V100 (Volta, CC 7.0) GPUs will not work. Compatible cloud GPUs include A10/A10G (Ampere), A100, and H100. Additionally, upstream hardcodes batch sizes for H100 80GB GPUs — this tool patches batch sizes via `sed` for smaller-VRAM GPUs. We've submitted a PR to upstream to make these values configurable via environment variables.
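The batch-size patch is conceptually just a textual substitution on the training script. A sketch of the idea in Python — the variable name `device_batch_size` and the values are illustrative, not the exact upstream lines, and the real tool does this with `sed`:

```python
import re

def patch_batch_size(source: str, new_size: int) -> str:
    """Rewrite 'device_batch_size = <n>' assignments so the model
    fits in smaller VRAM. Illustrative stand-in for the sed patch."""
    return re.sub(r"^(device_batch_size\s*=\s*)\d+",
                  rf"\g<1>{new_size}", source, flags=re.MULTILINE)
```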
| GPU | Architecture | Compute Capability | Compatible |
|---|---|---|---|
| T4 | Turing | 7.5 | No — no FA3, no bfloat16 |
| V100 | Volta | 7.0 | No — no FA3, no bfloat16 |
| L4 | Ada Lovelace | 8.9 | Yes |
| A10/A10G | Ampere | 8.6 | Yes |
| A100 | Ampere | 8.0 | Yes |
| H100 | Hopper | 9.0 | Yes |
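The compatibility rule in the table reduces to a single comparison. A small helper encoding it — on a CUDA machine you would feed it the tuple from `torch.cuda.get_device_capability()`:

```python
def supports_autoresearch(major: int, minor: int) -> bool:
    """FlashAttention 3 + bfloat16 need compute capability 8.0 or newer."""
    return (major, minor) >= (8, 0)

# On a CUDA machine:
#   import torch
#   ok = supports_autoresearch(*torch.cuda.get_device_capability())
```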
- Create an AWS access key and download the CSV
- Move the CSV to `~/.aws/credentials/` (create the folder if needed: `mkdir -p ~/.aws/credentials`)
- Run:
```shell
autoresearch-anycloud init aws
autoresearch-anycloud run
```

`init aws` auto-detects credentials from `~/.aws/credentials/` and verifies them.
A GPU VM launches automatically, trains, collects results, and shuts down. Estimated cloud cost: $0.13 for 1 experiment on an A10G GPU.
- Create a GCP project (or select an existing one from the dropdown at the top of any GCP console page)
- Create a service account with Compute Admin access:
  - Go to IAM → Service Accounts
  - Click + Create Service Account
  - Enter a name (e.g. `autoresearch`) → click Create and Continue
  - Click Select a role → type `Compute Admin` → select it → click Continue → click Done
  - (If you skipped the role: go to IAM → Grant Access → paste the service account email → add the Compute Admin role → Save)
- Download a JSON key for the service account:
  - You're back on the Service Accounts list. Click the service account you just created
  - Go to the Keys tab → click Add Key → Create new key
  - Select JSON → click Create — the key file downloads automatically (you can only download it once)
- Move the JSON to `~/.config/gcloud/`:

```shell
mkdir -p ~/.config/gcloud
mv ~/Downloads/*.json ~/.config/gcloud/
```

- Run:
```shell
autoresearch-anycloud init gcp
autoresearch-anycloud run
```

`init gcp` auto-detects credentials from `~/.config/gcloud/` and verifies them.
A GPU VM launches automatically, trains, collects results, and shuts down. Estimated cloud cost: $0.12 for 1 experiment on an L4 GPU (on-demand ~$0.72/hr).
The quickest way is with the Azure CLI (one command creates everything). You can also gather the credentials manually from the Azure Portal — see below.
Option A: Azure CLI (recommended)
- Install Azure CLI: `brew install azure-cli`
- Sign in: `az login` (opens browser)
- Create a service principal with Contributor access (replace `<subscription-id>` with yours from `az account show`):

```shell
az ad sp create-for-rbac --name autoresearch --role Contributor \
  --scopes /subscriptions/<subscription-id>
```

This outputs `appId`, `password`, and `tenant`. Map them into a JSON file as shown below.
Option B: Azure Portal (no CLI needed)
- Go to App registrations → New registration → name it `autoresearch` → Register
- From the app's Overview page, note the Application (client) ID and Directory (tenant) ID
- Click Add a certificate or secret → New client secret → copy the Value (shown only once)
- Go to Subscriptions → click your subscription → copy the Subscription ID
- Still in the subscription → Access control (IAM) → Add role assignment → Contributor → select your app → Review + assign
Both options: Save credentials at ~/.azure/service-principal.json (mkdir -p ~/.azure):
```json
{
  "tenant_id": "<tenant>",
  "client_id": "<appId>",
  "client_secret": "<password>",
  "subscription_id": "<subscription-id>"
}
```

- Run:

```shell
autoresearch-anycloud init azure
autoresearch-anycloud run
```

`init azure` auto-detects credentials from `~/.azure/service-principal.json` or environment variables and verifies them.
A GPU VM launches automatically, trains, collects results, and shuts down. Estimated cloud cost: $0.53 for 1 experiment on an A10 GPU (on-demand ~$3.20/hr).
You'll create a config file with 6 values. Gather them first, then paste into the template at the end.
| Field | Where to find it |
|---|---|
| `user` | Profile icon (top-right) → My profile → OCID under your username |
| `fingerprint` | Generated automatically in the API key step below |
| `tenancy` | Profile icon → Tenancy: `<name>` → OCID on that page |
| `region` | From your Console URL: `cloud.oracle.com/?region=us-phoenix-1` → copy the value after `region=` |
| `key_file` | The private key you download in the API key step below |
| `compartment` | Identity & Security → Compartments → OCID of the compartment to use |
Generate an API signing key: My profile → API keys (under Resources in left sidebar) → Add API key → Generate API key pair → Download private key (skip the public key — the *.public.pem file is not needed) → click Add. The fingerprint is shown on the confirmation page.
Save the key and config:
```shell
mkdir -p ~/.oci
mv ~/Downloads/<your-downloaded-key>.pem ~/.oci/oci_api_key.pem
chmod 600 ~/.oci/oci_api_key.pem
```

Create `~/.oci/config` with your values:
```ini
[DEFAULT]
user=ocid1.user.oc1..xxxxx
fingerprint=aa:bb:cc:...
tenancy=ocid1.tenancy.oc1..xxxxx
region=us-ashburn-1
key_file=~/.oci/oci_api_key.pem
compartment=ocid1.compartment.oc1..xxxxx
```

Run:
```shell
autoresearch-anycloud init oci
autoresearch-anycloud run
```

`init oci` auto-detects credentials from `~/.oci/config` and verifies them. The compartment OCID can also be provided via the `OCI_COMPARTMENT_ID` environment variable instead of adding it to the config file.
Estimated cloud cost: $0.08 for 1 experiment on an A10 GPU (on-demand $0.50/hr).
Troubleshooting:
| Error | Cause | Fix |
|---|---|---|
| `NotAuthorizedOrNotFound` on launch | Missing IAM policy, GPU shape not available in your region, or GPU quota is 0 | 1. Check Limits → search `VM.GPU.A10` → verify limit > 0 in your region. 2. Ensure your user/group has a policy like `Allow group <group> to manage compute-instances in compartment <compartment>`. 3. Try a different region — GPU shapes are not available in all regions. |
| `fingerprint malformed` | Fingerprint in `~/.oci/config` has wrong format | Copy the exact fingerprint from My profile → API keys. Format: `aa:bb:cc:dd:...` (colon-separated hex). |
| `key_file` permissions error | Private key is too open | Run `chmod 600 ~/.oci/oci_api_key.pem` |
| Preemptible not supported | GPU shapes do not support preemptible instances on OCI | The tool uses on-demand instances automatically. If you see this error, update to the latest version. |
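For the fingerprint error specifically, the expected shape is easy to check before running. A quick validator, assuming the usual format of an OCI API key fingerprint (16 colon-separated lowercase hex pairs, i.e. an MD5 digest):

```python
import re

# Assumed format: 16 hex pairs separated by colons, e.g. aa:bb:cc:...:ff
FINGERPRINT_RE = re.compile(r"^([0-9a-f]{2}:){15}[0-9a-f]{2}$")

def looks_like_oci_fingerprint(fp: str) -> bool:
    """True if fp matches the colon-separated hex-pair format."""
    return bool(FINGERPRINT_RE.match(fp))
```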
Once you've initialized multiple platforms, switch with init:
```shell
autoresearch-anycloud init gcp
autoresearch-anycloud run
```

Or use `--platform` for a one-off override without changing the active platform:

```shell
autoresearch-anycloud run --platform aws
```

Cloud cost and API cost are tracked and reported separately:
- Cloud cost: estimated from public on-demand pricing × elapsed time. Sources: AWS/GCP/Azure/OCI pricing pages.
- API cost: estimated from token usage per experiment × published per-token rates. Each experiment sends train.py + git log + program.md to the LLM (4,000 input tokens) and gets back modified code (2,000 output tokens). If you have a subscription (Claude Pro, ChatGPT Plus) or free credits, your actual API cost may be $0.
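Given those token counts, the per-experiment API estimate is a one-liner. A sketch with illustrative pay-per-token rates — $3 in / $15 out per million tokens is an assumption here, not a figure quoted by the tool; the actual rates depend on your model and provider:

```python
def api_cost_estimate(input_tokens=4_000, output_tokens=2_000,
                      usd_per_m_in=3.0, usd_per_m_out=15.0):
    """Estimated LLM API cost for one experiment, in USD.
    Rates are hypothetical placeholders, not published prices."""
    return (input_tokens / 1e6 * usd_per_m_in
            + output_tokens / 1e6 * usd_per_m_out)
```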
After each experiment:

```
Cloud cost: $0.08 | API cost (est): $0.04 | Budget: $5.00
```

Run summary:

```
Cloud cost:  $0.13 (on-demand rate: $1.01/hr)
API cost:    $0.02 estimated (4,000 in + 2,000 out tokens × claude-sonnet pay-per-token rate)
Note: API cost may be $0 if you have a subscription or free credits
Total (est): $0.15 / $5.00 budget
```
If combined cost hits your budget, the run stops automatically and results are collected.
Cloud providers set GPU quota to 0 by default. Your first cloud run will fail with a quota error until you request access. Request GPU quota before your first run — it's free to apply but approval takes hours to days for new accounts:
| Provider | Where to request |
|---|---|
| AWS | Service Quotas → EC2 → search "G and VT" |
| GCP | Quotas → search "NVIDIA L4" |
| Azure | Quotas → search "NVadsA10v5", request 36 cores |
| OCI | Limits → search your GPU shape |
- Credentials never stored in the project — they live in standard locations on your machine (`~/.aws/`, `~/.config/gcloud/`, `~/.azure/`, etc.)
- API keys reach cloud VMs via SSH environment variables at runtime, never written to disk
- VMs are destroyed after each run — no lingering resources
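The SSH-environment-variable approach can be sketched as follows. This is illustrative command construction, not the tool's exact code: the secret is prefixed to the remote command for that one invocation, so it never lands in a file on the VM:

```python
import shlex

def ssh_command_with_env(host: str, key_path: str,
                         env: dict, remote_cmd: str) -> list:
    """Build an ssh argv that injects env vars inline for one remote
    command, keeping secrets out of files on the VM. Illustrative only."""
    exports = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    return ["ssh", "-i", key_path, host, f"{exports} {remote_cmd}"]
```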
Why not just use SkyPilot? SkyPilot is great for general GPU workloads. This is purpose-built for autoresearch — it handles upstream-specific patches (batch sizes, GPU tuning), cost estimation tuned to short experiment runs, and the full experiment lifecycle. Much simpler to set up for this specific use case.
Why not Terraform? These are ephemeral VMs that live for 10-20 minutes. Native SDK calls (boto3, google-cloud-compute) are faster to provision, easier to tear down, and don't require the user to install anything beyond pip packages. This follows the SkyPilot/Ray approach for ephemeral workloads.
Does it work with other training scripts? Not currently — it's specifically for Karpathy's autoresearch. The architecture (provider modules, orchestrator, cost engine) could be generalized, but that's not the goal today.
Why does it need Ampere+ GPUs? Upstream autoresearch uses FlashAttention 3 and bfloat16, which require compute capability 8.0+. T4 and V100 don't work. This is an upstream constraint — we pick the right GPU so you don't have to figure this out.
What's the catch? You need cloud credentials set up (AWS keys, GCP service account, etc.). The tool doesn't provision cloud accounts — just VMs. GPU quota on some providers (Azure, OCI) requires a manual request that can take days.
We aim to support AWS, GCP, Azure, and Oracle OCI. Want to add another cloud provider (Lambda Labs, CoreWeave, Paperspace, etc.)? Contributions are welcome:
- Add a provider: create a single Python file under `autoresearch_ac/providers/` that implements three functions:

```python
def provision(config: dict, log=None) -> dict:
    """Launch a GPU instance. Return a dict with at least:
    instance_id, public_ip, region, key_path
    (plus any IDs needed for teardown)."""

def teardown(instance_info: dict, log=None):
    """Terminate the instance and clean up all resources.
    instance_info is the dict returned by provision()."""

def estimate_cost(config: dict) -> dict:
    """Return {"hourly_rate_usd": float, "estimated_hours": float,
    "estimated_cost_usd": float}."""
```

Then add the provider to the `elif` chain in `orchestrator.py` and `cli.py`. See `aws.py` or `gcp.py` for working examples.

- Request a provider: open an issue describing the platform and we'll prioritize it.
- Karpathy's autoresearch — the upstream project
- miolini/autoresearch-macos — Mac MPS adaptation
- trevin-creator/autoresearch-mlx — MLX adaptation
MIT


