From f1ac5cf2146264f733d1764fb0a067a8f96acd7f Mon Sep 17 00:00:00 2001
From: gb250e <71205769+gb250e@users.noreply.github.com>
Date: Fri, 20 Mar 2026 22:12:57 -0700
Subject: [PATCH 1/4] docs: add TPI-003 environment plan

---
 notes/tpi_003_environment_plan.md | 37 +++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
 create mode 100644 notes/tpi_003_environment_plan.md

diff --git a/notes/tpi_003_environment_plan.md b/notes/tpi_003_environment_plan.md
new file mode 100644
index 0000000000..ce3c1559de
--- /dev/null
+++ b/notes/tpi_003_environment_plan.md
@@ -0,0 +1,37 @@
+# TPI-003 Environment Plan
+
+## Objective
+
+Select the smallest runnable environment path that can produce one real baseline/candidate evidence pair for the existing eval-first monkey-model policy.
+
+## Public-facing name
+
+`MonkeyModel_EvalFirst_MinRunnableEnv`
+
+## Candidate environment paths
+
+1. `local_repair`
+   - install missing dependencies locally
+   - acquire published dataset/tokenizer assets locally
+   - verify GPU/runtime availability locally
+
+2. `remote_gpu_small`
+   - use a minimal remote CUDA environment
+   - run one baseline/candidate pair with the same public-safe branch
+
+3. `runpod_parameter_golf`
+   - use the challenge-aligned Runpod path
+   - fetch the published dataset/tokenizer assets there
+   - execute one baseline/candidate pair under a more official environment shape
+
+## Selection criteria
+
+- shortest path to one real evidence pair
+- command simplicity
+- reproducibility
+- fit with current monkey-model eval-first branch
+- lowest setup overhead that still yields runtime + val_bpb
+
+## Current recommendation
+
+Prefer `runpod_parameter_golf` unless an already-usable remote CUDA environment exists. The local path is currently the weakest candidate because torch, assets, and GPU availability are all blocked at once.
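The three `local_repair` blockers named in the plan above (missing torch, missing assets, no GPU) can be probed mechanically before picking a path. A minimal sketch, assuming the `/workspace/parameter-golf/...` asset locations used later in this series; the `probe` helper is illustrative, not part of the repo:

```shell
# Report each local_repair blocker as ok/missing without aborting the script.
# DATA_PATH/TOKENIZER_PATH defaults are assumptions taken from the Runpod
# notes in this series; override them for a different machine.
probe() {
  name=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "$name: ok"
  else
    echo "$name: missing"
  fi
}

probe torch     python3 -c "import torch"
probe cuda      python3 -c "import torch; assert torch.cuda.is_available()"
probe dataset   test -d "${DATA_PATH:-/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/}"
probe tokenizer test -f "${TOKENIZER_PATH:-/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model}"
```

If all four lines report `ok`, `local_repair` stops being the weakest candidate; any `missing` line is a concrete reason to stay on the Runpod path.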
From bc7fdc55a1764bdf583ca972b8904b6b7047e80f Mon Sep 17 00:00:00 2001
From: gb250e <71205769+gb250e@users.noreply.github.com>
Date: Fri, 20 Mar 2026 22:13:28 -0700
Subject: [PATCH 2/4] docs: add TPI-003 execution contract

---
 notes/tpi_003_execution_contract.md | 38 +++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 notes/tpi_003_execution_contract.md

diff --git a/notes/tpi_003_execution_contract.md b/notes/tpi_003_execution_contract.md
new file mode 100644
index 0000000000..d4e853387d
--- /dev/null
+++ b/notes/tpi_003_execution_contract.md
@@ -0,0 +1,38 @@
+# TPI-003 Execution Contract
+
+## Objective
+
+Fix one executable baseline/candidate command contract for the existing eval-first monkey-model branch.
+
+## Baseline contract
+
+- branch: `exp/eval-first-003`
+- mode: non-sliding validation behavior
+- effective setting: `EVAL_STRIDE=TRAIN_SEQ_LEN`
+
+## Candidate contract
+
+- branch: `exp/eval-first-003`
+- mode: eval-first sliding validation
+- primary candidate: `EVAL_STRIDE=128`
+- optional secondary candidate: `EVAL_STRIDE=64`
+
+## Minimum required assets
+
+- Python environment with `torch`, `datasets`, and `sentencepiece`
+- accessible CUDA runtime for `train_gpt.py`
+- published FineWeb cached shards or equivalent challenge-provided dataset path
+- published tokenizer model path
+
+## Minimum required capture
+
+- commit SHA
+- command line
+- runtime notes
+- whether the eval path was reached
+- final runtime summary
+- final val_bpb summary
+
+## Environment decision rule
+
+Choose the environment path that can produce one real baseline/candidate pair with the least additional setup while remaining reproducible and public-safe.
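Most of the "Minimum required capture" list in this contract can be written out mechanically after a run. A sketch of such a helper, with the caveat that `capture_run`, the note filename, and the assumption that `train_gpt.py` prints lines containing `val_bpb` are all illustrative, not a fixed interface of the repo:

```shell
# Hypothetical capture helper: record commit SHA, run id, whether the eval
# path was reached, and the final val_bpb line from a run log into one note.
capture_run() {
  run_id=$1
  log=$2
  out="runs/TPI-003/${run_id}.capture.txt"
  mkdir -p runs/TPI-003
  # grep -c prints a match count; empty when the log file is missing.
  n=$(grep -c 'val_bpb' "$log" 2>/dev/null || true)
  {
    echo "commit: $(git rev-parse HEAD 2>/dev/null || echo unknown)"
    echo "run_id: ${run_id}"
    echo "eval_reached: ${n:-0}"
    echo "final_val_bpb: $(grep 'val_bpb' "$log" 2>/dev/null | tail -n 1)"
  } > "$out"
  echo "$out"
}
```

A call like `capture_run tpi003_baseline_stride1024 logs/tpi003_baseline_stride1024.txt` would cover the SHA, eval-reached, and val_bpb bullets; the command line and runtime notes still need to be recorded manually.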
From af2ce3dc6b9fdebacc2efa1837b6754bee2617e2 Mon Sep 17 00:00:00 2001
From: eb24516
Date: Sat, 21 Mar 2026 14:18:29 +0900
Subject: [PATCH 3/4] docs: fix TPI-003 environment contract

---
 notes/tpi_003_environment_decision.md | 71 ++++++++++++++++++++++++
 notes/tpi_003_environment_plan.md     |  7 +++
 notes/tpi_003_execution_contract.md   | 78 +++++++++++++++++++++++++++
 3 files changed, 156 insertions(+)
 create mode 100644 notes/tpi_003_environment_decision.md

diff --git a/notes/tpi_003_environment_decision.md b/notes/tpi_003_environment_decision.md
new file mode 100644
index 0000000000..fca3f82cd2
--- /dev/null
+++ b/notes/tpi_003_environment_decision.md
@@ -0,0 +1,71 @@
+# TPI-003 Environment Decision
+
+## Chosen environment
+
+- `runpod_parameter_golf`
+
+## Why chosen
+
+- The repository README already defines this path concretely.
+- It is the smallest runnable path with clear dependency assumptions.
+- It is closer to challenge conditions than an unspecified remote machine.
+- It keeps the same monkey-model eval-first branch and only changes environment readiness.
+
+## Rejected alternatives
+
+### `local_repair`
+
+- Rejected as primary path because three blockers stack at once:
+  - `torch` missing
+  - dataset/tokenizer assets missing
+  - GPU access blocked
+
+### `remote_gpu_small`
+
+- Rejected as primary path because it is less specific than the Runpod route.
+- It risks wasting time on ad hoc package, path, and logging setup that the documented Runpod path already solves more cleanly.
+
+## Required assets
+
+- repo checkout at `exp/eval-first-003`
+- Python environment with `torch`, `datasets`, `sentencepiece`
+- `/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/`
+- `/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model`
+- writable `logs/` and `runs/TPI-003/`
+
+## Baseline env vars
+
+- `DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/`
+- `TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model`
+- `VOCAB_SIZE=1024`
+- `TRAIN_SEQ_LEN=1024`
+- `EVAL_STRIDE=1024`
+- `MAX_WALLCLOCK_SECONDS=600`
+- `TRAIN_LOG_EVERY=50`
+- `VAL_LOSS_EVERY=200`
+
+## Candidate env vars
+
+- same as baseline except `EVAL_STRIDE=128`
+- optional second candidate: `EVAL_STRIDE=64`
+
+## Command skeleton
+
+```bash
+RUN_ID= \
+DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/ \
+TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \
+VOCAB_SIZE=1024 \
+TRAIN_SEQ_LEN=1024 \
+EVAL_STRIDE= \
+MAX_WALLCLOCK_SECONDS=600 \
+TRAIN_LOG_EVERY=50 \
+VAL_LOSS_EVERY=200 \
+torchrun --standalone --nproc_per_node=1 train_gpt.py
+```
+
+## First command to run next turn
+
+```bash
+python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 1
+```
diff --git a/notes/tpi_003_environment_plan.md b/notes/tpi_003_environment_plan.md
index ce3c1559de..ea6290d850 100644
--- a/notes/tpi_003_environment_plan.md
+++ b/notes/tpi_003_environment_plan.md
@@ -35,3 +35,10 @@ Select the smallest runnable environment path that can produce one real baseline
 ## Current recommendation
 
 Prefer `runpod_parameter_golf` unless an already-usable remote CUDA environment exists. The local path is currently the weakest candidate because torch, assets, and GPU availability are all blocked at once.
+
+## Selection outcome for this turn
+
+- Chosen path: `runpod_parameter_golf`
+- Reason: it is the most explicit public-safe path already documented in the repo, with the least ambiguity about dependency readiness and challenge-compatible execution shape.
+- Deferred path: `remote_gpu_small`
+- Rejected primary path: `local_repair`
diff --git a/notes/tpi_003_execution_contract.md b/notes/tpi_003_execution_contract.md
index d4e853387d..81db30b9f3 100644
--- a/notes/tpi_003_execution_contract.md
+++ b/notes/tpi_003_execution_contract.md
@@ -9,6 +9,15 @@ Fix one executable baseline/candidate command contract for the existing eval-fir
 - branch: `exp/eval-first-003`
 - mode: non-sliding validation behavior
 - effective setting: `EVAL_STRIDE=TRAIN_SEQ_LEN`
+- chosen environment: `runpod_parameter_golf`
+- target host shape for first pass: `1xH100` Runpod pod
+- tokenizer path: `/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model`
+- dataset path: `/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/`
+- logs:
+  - script-native log: `logs/${RUN_ID}.txt`
+  - turn note archive: `runs/TPI-003/`
+- commit SHA capture:
+  - `git rev-parse HEAD > runs/TPI-003/.commit.txt`
 
 ## Candidate contract
 
@@ -17,6 +26,75 @@ Fix one executable baseline/candidate command contract for the existing eval-fir
 - primary candidate: `EVAL_STRIDE=128`
 - optional secondary candidate: `EVAL_STRIDE=64`
 
+## Baseline command
+
+```bash
+cd /workspace
+git clone https://github.com/gb250e/parameter-golf.git
+cd parameter-golf
+git checkout exp/eval-first-003
+python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 1
+mkdir -p runs/TPI-003
+git rev-parse HEAD > runs/TPI-003/tpi003_baseline.commit.txt
+RUN_ID=tpi003_baseline_stride1024 \
+DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/ \
+TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \
+VOCAB_SIZE=1024 \
+TRAIN_SEQ_LEN=1024 \
+EVAL_STRIDE=1024 \
+MAX_WALLCLOCK_SECONDS=600 \
+TRAIN_LOG_EVERY=50 \
+VAL_LOSS_EVERY=200 \
+torchrun --standalone --nproc_per_node=1 train_gpt.py | tee runs/TPI-003/tpi003_baseline.stdout.log
+```
+
+## Candidate command
+
+```bash
+cd /workspace/parameter-golf
+git checkout exp/eval-first-003
+mkdir -p runs/TPI-003
+git rev-parse HEAD > runs/TPI-003/tpi003_candidate_128.commit.txt
+RUN_ID=tpi003_candidate_stride128 \
+DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/ \
+TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \
+VOCAB_SIZE=1024 \
+TRAIN_SEQ_LEN=1024 \
+EVAL_STRIDE=128 \
+MAX_WALLCLOCK_SECONDS=600 \
+TRAIN_LOG_EVERY=50 \
+VAL_LOSS_EVERY=200 \
+torchrun --standalone --nproc_per_node=1 train_gpt.py | tee runs/TPI-003/tpi003_candidate_128.stdout.log
+```
+
+## Required env vars for both runs
+
+- `DATA_PATH`
+- `TOKENIZER_PATH`
+- `VOCAB_SIZE=1024`
+- `TRAIN_SEQ_LEN=1024`
+- `EVAL_STRIDE`
+- `MAX_WALLCLOCK_SECONDS=600`
+- `TRAIN_LOG_EVERY=50`
+- `VAL_LOSS_EVERY=200`
+
+## GPU assumption
+
+- first runnable path assumes `1xH100`
+- this is for evidence collection, not final record-track timing
+
+## Log policy
+
+- keep terminal stdout in `runs/TPI-003/*.stdout.log`
+- keep script-native logs in `logs/`
+- summarize runtime and `val_bpb` back into notes after the run
+
+## Next-turn first command
+
+```bash
+python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 1
+```
+
 ## Minimum required assets
 
 - Python environment with `torch`, `datasets`, and `sentencepiece`

From 409b3491753aa7ec035b4cadd139861c30684027 Mon Sep 17 00:00:00 2001
From: eb24516
Date: Sat, 21 Mar 2026 14:18:35 +0900
Subject: [PATCH 4/4] docs: summarize PR #2 runnable path selection

---
 notes/pr2_update_summary.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 notes/pr2_update_summary.md

diff --git a/notes/pr2_update_summary.md b/notes/pr2_update_summary.md
new file mode 100644
index 0000000000..4203893d1f
--- /dev/null
+++ b/notes/pr2_update_summary.md
@@ -0,0 +1,21 @@
+# PR #2 Update Summary
+
+## Chosen environment path
+
+- `runpod_parameter_golf`
+
+## Why chosen
+
+- It is the most concrete runnable path already documented in the repo.
+- It reduces ambiguity around dependencies, dataset placement, and GPU shape.
+
+## What remains before the evidence run
+
+- create or access the Runpod environment
+- clone the repo and check out `exp/eval-first-003`
+- download the published `sp1024` assets
+- execute the fixed baseline and candidate commands
+
+## Review state
+
+- review comments: none observed during this turn
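The log policy in patch 3 asks for runtime and val_bpb to be summarized back into the notes after both runs. A sketch of that comparison step, using the stdout log paths from the execution contract; the log line shape (`val_bpb <number>`) is an assumption about `train_gpt.py`'s output, not a documented format:

```shell
# Pull the last val_bpb from each stdout log and print a baseline/candidate
# delta. Paths follow the tee targets in the execution contract commands.
last_val_bpb() {
  [ -f "$1" ] || return 0
  grep -o 'val_bpb [0-9.]*' "$1" | tail -n 1 | awk '{print $2}'
}

base=$(last_val_bpb runs/TPI-003/tpi003_baseline.stdout.log)
cand=$(last_val_bpb runs/TPI-003/tpi003_candidate_128.stdout.log)
echo "baseline_val_bpb=${base:-none} candidate_val_bpb=${cand:-none}"
if [ -n "$base" ] && [ -n "$cand" ]; then
  # Negative delta means the sliding-eval candidate improved on the baseline.
  awk -v b="$base" -v c="$cand" 'BEGIN { printf "delta=%+.4f\n", c - b }'
fi
```

The printed pair and delta are exactly the two numbers the evidence notes need; anything beyond that (runtime, GPU shape) still comes from the capture notes.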