`notes/pr2_update_summary.md` (new file, +21 lines)

# PR #2 Update Summary

## Chosen environment path

- `runpod_parameter_golf`

## Why chosen

- It is the most concrete runnable path already documented in the repo.
- It reduces ambiguity around dependencies, dataset placement, and GPU shape.

## What remains before the evidence run

- create or access the Runpod environment
- clone the repo and check out `exp/eval-first-003`
- download the published `sp1024` assets
- execute the fixed baseline and candidate commands

## Review state

- review comments: none observed during this turn
`notes/tpi_003_environment_decision.md` (new file, +71 lines)

# TPI-003 Environment Decision

## Chosen environment

- `runpod_parameter_golf`

## Why chosen

- The repository README already defines this path concretely.
- It is the smallest runnable path with clear dependency assumptions.
- It is closer to challenge conditions than an unspecified remote machine.
- It keeps the same monkey-model eval-first branch and only changes environment readiness.

## Rejected alternatives

### `local_repair`

- Rejected as primary path because three blockers stack at once:
- `torch` missing
- dataset/tokenizer assets missing
- GPU access blocked

### `remote_gpu_small`

- Rejected as primary path because it is less specific than the Runpod route.
- It risks wasting time on ad hoc package, path, and logging setup that the documented Runpod path already solves more cleanly.

## Required assets

- repo checkout at `exp/eval-first-003`
- Python environment with `torch`, `datasets`, `sentencepiece`
- `/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/`
- `/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model`
- writable `logs/` and `runs/TPI-003/`
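
The asset list above can be checked mechanically before a run is attempted. The sketch below is a minimal preflight, assuming the Runpod paths listed above; `check_assets` is a hypothetical helper, not something that exists in the repo.

```shell
# Preflight sketch: confirm the dataset directory and tokenizer model exist
# before spending pod time on a launch. check_assets is a hypothetical helper.
check_assets() {
  data_path="$1"
  tokenizer_path="$2"
  ok=1
  [ -d "$data_path" ]      || { echo "missing dataset dir: $data_path"; ok=0; }
  [ -f "$tokenizer_path" ] || { echo "missing tokenizer: $tokenizer_path"; ok=0; }
  if [ "$ok" -eq 1 ]; then echo "preflight: ok"; else echo "preflight: blocked"; fi
}

# Example against the documented Runpod paths; off-pod this prints the
# missing-asset lines followed by "preflight: blocked".
check_assets \
  /workspace/parameter-golf/data/datasets/fineweb10B_sp1024/ \
  /workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model
```

Running this once at pod startup makes "asset missing" failures visible before `torchrun` is invoked rather than partway into the wallclock budget.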

## Baseline env vars

- `DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/`
- `TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model`
- `VOCAB_SIZE=1024`
- `TRAIN_SEQ_LEN=1024`
- `EVAL_STRIDE=1024`
- `MAX_WALLCLOCK_SECONDS=600`
- `TRAIN_LOG_EVERY=50`
- `VAL_LOSS_EVERY=200`

## Candidate env vars

- same as baseline except `EVAL_STRIDE=128`
- optional second candidate: `EVAL_STRIDE=64`

## Command skeleton

```bash
RUN_ID=<run_id> \
DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/ \
TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \
VOCAB_SIZE=1024 \
TRAIN_SEQ_LEN=1024 \
EVAL_STRIDE=<stride> \
MAX_WALLCLOCK_SECONDS=600 \
TRAIN_LOG_EVERY=50 \
VAL_LOSS_EVERY=200 \
torchrun --standalone --nproc_per_node=1 train_gpt.py
```
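
The skeleton can be wrapped in a small dry-run helper that substitutes the two placeholders and prints the full command for inspection before anything executes. `launch_cmd` is a hypothetical name introduced here, not a script in the repo.

```shell
# Dry-run sketch: assemble the full env-var-prefixed command for a given
# run id and stride, and print it instead of executing it.
launch_cmd() {
  run_id="$1"
  stride="$2"
  cmd="RUN_ID=$run_id"
  cmd="$cmd DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/"
  cmd="$cmd TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model"
  cmd="$cmd VOCAB_SIZE=1024 TRAIN_SEQ_LEN=1024 EVAL_STRIDE=$stride"
  cmd="$cmd MAX_WALLCLOCK_SECONDS=600 TRAIN_LOG_EVERY=50 VAL_LOSS_EVERY=200"
  cmd="$cmd torchrun --standalone --nproc_per_node=1 train_gpt.py"
  echo "$cmd"
}

# Print the baseline and primary-candidate commands for eyeballing.
launch_cmd tpi003_baseline_stride1024 1024
launch_cmd tpi003_candidate_stride128 128
```

Since baseline and candidate differ only in `RUN_ID` and `EVAL_STRIDE`, generating both from one helper removes the chance of the two commands drifting apart when copied by hand.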

## First command to run next turn

```bash
python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 1
```
`notes/tpi_003_environment_plan.md` (new file, +44 lines)

# TPI-003 Environment Plan

## Objective

Select the smallest runnable environment path that can produce one real baseline/candidate evidence pair for the existing eval-first monkey-model policy.

## Public-facing name

`MonkeyModel_EvalFirst_MinRunnableEnv`

## Candidate environment paths

1. `local_repair`
- install missing dependencies locally
- acquire published dataset/tokenizer assets locally
- verify GPU/runtime availability locally

2. `remote_gpu_small`
- use a minimal remote CUDA environment
- run one baseline/candidate pair with the same public-safe branch

3. `runpod_parameter_golf`
- use the challenge-aligned Runpod path
- fetch the published dataset/tokenizer assets there
- execute one baseline/candidate pair under a more official environment shape

## Selection criteria

- shortest path to one real evidence pair
- command simplicity
- reproducibility
- fit with current monkey-model eval-first branch
- lowest setup overhead that still yields runtime + `val_bpb`

## Current recommendation

Prefer `runpod_parameter_golf` unless an already-usable remote CUDA environment exists. The local path is currently the weakest candidate because `torch`, the dataset/tokenizer assets, and GPU access are all blocked at once.

## Selection outcome for this turn

- Chosen path: `runpod_parameter_golf`
- Reason: it is the most explicit public-safe path already documented in the repo, with the least ambiguity about dependency readiness and challenge-compatible execution shape.
- Deferred path: `remote_gpu_small`
- Rejected primary path: `local_repair`
`notes/tpi_003_execution_contract.md` (new file, +116 lines)

# TPI-003 Execution Contract

## Objective

Fix one executable baseline/candidate command contract for the existing eval-first monkey-model branch.

## Baseline contract

- branch: `exp/eval-first-003`
- mode: non-sliding validation behavior
- effective setting: `EVAL_STRIDE=TRAIN_SEQ_LEN`
- chosen environment: `runpod_parameter_golf`
- target host shape for first pass: `1xH100` Runpod pod
- tokenizer path: `/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model`
- dataset path: `/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/`
- logs:
- script-native log: `logs/${RUN_ID}.txt`
- turn note archive: `runs/TPI-003/`
- commit SHA capture:
- `git rev-parse HEAD > runs/TPI-003/<run_id>.commit.txt`

## Candidate contract

- branch: `exp/eval-first-003`
- mode: eval-first sliding validation
- primary candidate: `EVAL_STRIDE=128`
- optional secondary candidate: `EVAL_STRIDE=64`

## Baseline command

```bash
cd /workspace
git clone https://github.com/gb250e/parameter-golf.git
cd parameter-golf
git checkout exp/eval-first-003
python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 1
mkdir -p runs/TPI-003
git rev-parse HEAD > runs/TPI-003/tpi003_baseline.commit.txt
RUN_ID=tpi003_baseline_stride1024 \
DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/ \
TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \
VOCAB_SIZE=1024 \
TRAIN_SEQ_LEN=1024 \
EVAL_STRIDE=1024 \
MAX_WALLCLOCK_SECONDS=600 \
TRAIN_LOG_EVERY=50 \
VAL_LOSS_EVERY=200 \
torchrun --standalone --nproc_per_node=1 train_gpt.py | tee runs/TPI-003/tpi003_baseline.stdout.log
```

## Candidate command

```bash
cd /workspace/parameter-golf
git checkout exp/eval-first-003
mkdir -p runs/TPI-003
git rev-parse HEAD > runs/TPI-003/tpi003_candidate_128.commit.txt
RUN_ID=tpi003_candidate_stride128 \
DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/ \
TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \
VOCAB_SIZE=1024 \
TRAIN_SEQ_LEN=1024 \
EVAL_STRIDE=128 \
MAX_WALLCLOCK_SECONDS=600 \
TRAIN_LOG_EVERY=50 \
VAL_LOSS_EVERY=200 \
torchrun --standalone --nproc_per_node=1 train_gpt.py | tee runs/TPI-003/tpi003_candidate_128.stdout.log
```

## Required env vars for both runs

- `DATA_PATH`
- `TOKENIZER_PATH`
- `VOCAB_SIZE=1024`
- `TRAIN_SEQ_LEN=1024`
- `EVAL_STRIDE`
- `MAX_WALLCLOCK_SECONDS=600`
- `TRAIN_LOG_EVERY=50`
- `VAL_LOSS_EVERY=200`
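
A guard over this variable list can fail fast if anything is unset before launch. This is a sketch; `require_env` is a hypothetical helper and is not part of `train_gpt.py`.

```shell
# Sketch: report any required env vars that are unset before launching.
# The variable list mirrors the bullets above; require_env is hypothetical.
require_env() {
  missing=""
  for name in DATA_PATH TOKENIZER_PATH VOCAB_SIZE TRAIN_SEQ_LEN \
              EVAL_STRIDE MAX_WALLCLOCK_SECONDS TRAIN_LOG_EVERY VAL_LOSS_EVERY; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "unset:$missing"
  else
    echo "env: ok"
  fi
}

# Outside a configured pod this lists every unset variable.
require_env
```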

## GPU assumption

- first runnable path assumes `1xH100`
- this is for evidence collection, not final record-track timing

## Log policy

- keep terminal stdout in `runs/TPI-003/*.stdout.log`
- keep script-native logs in `logs/`
- summarize runtime and `val_bpb` back into notes after the run
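
The summarization step can be sketched as a small log extractor. The line format `... val_bpb <number> ...` is an assumption about how `train_gpt.py` logs, not something verified against the script; adjust the field name to match the real output.

```shell
# Sketch: pull the last reported val_bpb out of a stdout log, assuming the
# trainer prints lines containing "val_bpb <number>" (an unverified format).
last_val_bpb() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "val_bpb") v = $(i + 1) }
       END { if (v != "") print v }' "$1"
}

# Demo against a fabricated log; real runs would point at
# runs/TPI-003/*.stdout.log instead.
log=$(mktemp)
printf 'step 200 val_bpb 1.482\nstep 400 val_bpb 1.391\n' > "$log"
last_val_bpb "$log"   # prints 1.391
```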

## Next-turn first command

```bash
python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 1
```

## Minimum required assets

- Python environment with `torch`, `datasets`, and `sentencepiece`
- accessible CUDA runtime for `train_gpt.py`
- published FineWeb cached shards or equivalent challenge-provided dataset path
- published tokenizer model path

## Minimum required capture

- commit SHA
- command line
- runtime notes
- whether eval path was reached
- final runtime summary
- final `val_bpb` summary
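
The first two capture items can be written mechanically at launch time. This is a minimal sketch; `capture_run_meta` is a hypothetical helper, and the SHA falls back to `unknown` when run outside a git checkout.

```shell
# Sketch: persist the commit SHA and command line for one run under an
# output directory (defaulting to runs/TPI-003). capture_run_meta is
# a hypothetical helper, not part of the repo.
capture_run_meta() {
  run_id="$1"
  cmdline="$2"
  outdir="${3:-runs/TPI-003}"
  mkdir -p "$outdir"
  git rev-parse HEAD > "$outdir/$run_id.commit.txt" 2>/dev/null \
    || echo unknown > "$outdir/$run_id.commit.txt"
  printf '%s\n' "$cmdline" > "$outdir/$run_id.cmdline.txt"
}

# Demo into a throwaway directory.
demo_dir=$(mktemp -d)
capture_run_meta tpi003_baseline_stride1024 \
  "EVAL_STRIDE=1024 torchrun --standalone --nproc_per_node=1 train_gpt.py" \
  "$demo_dir"
ls "$demo_dir"
```

Runtime notes and the `val_bpb` summary still have to be filled in after the run, but capturing SHA and command line up front means the evidence pair stays attributable even if the pod is torn down early.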

## Environment decision rule

Choose the environment path that can produce one real baseline/candidate pair with the least additional setup while remaining reproducible and public-safe.