diff --git a/notes/pr8_update_summary.md b/notes/pr8_update_summary.md new file mode 100644 index 0000000000..b84e8249f9 --- /dev/null +++ b/notes/pr8_update_summary.md @@ -0,0 +1,31 @@ +# PR #8 Update Summary + +## What this PR adds + +- a trusted-remote handoff resume gate after TPI-008 recovery +- a stricter intake rule for concrete provider-supplied handoff data +- an unchanged execution-resume contract for TPI-004 once landing is verified +- a decision surface that prevents reopening Git auth work without fresh evidence + +## What still remains + +- a concrete provider-supplied attach or SSH route +- verified landing in `/workspace/parameter-golf` +- actual resume of the unchanged TPI-004 baseline/candidate evidence pass + +## First required external handoff + +The next turn should begin from the exact provider-supplied attach route or SSH tuple and immediately verify landing in `/workspace/parameter-golf`. + +## Boundary carried forward + +- remote review surfaces are now treated as trustworthy again +- Git auth recovery is not reopened unless a new concrete regression appears +- monkey model framing remains the only public-facing concept + +## Current turn result + +- the trusted remote handoff gate remains valid +- no concrete provider-supplied handoff package is present in this turn +- the branch therefore stays in an absent-state wait +- current label remains `continue_sharpening` diff --git a/notes/tpi_009_resume_contract.md b/notes/tpi_009_resume_contract.md new file mode 100644 index 0000000000..16d55ad0b2 --- /dev/null +++ b/notes/tpi_009_resume_contract.md @@ -0,0 +1,65 @@ +# TPI-009 Resume Contract + +## Objective + +Define the first commands to run once a concrete provider-supplied handoff package is accepted on top of the restored remote review surfaces. + +## Preconditions + +- PR #6 is trusted as the latest TPI-007 absent-handoff surface +- PR #7 is trusted as the TPI-008 recovery surface +- PR #8 is trusted as the current TPI-009 handoff gate +- the next turn contains a concrete attach route or exact SSH tuple + +## First verification commands + +```bash +pwd +ls /workspace +cd /workspace/parameter-golf +git rev-parse --abbrev-ref HEAD +python3 -c "import torch, datasets, sentencepiece; print('deps-ok')" +nvidia-smi +``` + +## Resume commands after landing verification + +### Baseline +```bash +RUN_ID=tpi004_baseline_stride1024 \ +DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/ \ +TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \ +VOCAB_SIZE=1024 \ +TRAIN_SEQ_LEN=1024 \ +EVAL_STRIDE=1024 \ +MAX_WALLCLOCK_SECONDS=600 \ +TRAIN_LOG_EVERY=50 \ +VAL_LOSS_EVERY=200 \ +torchrun --standalone --nproc_per_node=1 train_gpt.py +``` + +### Candidate +```bash +RUN_ID=tpi004_candidate_stride128 \ +DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/ \ +TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \ +VOCAB_SIZE=1024 \ +TRAIN_SEQ_LEN=1024 \ +EVAL_STRIDE=128 \ +MAX_WALLCLOCK_SECONDS=600 \ +TRAIN_LOG_EVERY=50 \ +VAL_LOSS_EVERY=200 \ +torchrun --standalone --nproc_per_node=1 train_gpt.py +``` + +## Gate + +Do not reopen Git auth or provider rediscovery after a valid handoff package is supplied. Attach and verify landing immediately. + +Do not run the baseline or candidate command before: + +- `pwd` and `ls /workspace` succeed +- `/workspace/parameter-golf` is reachable +- branch identity is confirmed +- dependency check succeeds +- `nvidia-smi` confirms the expected runtime diff --git a/notes/tpi_009_resume_decision.md b/notes/tpi_009_resume_decision.md new file mode 100644 index 0000000000..c21a463f38 --- /dev/null +++ b/notes/tpi_009_resume_decision.md @@ -0,0 +1,64 @@ +# TPI-009 Resume Decision + +## Status + +waiting_external_handoff + +## Objective + +Record whether the trusted remote review surfaces can now be used to resume the unchanged TPI-004 evidence pass. + +## Required decision fields + +- remote review surface trusted or not: yes +- concrete provider handoff package received or not: not yet +- landing path verified or not: not yet +- TPI-004 resume ready or not: not yet + +## Classification + +- `promote_to_execution_resume` +- `continue_sharpening` +- `move_to_failure_ledger` + +## Accepted branch states + +- `accepted` + - concrete provider handoff package is present + - landing is verified + - promote immediately to `promote_to_execution_resume` +- `absent` + - no concrete provider handoff package is present + - stop cleanly + - do not reopen rediscovery + - do not start execution +- `failure-after-attach` + - a concrete provider handoff package is present + - attach or landing is actually attempted and fails + - move to `move_to_failure_ledger` + +## Current reading + +- TPI-008 removed the push/auth blocker. +- The current blocker is now external provider handoff only. +- No new attach route is included in this PR itself. + +## Classification result for the branch state + +- `continue_sharpening` + +## Absent-state handling + +- stop cleanly at the trusted remote handoff gate +- no Git auth reopening +- no environment reselection +- no model-side changes +- no unchanged TPI-004 execution until landing is verified + +## Promotion condition + +Promote to `promote_to_execution_resume` immediately once the handoff package is concrete enough to attach and verify `/workspace/parameter-golf`. + +## Failure condition + +Move to `move_to_failure_ledger` only if a newly attempted concrete handoff route fails after trust in the remote review surface has already been restored. diff --git a/notes/tpi_009_trusted_remote_handoff_gate.md b/notes/tpi_009_trusted_remote_handoff_gate.md new file mode 100644 index 0000000000..3caa5c4962 --- /dev/null +++ b/notes/tpi_009_trusted_remote_handoff_gate.md @@ -0,0 +1,56 @@ +# TPI-009 Trusted Remote Handoff Gate + +## Objective + +Resume the suspended TPI-004 evidence pass only from a trusted remote review surface and a concrete provider-supplied handoff package. + +## Public-facing name + +`MonkeyModel_TrustedRemoteHandoff` + +## Starting assumptions + +- TPI-008 restored remote reflection and Git auth for normal HTTPS push +- PR #6 is again a trustworthy surface for the latest TPI-007 absent-handoff state +- no model-side or tokenizer-side change is needed before attach and landing verification + +## Accepted handoff package + +Any acceptable handoff in this loop must include at least one of the following: + +1. exact provider-supplied attach command, or +2. exact SSH tuple with host, username, and port + +and must also include: + +- pod identifier or display name +- expected landing path or explicit confirmation that `/workspace/parameter-golf` is the target workspace + +This minimum package is mandatory. Partial metadata does not advance the loop. + +## Rejection rule + +Reject all of the following as insufficient for progress in this loop: + +- repeating that local SSH keys exist +- generic Runpod instructions without the concrete current pod route +- generic provider dashboard advice without the concrete current pod route +- reopening Git auth inspection without a fresh concrete regression +- reopening old endpoint rediscovery without new concrete provider data +- reopening environment selection or model changes before landing verification + +## Immediate next action after acceptance + +1. run `pwd` +2. run `ls /workspace` +3. `cd /workspace/parameter-golf` +4. run `git rev-parse --abbrev-ref HEAD` +5. run `python3 -c "import torch, datasets, sentencepiece; print('deps-ok')"` +6. run `nvidia-smi` +7. only after landing and dependency verification, resume the unchanged TPI-004 baseline/candidate evidence pass + +## Current branch reading + +- trusted remote surfaces are restored +- no concrete provider-supplied handoff package is present in this turn +- current label remains `continue_sharpening`