Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions notes/pr9_update_summary.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# PR #9 Update Summary

## Current bottleneck

The trusted remote handoff gate is in place, but no execution-ready
provider-supplied packet is present yet.

## Accepted packet final shape

- `pod_identifier_or_display_name`
- `landing_target: /workspace/parameter-golf`
- `workspace_confirmation`
- one concrete route:
- `exact_attach_command`, or
- `host` + `username` + `port`

## Exact next action

Receive one filled packet and judge it against the TPI-010 acceptance gate.

## Absent-state policy

Without an execution-ready packet, keep the label at `continue_sharpening` and
do not start attach, landing verification, or unchanged TPI-004 execution.

## Resume condition

Resume unchanged TPI-004 only after attach succeeds and the fixed first-command
sequence verifies `/workspace/parameter-golf`, dependencies, and GPU access.
37 changes: 37 additions & 0 deletions notes/tpi_010_acceptance_gate.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# TPI-010 Acceptance Gate

## Accept only when all of the following are present

- pod identity is concrete
- route is concrete
- landing target confirmation is concrete

## Required fields

- `pod_identifier_or_display_name`
- `landing_target`
- `workspace_confirmation`
- one of:
- `exact_attach_command`
- `host` + `username` + `port`

## Rejection rule

Reject the packet if any one of the three gate dimensions is missing:

- pod identity
- concrete route
- landing target confirmation

## Reject examples

- local SSH keys exist
- generic Runpod instructions
- generic provider dashboard advice
- old Git auth investigation
- old endpoint rediscovery

## Result

If the packet is rejected, keep the label at `continue_sharpening` and stop
without starting attach or execution resume.
23 changes: 23 additions & 0 deletions notes/tpi_010_decision.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# TPI-010 Decision

## Current label

`continue_sharpening`

## Branching rule

- packet absent: `continue_sharpening`
- packet accepted and landing verified: `promote_to_execution_resume`
- packet accepted but attach or landing fails: `move_to_failure_ledger`

## Current turn

- accepted packet present: no
- attach started: no
- landing verified: no
- unchanged TPI-004 resumed: no

## Stop rule

If no execution-ready packet is present, stop cleanly. Do not reopen Git auth,
environment selection, provider rediscovery, or monkey model design debate.
21 changes: 21 additions & 0 deletions notes/tpi_010_first_commands.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# TPI-010 First Commands

Run these immediately after an accepted packet is received.

```bash
pwd
ls /workspace
cd /workspace/parameter-golf
git rev-parse --abbrev-ref HEAD
python3 -c "import torch, datasets, sentencepiece; print('deps-ok')"
nvidia-smi
```

Do not run the unchanged TPI-004 baseline or candidate commands until all of
the following are true:

- attach succeeded
- `/workspace/parameter-golf` is reachable
- branch identity is confirmed
- dependency import check succeeded
- GPU visibility is confirmed
31 changes: 31 additions & 0 deletions notes/tpi_010_packet_template.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# TPI-010 Execution-Ready Handoff Packet

## Objective

Define one provider-supplied packet that is sufficient to begin attach and
landing verification immediately, without reopening Git auth, environment
selection, or monkey model design debates.

## Packet Template

```md
pod_identifier_or_display_name: <required>
landing_target: /workspace/parameter-golf
workspace_confirmation: <provider-supplied confirmation or explicit path statement>
route_type: provider_attach_command | ssh_tuple

if route_type == provider_attach_command:
exact_attach_command: <required>

if route_type == ssh_tuple:
host: <required>
username: <required>
port: <required>
```

## Rules

- The packet must be concrete enough to use in one attach attempt.
- Generic provider instructions are not a valid packet.
- Local SSH key availability is not a valid packet.
- The target workspace must resolve to `/workspace/parameter-golf`.
59 changes: 59 additions & 0 deletions notes/tpi_010_resume_contract.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# TPI-010 Resume Contract

## Preconditions

- PR #6 is trusted as the latest TPI-007 absent-handoff surface
- PR #7 is trusted as the TPI-008 recovery surface
- PR #8 is trusted as the TPI-009 handoff gate
- PR #9 is trusted as the TPI-010 execution-ready packet surface
- one accepted execution-ready packet is present

## Verification first

Run this sequence before any baseline or candidate execution:

```bash
pwd
ls /workspace
cd /workspace/parameter-golf
git rev-parse --abbrev-ref HEAD
python3 -c "import torch, datasets, sentencepiece; print('deps-ok')"
nvidia-smi
```

## Resume commands after landing verification

### Baseline

```bash
RUN_ID=tpi004_baseline_stride1024 \
DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/ \
TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \
VOCAB_SIZE=1024 \
TRAIN_SEQ_LEN=1024 \
EVAL_STRIDE=1024 \
MAX_WALLCLOCK_SECONDS=600 \
TRAIN_LOG_EVERY=50 \
VAL_LOSS_EVERY=200 \
torchrun --standalone --nproc_per_node=1 train_gpt.py
```

### Candidate

```bash
RUN_ID=tpi004_candidate_stride128 \
DATA_PATH=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/ \
TOKENIZER_PATH=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \
VOCAB_SIZE=1024 \
TRAIN_SEQ_LEN=1024 \
EVAL_STRIDE=128 \
MAX_WALLCLOCK_SECONDS=600 \
TRAIN_LOG_EVERY=50 \
VAL_LOSS_EVERY=200 \
torchrun --standalone --nproc_per_node=1 train_gpt.py
```

## Gate

Do not resume TPI-004 until landing in `/workspace/parameter-golf` is verified.
Do not redefine the baseline or candidate commands in TPI-010.
27 changes: 27 additions & 0 deletions notes/tpi_010_stub.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# TPI-010 Status Stub

## Current bottleneck

The only lead blocker is the absence of one execution-ready provider-supplied
handoff packet.

## Decision label

`continue_sharpening`

## Exact next action

Receive one packet that includes:

- `pod_identifier_or_display_name`
- `landing_target`
- `workspace_confirmation`
- one concrete route:
- `exact_attach_command`, or
- `host` + `username` + `port`

## Current state

- trusted remote review surfaces are maintained
- no accepted packet is present in this turn
- unchanged TPI-004 remains dormant