This document describes the current live Chronohorn stack.
Chronohorn now has a real Rust runtime stack instead of a loose collection of bridge probes.
The current layers are:
- runtime infrastructure
- data + checkpoint boundary
- causal checkpoint replay
- offline artifact builders
- runtime checks
- fleet launch surface
Parallel runtime setup lives in crates/chronohorn-core/src/runtime.rs.
It owns:
- global Rayon pool setup
- default thread-width selection
- CHRONOHORN_THREADS
- runtime reporting through print-parallel-runtime
This is process-wide infrastructure, not experiment-local plumbing.
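The thread-width selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in runtime.rs: the helper names `resolve_thread_width` and `thread_width_from_env` are hypothetical, and the rule that zero or unparsable values fall back to the detected core count is an assumption.

```rust
use std::thread;

/// Hypothetical helper: pick a worker-thread count from an optional
/// CHRONOHORN_THREADS value, falling back to the detected core count.
fn resolve_thread_width(env_value: Option<&str>, detected: usize) -> usize {
    let fallback = detected.max(1);
    match env_value.map(str::trim).and_then(|v| v.parse::<usize>().ok()) {
        // Assumption: only a positive integer overrides the default.
        Some(n) if n > 0 => n,
        // Assumption: zero, garbage, or an unset variable uses the fallback.
        _ => fallback,
    }
}

/// Reads the real environment and machine. The result would be chosen
/// once per process and handed to whatever builds the global pool.
fn thread_width_from_env() -> usize {
    let detected = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let raw = std::env::var("CHRONOHORN_THREADS").ok();
    resolve_thread_width(raw.as_deref(), detected)
}
```

With Rayon, a width chosen this way is typically applied once via `rayon::ThreadPoolBuilder::new().num_threads(n).build_global()`. Whether runtime.rs does exactly that is not stated here.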
These files define the honest I/O boundary:
- crates/chronohorn-core/src/data.rs
- crates/chronohorn-core/src/checkpoint.rs
- crates/chronohorn-core/src/bridge.rs
- crates/chronohorn-runtime/src/loader.rs
Current local @fineweb layout:
- train: 100,000,000 tokens
- val: 62,021,846 tokens
That means artifact build and held-out eval are already separated at the data-root level.
Important detail:
- root replay code now consumes the typed export-bundle loader in chronohorn-runtime
- manifest, learned-state index, checksum, and sidecar reference resolution are no longer implemented twice
The promoted family namespace is crates/chronohorn-causal-bank/src/lib.rs.
The current replay implementation lives in crates/chronohorn-causal-bank/src/checkpoint.rs.
It owns:
- checkpoint loading
- recurrent causal-bank state updates for the current implementation line
- batched readout
- checkpoint audit entrypoints
Important detail:
- the recurrent state update is still sequential
- the heavy readout path is block-batched
- dense batch math uses hardware BLAS on macOS via Accelerate
This is now the main causal scorer, not a side experiment.
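The split between a sequential state update and a block-batched readout can be illustrated with a toy model. Everything below is an assumption made for illustration: the decay recurrence, the made-up embedding, and the names `step_state`, `readout_block`, and `score_stream` are not Chronohorn's. The readout is written as plain loops where a real runtime would call a BLAS matrix product.

```rust
/// Toy recurrent update: s_t = decay * s_{t-1} + embed(token_t).
/// The embedding is an arbitrary deterministic function (illustration only).
fn step_state(state: &mut [f32], token: u32, decay: f32) {
    for (i, s) in state.iter_mut().enumerate() {
        let x = ((token as usize + i) % 7) as f32 - 3.0;
        *s = decay * *s + x;
    }
}

/// Readout for a block of saved states: (block x dim) times (dim x vocab).
/// `weights` holds one row of length `dim` per vocabulary entry.
fn readout_block(states: &[Vec<f32>], weights: &[Vec<f32>]) -> Vec<Vec<f32>> {
    states
        .iter()
        .map(|s| {
            weights
                .iter()
                .map(|w| s.iter().zip(w).map(|(a, b)| a * b).sum::<f32>())
                .collect()
        })
        .collect()
}

/// Scores a token stream: states advance one token at a time, while the
/// readout is deferred and applied to `block` saved states at once.
fn score_stream(tokens: &[u32], weights: &[Vec<f32>], dim: usize, block: usize) -> Vec<Vec<f32>> {
    let mut state = vec![0.0f32; dim];
    let mut pending: Vec<Vec<f32>> = Vec::with_capacity(block);
    let mut logits = Vec::with_capacity(tokens.len());
    for &tok in tokens {
        // Strictly sequential: each state depends on the previous one.
        step_state(&mut state, tok, 0.5);
        pending.push(state.clone());
        if pending.len() == block {
            logits.extend(readout_block(&pending, weights));
            pending.clear();
        }
    }
    // Flush the final partial block.
    if !pending.is_empty() {
        logits.extend(readout_block(&pending, weights));
    }
    logits
}
```

Because the block size only changes when the readout runs, not what it computes, `score_stream` returns the same rows for any block size in this sketch.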
The promoted artifact-builder namespace is crates/chronohorn-causal-bank/src/lib.rs.
The current artifact path lives in crates/chronohorn-causal-bank/src/ngram_bulk.rs.
It owns:
- frozen raw fingerprinted tables
- oracle-edited row selection
- prebuilt artifact serialization
- bulk scalar eval over held-out val
The intended split is:
- build artifact from train tokens
- save artifact
- reuse artifact across checkpoint evals
This keeps long train-side scans out of the eval loop.
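The build / save / reuse split can be sketched with a much simpler stand-in artifact. The bigram table, its 12-byte record layout, and the `BigramTable` type below are assumptions for illustration and do not reflect the actual format written by ngram_bulk.rs.

```rust
use std::collections::BTreeMap;

/// Stand-in artifact: bigram counts keyed by (previous token, next token).
#[derive(Debug, PartialEq)]
struct BigramTable {
    counts: BTreeMap<(u32, u32), u32>,
}

impl BigramTable {
    /// Build step: a single pass over train tokens only.
    fn build(train: &[u32]) -> Self {
        let mut counts = BTreeMap::new();
        for pair in train.windows(2) {
            *counts.entry((pair[0], pair[1])).or_insert(0) += 1;
        }
        BigramTable { counts }
    }

    /// Save step: fixed-width little-endian records (prev, next, count).
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.counts.len() * 12);
        for (&(prev, next), &count) in &self.counts {
            out.extend_from_slice(&prev.to_le_bytes());
            out.extend_from_slice(&next.to_le_bytes());
            out.extend_from_slice(&count.to_le_bytes());
        }
        out
    }

    /// Load step: None if the input is not a whole number of records.
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() % 12 != 0 {
            return None;
        }
        let mut counts = BTreeMap::new();
        for rec in bytes.chunks_exact(12) {
            let word = |i: usize| u32::from_le_bytes([rec[i], rec[i + 1], rec[i + 2], rec[i + 3]]);
            counts.insert((word(0), word(4)), word(8));
        }
        Some(BigramTable { counts })
    }

    /// Eval step: lookups only, with no scan over train tokens.
    fn hits(&self, val: &[u32]) -> usize {
        val.windows(2)
            .filter(|p| self.counts.contains_key(&(p[0], p[1])))
            .count()
    }
}
```

In this sketch the bytes from `to_bytes` would be written to disk once, and each checkpoint eval would call `from_bytes` followed by `hits`, so the train-side pass in `build` never runs inside the eval loop.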
The internal runtime-check battery remains in crates/chronohorn-core/src/audit.rs.
These checks are still the gate inside Chronohorn:
- normalization
- repeatability
- future-suffix invariance
- answer-mask invariance
- prefix-truncation parity
- stream-rechunk parity
- sample-set invariance
- gold-logprob consistency
Bulk scalar research scorers are allowed for measurement, but promoted runtime claims still have to survive this path.
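Two of the checks listed above, normalization and future-suffix invariance, can be sketched against a toy scorer. This is an illustration of what such checks test, not the battery in audit.rs: the `Scorer` signature, both toy scorers, and the tolerance parameters are assumptions.

```rust
/// Assumed scorer shape: given the full stream and a position, return a
/// distribution over the token at `pos`. A causal scorer may only read
/// tokens[..pos].
type Scorer = fn(&[u32], usize) -> Vec<f64>;

const VOCAB: usize = 4;

/// Causal toy scorer: add-one smoothed unigram counts over the prefix.
fn causal_scorer(tokens: &[u32], pos: usize) -> Vec<f64> {
    let mut counts = vec![1.0; VOCAB];
    for &t in &tokens[..pos] {
        counts[t as usize % VOCAB] += 1.0;
    }
    let total: f64 = counts.iter().sum();
    counts.iter().map(|c| c / total).collect()
}

/// Deliberately broken scorer: peeks at the token it should predict.
fn leaky_scorer(tokens: &[u32], pos: usize) -> Vec<f64> {
    let mut probs = vec![0.1; VOCAB];
    probs[tokens[pos] as usize % VOCAB] = 0.7;
    probs
}

/// Normalization: every predicted distribution sums to 1 within `tol`.
fn check_normalization(score: Scorer, tokens: &[u32], tol: f64) -> bool {
    (0..tokens.len()).all(|pos| {
        let sum: f64 = score(tokens, pos).iter().sum();
        (sum - 1.0).abs() <= tol
    })
}

/// Future-suffix invariance: rewriting tokens[pos..] must not change the
/// distribution predicted at `pos`.
fn check_future_suffix_invariance(score: Scorer, tokens: &[u32], tol: f64) -> bool {
    (0..tokens.len()).all(|pos| {
        let mut perturbed = tokens.to_vec();
        for t in &mut perturbed[pos..] {
            *t = t.wrapping_add(1);
        }
        let a = score(tokens, pos);
        let b = score(&perturbed, pos);
        a.len() == b.len() && a.iter().zip(&b).all(|(x, y)| (x - y).abs() <= tol)
    })
}
```

The leaky scorer passes normalization but fails the invariance check, which is why a well-formed distribution alone is not enough to back a runtime claim.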
Boundary note:
- Chronohorn owns internal execution invariants and replay parity
- external audit and evidence packaging are intentionally out of scope here
- legacy audit-* names remain for compatibility with the current CLI surface
Mixed-machine execution is documented in FLEET.md.
The backend model is:
- cpu for remote Rust / artifact work
- metal for local MLX descendant jobs
- cuda for remote GPU container work
The key point is that this is one manifest and multiple honest backends, not a fake homogeneous cluster.
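One way to keep "one manifest, multiple honest backends" explicit in code is to make the backend a closed enum that every job row must name. The sketch below is hypothetical: the `name: backend` row format, the `Job` struct, and `parse_manifest` are invented for illustration, and the real manifest format is whatever FLEET.md defines.

```rust
use std::str::FromStr;

/// The three backends named in the backend model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cpu,
    Metal,
    Cuda,
}

impl FromStr for Backend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "cpu" => Ok(Backend::Cpu),
            "metal" => Ok(Backend::Metal),
            "cuda" => Ok(Backend::Cuda),
            // Unknown backends are rejected rather than silently remapped.
            other => Err(format!("unknown backend: {}", other)),
        }
    }
}

impl Backend {
    /// Per the backend model, only metal jobs run on the local machine.
    fn runs_locally(self) -> bool {
        matches!(self, Backend::Metal)
    }
}

/// Hypothetical manifest row: a job name plus the backend it targets.
#[derive(Debug)]
struct Job {
    name: String,
    backend: Backend,
}

/// Parses rows of the assumed form `name: backend`, one per line.
fn parse_manifest(text: &str) -> Result<Vec<Job>, String> {
    text.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .map(|l| -> Result<Job, String> {
            let (name, backend) = l
                .split_once(':')
                .ok_or_else(|| format!("bad row: {}", l))?;
            Ok(Job {
                name: name.trim().to_string(),
                backend: backend.trim().parse::<Backend>()?,
            })
        })
        .collect()
}
```

A row that names an unsupported backend fails the whole parse in this sketch, so a job cannot quietly be scheduled on hardware it was not written for.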
The currently promoted causal-bank loop is:
- cheap 10k O(n) architecture ablations
- scale/context-survival rows for winners
- deeper frontier or replication follow-up only after the cheap lanes separate
That policy is emitted from the family-owned scan regimes under python/chronohorn/families/causal_bank/scan.py, including breakthrough-10k, toward-one, toward-one-next, and gated-retention.
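The gate "only after the cheap lanes separate" can be made concrete with a small decision rule. The definition used below is an assumption for illustration and is not taken from scan.py: lower scores are treated as better, and lanes count as separated when the best beats the runner-up by at least a chosen margin. The name `separated_winner` is hypothetical.

```rust
/// Returns the index of the winning lane if it beats the runner-up by at
/// least `margin` (lower score is better), otherwise None.
/// Assumption: "separated" means best-versus-second gap >= margin.
fn separated_winner(scores: &[f64], margin: f64) -> Option<usize> {
    if scores.len() < 2 {
        // A single lane has nothing to separate from.
        return None;
    }
    // Rank lane indices by score, best first.
    let mut order: Vec<usize> = (0..scores.len()).collect();
    order.sort_by(|&a, &b| scores[a].total_cmp(&scores[b]));
    let (best, second) = (order[0], order[1]);
    if scores[second] - scores[best] >= margin {
        Some(best)
    } else {
        None
    }
}
```

Under this rule, a `None` result means staying in the cheap ablation stage, while `Some(i)` nominates lane `i` for the scale and context-survival rows.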
Checkpoint runtime:

    cargo run -p chronohorn -- \
      run-causal-bank-checkpoint <checkpoint-path|bundle-dir> <summary.json> @fineweb [val_tokens]

Checkpoint audit:

    cargo run -p chronohorn -- \
      audit-causal-bank-checkpoint <checkpoint-path|bundle-dir> <summary.json> @fineweb [val_tokens] [chunk_size] [max_chunks]

Export bundle replay probe:

    cargo run -p chronohorn-cli -- \
      probe-causal-bank-export-bundle <export-root>

Artifact build:

    cargo run -p chronohorn -- \
      build-causal-bank-ngram-oracle-budgeted-table @fineweb <artifact.bin> [train_tokens] [report_every] [profile] [oracle_stride]

Artifact eval:

    cargo run -p chronohorn -- \
      run-causal-bank-ngram-bulk-from-table <checkpoint-path|bundle-dir> <summary.json> @fineweb <artifact.bin> [val_tokens] [report_every]

Runtime report:

    cargo run -p chronohorn -- print-parallel-runtime

Older bridge families now sit behind src/archive/, and they are no longer the default reading of the repo.
Use ARCHIVE.md for those lines.