**records/track_10min_16mb/2026-04-06_Scylla_GPTQ_BH3072/README.md** (+48 lines)
# Scylla + GPTQ + BH3072 — val_bpb 1.0856 (3-seed mean)

**val_bpb = 1.0856** (3-seed mean) | 15.3-15.8 MB | 8xH100 SXM | No SLOT, No TTT

## 3-Seed Results

| Seed | Sliding BPB | Artifact |
|------|------------|----------|
| 1337 | 1.1009 | 15,267,156 |
| 42 | **1.0782** | 15,813,568 |
| 2024 | **1.0777** | 15,807,116 |
| **Mean** | **1.0856** | |
> **Copilot AI (Apr 6, 2026), on lines +7 to +12:** The Markdown table in the "3-Seed Results" section uses double leading pipes (`||`) on each row, which doesn't render as a standard GitHub table. Replace with single `|` delimiters so the table formats correctly.
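The relation between the reported val_loss (nats per token) and val_bpb (bits per byte) can be sanity-checked from the seed-1337 numbers in the submission metadata. A minimal sketch, assuming the standard conversion bpb = loss · log2(e) · (tokens per byte); the implied tokenizer compression ratio is derived here, not stated in the record:

```python
import math

# Seed-1337 numbers from the submission metadata.
val_loss = 1.95894579   # nats per token
val_bpb = 1.10089760    # bits per byte

# Standard conversion: bits/byte = (nats/token) * log2(e) * (tokens/byte),
# so the tokenizer's implied compression ratio is:
tokens_per_byte = val_bpb / (val_loss * math.log2(math.e))
print(f"{tokens_per_byte:.3f} tokens/byte "
      f"(~{1 / tokens_per_byte:.2f} bytes/token)")  # 0.390 tokens/byte (~2.57 bytes/token)
```

Roughly 2.6 bytes per token is plausible for a 998-entry byte-level vocabulary.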

Beats merged SOTA (1.1147, PR #1019) by 0.029 BPB (14x significance threshold).

## Key Techniques

- **Scylla tokenizer** (998-vocab TokenMonster, PR #1143 @simon-marcus): 37% fewer tokens per byte vs SentencePiece 1024
- **AR self-gen Full Hessian GPTQ** (int6, Cholesky error compensation): 64 self-generated sequences for calibration
- **BigramHash 3072x112**: matching #1019's configuration
- **Architecture**: 11L 512d 8H/4KV GQA, LeakyReLU(0.5)^2 MLP 3x, VRL, VE128, XSA all 11 layers, QK-Gain 4.0, Partial RoPE 16/64, LN Scale, SmearGate, U-Net skips, EMA(0.997) + SWA, Late QAT, LZMA-9, FA3
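The BigramHash component is described here only by its dimensions (3072 buckets, 112-dim vectors). A minimal sketch of a hashed-bigram embedding under the usual construction, where the (previous, current) token pair is hashed into a fixed table; the mixing constant, seed handling, and BOS convention below are illustrative, not taken from the submission:

```python
import random

NUM_BUCKETS, DIM = 3072, 112  # BigramHash 3072x112

# Illustrative embedding table; in training these rows would be learned.
random.seed(0)
table = [[random.gauss(0, 0.02) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]

def bigram_bucket(prev_tok: int, cur_tok: int) -> int:
    # Mix the token pair into one of NUM_BUCKETS slots; the odd multiplier
    # is an arbitrary illustrative choice, not from the submission.
    return ((prev_tok * 1000003) ^ cur_tok) % NUM_BUCKETS

def bigram_embedding(tokens):
    # One 112-dim vector per position, keyed on the (previous, current) pair.
    out = []
    prev = 0  # assume a BOS id of 0 for the first position
    for tok in tokens:
        out.append(table[bigram_bucket(prev, tok)])
        prev = tok
    return out

embs = bigram_embedding([17, 42, 997])
print(len(embs), len(embs[0]))  # 3 112
```

The appeal of the hashed table is a fixed parameter budget (3072 x 112 weights) regardless of how many distinct bigrams occur in the data.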

## Compliance

- No SLOT (no eval-time delta optimization)
- No TTT (no eval-time weight updates)
- No n-gram cache
- No network calls
- Tokenizer byte accounting via validated metadata (candidate.meta.npz)
- All artifacts under 16MB, all training under 600s
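The LZMA-9 packing and the 16 MB artifact cap can be checked with Python's stdlib `lzma`. A sketch, assuming the checkpoint is first serialized to raw bytes (the payload below is a stand-in, not the real artifact):

```python
import lzma

LIMIT = 16 * 1024 * 1024  # 16 MB artifact cap for this track

payload = bytes(1_000_000)                 # stand-in for serialized weights
packed = lzma.compress(payload, preset=9)  # LZMA-9, as claimed in the README

assert len(packed) <= LIMIT, "artifact exceeds 16 MB cap"
print(f"{len(payload)} -> {len(packed)} bytes")
```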

## Reproduction

```bash
VOCAB_SIZE=998 BIGRAM_VOCAB_SIZE=3072 BIGRAM_DIM=112 WARMDOWN_ITERS=4000 \
DATA_PATH=./data/datasets/fineweb10B_scylla \
TOKENIZER_PATH=./candidate.vocab TOKENIZER_META_PATH=./candidate.meta.npz \
SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py
```
> **Copilot AI (Apr 6, 2026), on lines +23 to +39:** The README states "No SLOT, No TTT" and "No n-gram cache", but the provided reproduction command does not set TTT_ENABLED=0 / OGD_ENABLED=0 (and the script defaults both to enabled). Update the command (or defaults) so that a copy/paste run matches the claimed compliance.

Requires Scylla-retokenized FineWeb shards (see anthonym21/fineweb10B-scylla on HuggingFace).

## Credits

- Scylla tokenizer: @simon-marcus (PR #1143)
- Training stack lineage: PR #175 (@anthony-maio), PR #1019 (@abaybektursun)
- GPTQ: PR #1019 (@abaybektursun)
- VRL: ResFormer (arXiv:2410.17897)
(Two binary files not shown.)

**submission.json** (+18 lines)
{
"name": "Scylla_GPTQ_BH3072",
"author": "Anthony Maio",
"github_id": "anthony-maio",
"date": "2026-04-06",
"track": "10min_16mb",
"num_gpus": 8,
"gpu_type": "H100 SXM",
"training_time_seconds": 600,
"seed_results": {
"1337": {"val_loss": 1.95894579, "val_bpb": 1.10089760, "artifact_bytes": 15267156},
"42": {"val_loss": 1.91853397, "val_bpb": 1.07818677, "artifact_bytes": 15813568},
"2024": {"val_loss": 1.91764714, "val_bpb": 1.07768838, "artifact_bytes": 15807116}
},
"mean_val_bpb": 1.0856,
"std_val_bpb": 0.013,
"blurb": "Scylla tokenizer (998 vocab TokenMonster) + AR self-gen GPTQ int6 + BigramHash 3072x112 + VRL + XSA-11 + QK-Gain 4.0 + EMA/SWA + LZMA-9. No SLOT, no TTT. Legally clean."
}

> **Copilot AI (Apr 6, 2026), on lines +15 to +17:** submission.json blurb says "LZMA-9" and "No TTT", but the checked-in script uses lzma preset=6 and defaults TTT_ENABLED/OGD_ENABLED to enabled. Please reconcile the submission metadata with the actual code path/settings used to produce the attached logs/artifacts.
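The `mean_val_bpb` and `std_val_bpb` fields can be recomputed directly from `seed_results`, a quick check:

```python
import statistics

seed_bpb = {  # val_bpb per seed, from seed_results above
    "1337": 1.10089760,
    "42":   1.07818677,
    "2024": 1.07768838,
}
vals = list(seed_bpb.values())
mean = sum(vals) / len(vals)
std = statistics.stdev(vals)  # sample standard deviation over the 3 seeds
print(round(mean, 4), round(std, 3))  # 1.0856 0.013
```

Both values match the submitted metadata (1.0856 and 0.013).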