Record: Scylla + GPTQ + BH3072 — val_bpb 1.0856 (3-seed mean) #1405
# Scylla + GPTQ + BH3072 — val_bpb 1.0856 (3-seed mean)

**val_bpb = 1.0856** (3-seed mean) | 15.3-15.8 MB | 8xH100 SXM | No SLOT, No TTT

## 3-Seed Results

| Seed | Sliding BPB | Artifact (bytes) |
|------|-------------|------------------|
| 1337 | 1.1009 | 15,267,156 |
| 42 | **1.0782** | 15,813,568 |
| 2024 | **1.0777** | 15,807,116 |
| **Mean** | **1.0856** | |

Beats the merged SOTA (1.1147, PR #1019) by 0.029 BPB (14x the significance threshold).
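The headline statistics can be re-derived from the per-seed values above. A quick sanity check (assuming the reported std is the sample standard deviation, which matches the metadata's 0.013):

```python
import statistics

# Per-seed sliding-window val_bpb from the results table.
seed_bpb = {1337: 1.1009, 42: 1.0782, 2024: 1.0777}

mean_bpb = statistics.mean(seed_bpb.values())
std_bpb = statistics.stdev(seed_bpb.values())   # sample (n-1) standard deviation
margin = 1.1147 - mean_bpb                      # gap to the merged SOTA (PR #1019)

print(f"mean={mean_bpb:.4f}  std={std_bpb:.3f}  margin={margin:.4f}")
```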

## Key Techniques
- **Scylla tokenizer** (998-vocab TokenMonster, PR #1143 @simon-marcus): 37% fewer tokens per byte vs. SentencePiece 1024
- **AR self-gen Full Hessian GPTQ** (int6, Cholesky error compensation): 64 self-generated sequences for calibration
- **BigramHash 3072x112**: matching #1019's configuration
- **Architecture**: 11L 512d 8H/4KV GQA, LeakyReLU(0.5)^2 MLP 3x, VRL, VE128, XSA all 11 layers, QK-Gain 4.0, Partial RoPE 16/64, LN Scale, SmearGate, U-Net skips, EMA(0.997) + SWA, Late QAT, LZMA-9, FA3
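The BigramHash internals aren't spelled out here, but the general idea — hash each (previous, current) token pair into a fixed table of 3072 buckets and read out a learned 112-dim embedding — can be sketched as follows. The class name, hash mix, and initialization are illustrative assumptions, not the actual implementation:

```python
import random

class BigramHashSketch:
    """Hypothetical sketch of a hashed bigram embedding (3072 buckets x 112 dims)."""

    def __init__(self, n_buckets=3072, dim=112, seed=0):
        rng = random.Random(seed)
        # Learned during training; small random init here just for illustration.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(n_buckets)]
        self.n_buckets = n_buckets
        self.dim = dim

    def bucket(self, prev_tok, cur_tok):
        # Illustrative multiplicative hash; the actual mixing function is not given here.
        return (prev_tok * 0x9E3779B1 + cur_tok) % self.n_buckets

    def lookup(self, tokens):
        # One dim-sized feature vector per position; position 0 has no preceding token.
        feats = [[0.0] * self.dim]
        for prev, cur in zip(tokens, tokens[1:]):
            feats.append(self.table[self.bucket(prev, cur)])
        return feats

feats = BigramHashSketch().lookup([5, 17, 421, 17])
print(len(feats), len(feats[0]))  # 4 112
```

In a transformer, such features would typically be added to the token embeddings before the first block; hash collisions in the 3072-bucket table are accepted as noise in exchange for a tiny parameter footprint.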

## Compliance
- No SLOT (no eval-time delta optimization)
- No TTT (no eval-time weight updates)
- No n-gram cache
- No network calls
- Tokenizer byte accounting via validated metadata (candidate.meta.npz)
- All artifacts under 16 MB; all training under 600 s
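The byte-accounting item reflects how bits-per-byte is computed: cross-entropy is measured in nats per token, converted to bits, and normalized by the validation set's raw byte count (taken from the tokenizer metadata). A sketch of the conversion, plus the tokens-per-byte ratio implied by the reported seed-1337 numbers:

```python
import math

def val_bpb(mean_nats_per_token, total_tokens, total_bytes):
    # Total bits = (nats per token) * tokens / ln(2); normalize by raw bytes.
    return mean_nats_per_token * total_tokens / (math.log(2) * total_bytes)

# Sanity check: 1 bit of loss per token at 1 token per byte is exactly 1.0 bpb.
assert abs(val_bpb(math.log(2), 1000, 1000) - 1.0) < 1e-12

# Tokens-per-byte ratio implied by the seed-1337 pair (val_bpb, val_loss):
ratio = 1.10089760 * math.log(2) / 1.95894579
print(f"implied tokens per byte ≈ {ratio:.4f}")
```

The same ratio (about 0.39 tokens per byte, i.e. roughly 2.6 bytes per token) falls out of the other two seeds' (val_bpb, val_loss) pairs as well, which is what a single validated candidate.meta.npz accounting should guarantee.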

## Reproduction

```bash
VOCAB_SIZE=998 BIGRAM_VOCAB_SIZE=3072 BIGRAM_DIM=112 WARMDOWN_ITERS=4000 \
DATA_PATH=./data/datasets/fineweb10B_scylla \
TOKENIZER_PATH=./candidate.vocab TOKENIZER_META_PATH=./candidate.meta.npz \
SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py
```
|
Comment on lines
+23
to
+39
|
||
|
|
||
Requires Scylla-retokenized FineWeb shards (see anthonym21/fineweb10B-scylla on HuggingFace).

## Credits
- Scylla tokenizer: @simon-marcus (PR #1143)
- Training stack lineage: PR #175 (@anthony-maio), PR #1019 (@abaybektursun)
- GPTQ: PR #1019 (@abaybektursun)
- VRL: ResFormer (arXiv:2410.17897)

Submission metadata:

```json
{
  "name": "Scylla_GPTQ_BH3072",
  "author": "Anthony Maio",
  "github_id": "anthony-maio",
  "date": "2026-04-06",
  "track": "10min_16mb",
  "num_gpus": 8,
  "gpu_type": "H100 SXM",
  "training_time_seconds": 600,
  "seed_results": {
    "1337": {"val_loss": 1.95894579, "val_bpb": 1.10089760, "artifact_bytes": 15267156},
    "42": {"val_loss": 1.91853397, "val_bpb": 1.07818677, "artifact_bytes": 15813568},
    "2024": {"val_loss": 1.91764714, "val_bpb": 1.07768838, "artifact_bytes": 15807116}
  },
  "mean_val_bpb": 1.0856,
  "std_val_bpb": 0.013,
  "blurb": "Scylla tokenizer (998 vocab TokenMonster) + AR self-gen GPTQ int6 + BigramHash 3072x112 + VRL + XSA-11 + QK-Gain 4.0 + EMA/SWA + LZMA-9. No SLOT, no TTT. Legally clean."
}
```
The Markdown table in the “3-Seed Results” section uses double leading pipes (`||`) on each row, which doesn’t render as a standard GitHub table. Replace them with single `|` delimiters so the table formats correctly.