Commit 9150ab3

Wider Loop + Per-Pass Embeddings + Two-Lane Parallel Routing + Muon 0.98 + WD 0.10 + Tap-In V6 + Legal TTT
Adds two-lane parallel residual routing (layers 8-10) on top of the prior stack. The routing matrix (parallel_post_lambdas [11,2,2] + parallel_resid_lambdas [11,2] = 66 scalars) creates DenseNet-style direct gradient paths from every decoder sublayer to the output lane, reducing gradient attenuation through depth.

3-seed V6+TTT BPB:
  s42:   1.076063
  s1234: 1.074644
  s2025: 1.075079
  Mean:  1.075262 (std 0.000745)

All 3 seeds are under 16 MB (15.91-15.93 MB, 75-86 KB headroom each).

Decomposition (single seed, s42, baseline vs two-lane):
  Pre-quant: 1.08784 -> 1.08476 (-0.00308)
  Post-SW:   1.08296 -> 1.07900 (-0.00396)
  V6+TTT:    1.07738 -> 1.07606 (-0.00132)

The win is overwhelmingly in training efficiency (pre-quant improves by 0.003), not in an expanded TTT surface: each sublayer now receives cleaner gradients per step because its path to the loss passes through a single routing coefficient instead of the full residual chain.
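The routing described above can be illustrated with a minimal pure-Python sketch. Only the two tensor names and their shapes ([11,2,2] and [11,2]) come from the commit message; the indexing convention (layer, sublayer, lane), the choice that sublayers read lane 0, the all-ones initialisation, and the helper names `init_lambdas`, `count_scalars`, and `route_layer` are assumptions made for illustration and may not match the actual training script.

```python
NUM_LAYERS, NUM_SUBLAYERS, NUM_LANES = 11, 2, 2


def init_lambdas():
    # Shapes taken from the commit message; values (all ones) are an assumption.
    # post[l][s][j]: weight for writing sublayer s of layer l into lane j.
    post = [[[1.0] * NUM_LANES for _ in range(NUM_SUBLAYERS)] for _ in range(NUM_LAYERS)]
    # resid[l][s]: scale applied to the existing lanes before that write (assumed meaning).
    resid = [[1.0] * NUM_SUBLAYERS for _ in range(NUM_LAYERS)]
    return post, resid


def count_scalars(post, resid):
    # 11*2*2 post-lambdas plus 11*2 resid-lambdas.
    n_post = sum(len(lane_weights) for layer in post for lane_weights in layer)
    n_resid = sum(len(layer) for layer in resid)
    return n_post + n_resid


def route_layer(lanes, sublayers, post_l, resid_l):
    # Hypothetical update rule: each sublayer reads lane 0 and writes a
    # separately weighted copy of its output into both lanes.
    for s, f in enumerate(sublayers):
        out = f(lanes[0])
        lanes = [
            [resid_l[s] * x + post_l[s][j] * o for x, o in zip(lanes[j], out)]
            for j in range(NUM_LANES)
        ]
    return lanes


post, resid = init_lambdas()
total = count_scalars(post, resid)

# Toy stand-ins for the attention and MLP sublayers.
toy_sublayers = [lambda v: [2 * x for x in v], lambda v: [x + 1 for x in v]]
lanes = route_layer([[1.0, 2.0], [0.0, 0.0]], toy_sublayers, post[8], resid[8])
print(total, lanes)
```

Under these assumptions, every sublayer output reaches the second lane through exactly one `post` coefficient, which is one way to read the "direct gradient path" claim in the commit message.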
1 parent 75700cb commit 9150ab3

15 files changed: +3991 -0 lines changed

records/track_10min_16mb/2026-04-10_WiderEmb_TapInV6_TTT/README.md

Lines changed: 297 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+=== V6 + TTT ===
+TTT: lr=0.005 epochs=3 chunk=32768 freeze=0
+ttt_sliding:start chunks=1238 chunk_tokens=32768 total_windows=633440 stride=64 ttt_lr=0.005 ttt_epochs=3 freeze_blocks=0
+ttt_sliding:params unfrozen=35946138 frozen=0
+ttt_chunk [1/1238] bpb=1.111107 time=4.5s
+ttt_chunk [11/1238] bpb=1.067237 time=10.1s
+ttt_chunk [21/1238] bpb=1.103646 time=13.2s
+ttt_chunk [31/1238] bpb=1.096691 time=16.3s
+ttt_chunk [41/1238] bpb=1.089636 time=19.5s
+ttt_chunk [51/1238] bpb=1.082625 time=22.6s
+ttt_chunk [61/1238] bpb=1.074105 time=26.8s
+ttt_chunk [71/1238] bpb=1.081575 time=29.9s
+ttt_chunk [81/1238] bpb=1.074916 time=33.1s
+ttt_chunk [91/1238] bpb=1.071606 time=36.2s
+ttt_chunk [101/1238] bpb=1.071452 time=39.3s
+ttt_chunk [111/1238] bpb=1.069689 time=42.8s
+ttt_chunk [121/1238] bpb=1.072658 time=45.9s
+ttt_chunk [131/1238] bpb=1.076435 time=49.0s
+ttt_chunk [141/1238] bpb=1.076935 time=52.2s
+ttt_chunk [151/1238] bpb=1.076752 time=55.3s
+ttt_chunk [161/1238] bpb=1.077311 time=58.4s
+ttt_chunk [171/1238] bpb=1.076928 time=61.5s
+ttt_chunk [181/1238] bpb=1.075446 time=64.6s
+ttt_chunk [191/1238] bpb=1.075272 time=67.8s
+ttt_chunk [201/1238] bpb=1.072894 time=70.9s
+ttt_chunk [211/1238] bpb=1.077427 time=74.0s
+ttt_chunk [221/1238] bpb=1.077858 time=77.1s
+ttt_chunk [231/1238] bpb=1.079498 time=80.2s
+ttt_chunk [241/1238] bpb=1.077554 time=83.4s
+ttt_chunk [251/1238] bpb=1.077528 time=86.5s
+ttt_chunk [261/1238] bpb=1.078635 time=89.7s
+ttt_chunk [271/1238] bpb=1.079074 time=92.8s
+ttt_chunk [281/1238] bpb=1.078400 time=95.9s
+ttt_chunk [291/1238] bpb=1.079682 time=99.0s
+ttt_chunk [301/1238] bpb=1.079964 time=102.1s
+ttt_chunk [311/1238] bpb=1.078835 time=105.3s
+ttt_chunk [321/1238] bpb=1.078660 time=108.4s
+ttt_chunk [331/1238] bpb=1.078972 time=111.5s
+ttt_chunk [341/1238] bpb=1.078071 time=114.6s
+ttt_chunk [351/1238] bpb=1.078793 time=117.7s
+ttt_chunk [361/1238] bpb=1.077703 time=120.9s
+ttt_chunk [371/1238] bpb=1.076064 time=124.0s
+ttt_chunk [381/1238] bpb=1.076428 time=127.1s
+ttt_chunk [391/1238] bpb=1.076121 time=130.3s
+ttt_chunk [401/1238] bpb=1.076243 time=133.4s
+ttt_chunk [411/1238] bpb=1.076814 time=136.5s
+ttt_chunk [421/1238] bpb=1.076288 time=139.6s
+ttt_chunk [431/1238] bpb=1.076480 time=142.8s
+ttt_chunk [441/1238] bpb=1.076594 time=145.9s
+ttt_chunk [451/1238] bpb=1.077759 time=149.0s
+ttt_chunk [461/1238] bpb=1.076010 time=152.1s
+ttt_chunk [471/1238] bpb=1.075970 time=155.3s
+ttt_chunk [481/1238] bpb=1.076057 time=158.4s
+ttt_chunk [491/1238] bpb=1.076535 time=161.5s
+ttt_chunk [501/1238] bpb=1.076204 time=164.6s
+ttt_chunk [511/1238] bpb=1.075838 time=167.8s
+ttt_chunk [521/1238] bpb=1.075332 time=170.9s
+ttt_chunk [531/1238] bpb=1.075325 time=174.0s
+ttt_chunk [541/1238] bpb=1.075403 time=177.1s
+ttt_chunk [551/1238] bpb=1.074975 time=180.3s
+ttt_chunk [561/1238] bpb=1.074284 time=183.4s
+ttt_chunk [571/1238] bpb=1.073742 time=186.5s
+ttt_chunk [581/1238] bpb=1.074088 time=189.7s
+ttt_chunk [591/1238] bpb=1.074321 time=192.8s
+ttt_chunk [601/1238] bpb=1.074242 time=195.9s
+ttt_chunk [611/1238] bpb=1.074755 time=199.1s
+ttt_chunk [621/1238] bpb=1.075623 time=202.2s
+ttt_chunk [631/1238] bpb=1.075671 time=205.3s
+ttt_chunk [641/1238] bpb=1.076134 time=208.4s
+ttt_chunk [651/1238] bpb=1.076489 time=211.6s
+ttt_chunk [661/1238] bpb=1.075841 time=214.7s
+ttt_chunk [671/1238] bpb=1.075601 time=217.8s
+ttt_chunk [681/1238] bpb=1.076861 time=220.9s
+ttt_chunk [691/1238] bpb=1.077078 time=224.0s
+ttt_chunk [701/1238] bpb=1.076912 time=227.2s
+ttt_chunk [711/1238] bpb=1.077621 time=230.3s
+ttt_chunk [721/1238] bpb=1.077931 time=233.4s
+ttt_chunk [731/1238] bpb=1.077301 time=236.5s
+ttt_chunk [741/1238] bpb=1.076983 time=239.7s
+ttt_chunk [751/1238] bpb=1.076048 time=242.8s
+ttt_chunk [761/1238] bpb=1.075481 time=245.9s
+ttt_chunk [771/1238] bpb=1.074478 time=249.0s
+ttt_chunk [781/1238] bpb=1.074493 time=252.1s
+ttt_chunk [791/1238] bpb=1.074842 time=255.3s
+ttt_chunk [801/1238] bpb=1.075139 time=258.4s
+ttt_chunk [811/1238] bpb=1.074675 time=261.5s
+ttt_chunk [821/1238] bpb=1.073388 time=264.7s
+ttt_chunk [831/1238] bpb=1.073080 time=267.8s
+ttt_chunk [841/1238] bpb=1.072615 time=270.9s
+ttt_chunk [851/1238] bpb=1.072356 time=274.0s
+ttt_chunk [861/1238] bpb=1.072034 time=277.2s
+ttt_chunk [871/1238] bpb=1.071949 time=280.3s
+ttt_chunk [881/1238] bpb=1.071506 time=283.4s
+ttt_chunk [891/1238] bpb=1.070992 time=286.5s
+ttt_chunk [901/1238] bpb=1.071379 time=289.6s
+ttt_chunk [911/1238] bpb=1.071072 time=292.8s
+ttt_chunk [921/1238] bpb=1.071281 time=295.9s
+ttt_chunk [931/1238] bpb=1.071973 time=299.0s
+ttt_chunk [941/1238] bpb=1.072337 time=302.2s
+ttt_chunk [951/1238] bpb=1.072273 time=305.3s
+ttt_chunk [961/1238] bpb=1.073088 time=308.4s
+ttt_chunk [971/1238] bpb=1.073465 time=311.6s
+ttt_chunk [981/1238] bpb=1.073852 time=314.7s
+ttt_chunk [991/1238] bpb=1.073659 time=317.8s
+ttt_chunk [1001/1238] bpb=1.073600 time=320.9s
+ttt_chunk [1011/1238] bpb=1.073964 time=324.1s
+ttt_chunk [1021/1238] bpb=1.074671 time=327.2s
+ttt_chunk [1031/1238] bpb=1.075082 time=330.8s
+ttt_chunk [1041/1238] bpb=1.075564 time=333.9s
+ttt_chunk [1051/1238] bpb=1.075494 time=337.0s
+ttt_chunk [1061/1238] bpb=1.075460 time=340.2s
+ttt_chunk [1071/1238] bpb=1.075621 time=343.3s
+ttt_chunk [1081/1238] bpb=1.075503 time=346.4s
+ttt_chunk [1091/1238] bpb=1.075708 time=349.5s
+ttt_chunk [1101/1238] bpb=1.076245 time=352.7s
+ttt_chunk [1111/1238] bpb=1.076534 time=355.8s
+ttt_chunk [1121/1238] bpb=1.076681 time=358.9s
+ttt_chunk [1131/1238] bpb=1.076350 time=362.0s
+ttt_chunk [1141/1238] bpb=1.076006 time=365.1s
+ttt_chunk [1151/1238] bpb=1.076041 time=368.2s
+ttt_chunk [1161/1238] bpb=1.076173 time=371.4s
+ttt_chunk [1171/1238] bpb=1.075957 time=374.5s
+ttt_chunk [1181/1238] bpb=1.075474 time=377.6s
+ttt_chunk [1191/1238] bpb=1.075620 time=380.7s
+ttt_chunk [1201/1238] bpb=1.075624 time=384.4s
+ttt_chunk [1211/1238] bpb=1.075296 time=387.5s
+ttt_chunk [1221/1238] bpb=1.074833 time=390.6s
+ttt_chunk [1231/1238] bpb=1.074472 time=394.1s
+ttt_chunk [1238/1238] bpb=1.074465 time=398.1s
+ttt_sliding:done val_loss=2.775907 val_bpb=1.074644 elapsed=398.6s
+val_loss: 2.775907 val_bpb: 1.074644 time: 398.8s
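The `ttt_chunk` lines in the log above follow a regular pattern, so they can be pulled into tuples with a small regular expression. The sketch below is an illustration only: the pattern is inferred from the log text shown here, the helper name `parse_chunk_lines` is hypothetical, and the sample is three lines copied from the log.

```python
import re

# Pattern inferred from the log lines above, not taken from the training script.
CHUNK_RE = re.compile(r"ttt_chunk \[(\d+)/(\d+)\] bpb=([\d.]+) time=([\d.]+)s")


def parse_chunk_lines(text):
    """Return (chunk_index, total_chunks, bpb, elapsed_seconds) per matching line."""
    return [
        (int(m.group(1)), int(m.group(2)), float(m.group(3)), float(m.group(4)))
        for m in CHUNK_RE.finditer(text)
    ]


sample = """\
ttt_chunk [1/1238] bpb=1.111107 time=4.5s
ttt_chunk [11/1238] bpb=1.067237 time=10.1s
ttt_chunk [1238/1238] bpb=1.074465 time=398.1s
"""

rows = parse_chunk_lines(sample)
first, last = rows[0], rows[-1]
# Average wall-clock seconds per chunk between the first and last logged chunk.
secs_per_chunk = (last[3] - first[3]) / (last[0] - first[0])
print(rows)
print(round(secs_per_chunk, 3))
```

For this run the average works out to roughly a third of a second per 32768-token chunk.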
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+=== V6 + TTT ===
+TTT: lr=0.005 epochs=3 chunk=32768 freeze=0
+ttt_sliding:start chunks=1238 chunk_tokens=32768 total_windows=633440 stride=64 ttt_lr=0.005 ttt_epochs=3 freeze_blocks=0
+ttt_sliding:params unfrozen=35946138 frozen=0
+ttt_chunk [1/1238] bpb=1.110496 time=5.0s
+ttt_chunk [11/1238] bpb=1.066305 time=10.5s
+ttt_chunk [21/1238] bpb=1.104060 time=13.7s
+ttt_chunk [31/1238] bpb=1.098179 time=16.8s
+ttt_chunk [41/1238] bpb=1.091312 time=19.9s
+ttt_chunk [51/1238] bpb=1.084588 time=23.1s
+ttt_chunk [61/1238] bpb=1.075571 time=26.2s
+ttt_chunk [71/1238] bpb=1.082695 time=29.3s
+ttt_chunk [81/1238] bpb=1.075791 time=32.4s
+ttt_chunk [91/1238] bpb=1.072276 time=35.6s
+ttt_chunk [101/1238] bpb=1.071959 time=38.7s
+ttt_chunk [111/1238] bpb=1.070231 time=41.9s
+ttt_chunk [121/1238] bpb=1.073423 time=45.0s
+ttt_chunk [131/1238] bpb=1.077171 time=48.1s
+ttt_chunk [141/1238] bpb=1.077692 time=51.2s
+ttt_chunk [151/1238] bpb=1.077553 time=54.3s
+ttt_chunk [161/1238] bpb=1.078110 time=57.5s
+ttt_chunk [171/1238] bpb=1.077707 time=60.6s
+ttt_chunk [181/1238] bpb=1.076281 time=64.1s
+ttt_chunk [191/1238] bpb=1.076098 time=67.3s
+ttt_chunk [201/1238] bpb=1.073791 time=70.4s
+ttt_chunk [211/1238] bpb=1.078210 time=73.5s
+ttt_chunk [221/1238] bpb=1.078603 time=76.6s
+ttt_chunk [231/1238] bpb=1.080171 time=79.8s
+ttt_chunk [241/1238] bpb=1.078273 time=82.9s
+ttt_chunk [251/1238] bpb=1.078303 time=86.0s
+ttt_chunk [261/1238] bpb=1.079390 time=89.2s
+ttt_chunk [271/1238] bpb=1.079802 time=92.3s
+ttt_chunk [281/1238] bpb=1.079046 time=95.4s
+ttt_chunk [291/1238] bpb=1.080242 time=98.5s
+ttt_chunk [301/1238] bpb=1.080466 time=101.7s
+ttt_chunk [311/1238] bpb=1.079378 time=104.8s
+ttt_chunk [321/1238] bpb=1.079289 time=107.9s
+ttt_chunk [331/1238] bpb=1.079579 time=111.0s
+ttt_chunk [341/1238] bpb=1.078726 time=114.2s
+ttt_chunk [351/1238] bpb=1.079458 time=117.3s
+ttt_chunk [361/1238] bpb=1.078327 time=120.4s
+ttt_chunk [371/1238] bpb=1.076691 time=123.5s
+ttt_chunk [381/1238] bpb=1.077060 time=126.6s
+ttt_chunk [391/1238] bpb=1.076785 time=129.8s
+ttt_chunk [401/1238] bpb=1.076891 time=132.9s
+ttt_chunk [411/1238] bpb=1.077452 time=136.0s
+ttt_chunk [421/1238] bpb=1.076936 time=139.1s
+ttt_chunk [431/1238] bpb=1.077102 time=142.3s
+ttt_chunk [441/1238] bpb=1.077175 time=145.4s
+ttt_chunk [451/1238] bpb=1.078350 time=148.5s
+ttt_chunk [461/1238] bpb=1.076585 time=151.6s
+ttt_chunk [471/1238] bpb=1.076577 time=154.8s
+ttt_chunk [481/1238] bpb=1.076694 time=157.9s
+ttt_chunk [491/1238] bpb=1.077172 time=161.0s
+ttt_chunk [501/1238] bpb=1.076835 time=164.1s
+ttt_chunk [511/1238] bpb=1.076482 time=167.3s
+ttt_chunk [521/1238] bpb=1.075989 time=170.4s
+ttt_chunk [531/1238] bpb=1.075976 time=173.5s
+ttt_chunk [541/1238] bpb=1.076060 time=176.6s
+ttt_chunk [551/1238] bpb=1.075608 time=179.7s
+ttt_chunk [561/1238] bpb=1.074937 time=182.9s
+ttt_chunk [571/1238] bpb=1.074383 time=186.0s
+ttt_chunk [581/1238] bpb=1.074752 time=189.1s
+ttt_chunk [591/1238] bpb=1.074966 time=192.2s
+ttt_chunk [601/1238] bpb=1.074883 time=195.4s
+ttt_chunk [611/1238] bpb=1.075406 time=198.5s
+ttt_chunk [621/1238] bpb=1.076296 time=201.6s
+ttt_chunk [631/1238] bpb=1.076400 time=204.8s
+ttt_chunk [641/1238] bpb=1.076868 time=207.9s
+ttt_chunk [651/1238] bpb=1.077238 time=211.0s
+ttt_chunk [661/1238] bpb=1.076583 time=214.1s
+ttt_chunk [671/1238] bpb=1.076323 time=217.3s
+ttt_chunk [681/1238] bpb=1.077597 time=220.4s
+ttt_chunk [691/1238] bpb=1.077776 time=223.5s
+ttt_chunk [701/1238] bpb=1.077611 time=226.6s
+ttt_chunk [711/1238] bpb=1.078349 time=229.8s
+ttt_chunk [721/1238] bpb=1.078663 time=232.9s
+ttt_chunk [731/1238] bpb=1.078016 time=236.0s
+ttt_chunk [741/1238] bpb=1.077720 time=239.1s
+ttt_chunk [751/1238] bpb=1.076786 time=242.2s
+ttt_chunk [761/1238] bpb=1.076195 time=245.4s
+ttt_chunk [771/1238] bpb=1.075202 time=248.5s
+ttt_chunk [781/1238] bpb=1.075204 time=251.6s
+ttt_chunk [791/1238] bpb=1.075530 time=254.7s
+ttt_chunk [801/1238] bpb=1.075803 time=257.8s
+ttt_chunk [811/1238] bpb=1.075328 time=261.0s
+ttt_chunk [821/1238] bpb=1.074051 time=264.1s
+ttt_chunk [831/1238] bpb=1.073734 time=267.2s
+ttt_chunk [841/1238] bpb=1.073256 time=270.4s
+ttt_chunk [851/1238] bpb=1.072970 time=273.5s
+ttt_chunk [861/1238] bpb=1.072665 time=276.6s
+ttt_chunk [871/1238] bpb=1.072563 time=279.7s
+ttt_chunk [881/1238] bpb=1.072103 time=282.8s
+ttt_chunk [891/1238] bpb=1.071549 time=286.0s
+ttt_chunk [901/1238] bpb=1.071931 time=289.1s
+ttt_chunk [911/1238] bpb=1.071602 time=292.2s
+ttt_chunk [921/1238] bpb=1.071840 time=295.4s
+ttt_chunk [931/1238] bpb=1.072535 time=298.5s
+ttt_chunk [941/1238] bpb=1.072903 time=301.6s
+ttt_chunk [951/1238] bpb=1.072814 time=304.8s
+ttt_chunk [961/1238] bpb=1.073629 time=307.9s
+ttt_chunk [971/1238] bpb=1.074004 time=311.0s
+ttt_chunk [981/1238] bpb=1.074368 time=314.1s
+ttt_chunk [991/1238] bpb=1.074178 time=317.2s
+ttt_chunk [1001/1238] bpb=1.074119 time=320.4s
+ttt_chunk [1011/1238] bpb=1.074468 time=323.5s
+ttt_chunk [1021/1238] bpb=1.075171 time=326.6s
+ttt_chunk [1031/1238] bpb=1.075589 time=330.3s
+ttt_chunk [1041/1238] bpb=1.076074 time=333.4s
+ttt_chunk [1051/1238] bpb=1.076012 time=336.6s
+ttt_chunk [1061/1238] bpb=1.075986 time=339.7s
+ttt_chunk [1071/1238] bpb=1.076147 time=342.8s
+ttt_chunk [1081/1238] bpb=1.076038 time=346.4s
+ttt_chunk [1091/1238] bpb=1.076243 time=349.6s
+ttt_chunk [1101/1238] bpb=1.076787 time=352.7s
+ttt_chunk [1111/1238] bpb=1.077076 time=355.8s
+ttt_chunk [1121/1238] bpb=1.077245 time=358.9s
+ttt_chunk [1131/1238] bpb=1.076893 time=362.1s
+ttt_chunk [1141/1238] bpb=1.076574 time=365.2s
+ttt_chunk [1151/1238] bpb=1.076610 time=368.3s
+ttt_chunk [1161/1238] bpb=1.076733 time=371.5s
+ttt_chunk [1171/1238] bpb=1.076512 time=374.6s
+ttt_chunk [1181/1238] bpb=1.076044 time=377.7s
+ttt_chunk [1191/1238] bpb=1.076194 time=380.8s
+ttt_chunk [1201/1238] bpb=1.076221 time=384.0s
+ttt_chunk [1211/1238] bpb=1.075889 time=387.1s
+ttt_chunk [1221/1238] bpb=1.075422 time=390.2s
+ttt_chunk [1231/1238] bpb=1.075049 time=393.4s
+ttt_chunk [1238/1238] bpb=1.075033 time=397.4s
+ttt_sliding:done val_loss=2.777030 val_bpb=1.075079 elapsed=397.9s
+val_loss: 2.777030 val_bpb: 1.075079 time: 398.1s
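The numbers on the `ttt_sliding:start` line can be cross-checked against each other with a little arithmetic. This sketch assumes that each sliding window advances by `stride` tokens, so that `total_windows * stride` approximates the validation token count. The logs do not state that relationship, so treat the token total as a rough estimate.

```python
import math

# Values copied from the ttt_sliding:start line in the logs above.
total_windows = 633440
stride = 64
chunk_tokens = 32768
logged_chunks = 1238

# Assumption: one window per stride step, so windows * stride approximates the token count.
approx_tokens = total_windows * stride
approx_chunks = math.ceil(approx_tokens / chunk_tokens)

print(approx_tokens)   # 40540160
print(approx_chunks)   # 1238
```

The estimate lands on the same chunk count the log reports, which suggests the last chunk is only partially filled.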
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+=== V6 + TTT ===
+TTT: lr=0.005 epochs=3 chunk=32768 freeze=0
+ttt_sliding:start chunks=1238 chunk_tokens=32768 total_windows=633440 stride=64 ttt_lr=0.005 ttt_epochs=3 freeze_blocks=0
+ttt_sliding:params unfrozen=35946138 frozen=0
+ttt_chunk [1/1238] bpb=1.112451 time=5.0s
+ttt_chunk [11/1238] bpb=1.066458 time=10.6s
+ttt_chunk [21/1238] bpb=1.103336 time=13.8s
+ttt_chunk [31/1238] bpb=1.097635 time=17.0s
+ttt_chunk [41/1238] bpb=1.091063 time=20.1s
+ttt_chunk [51/1238] bpb=1.084342 time=23.3s
+ttt_chunk [61/1238] bpb=1.075805 time=26.5s
+ttt_chunk [71/1238] bpb=1.083393 time=29.7s
+ttt_chunk [81/1238] bpb=1.076584 time=32.8s
+ttt_chunk [91/1238] bpb=1.073270 time=36.0s
+ttt_chunk [101/1238] bpb=1.072887 time=39.2s
+ttt_chunk [111/1238] bpb=1.071220 time=42.4s
+ttt_chunk [121/1238] bpb=1.074359 time=45.5s
+ttt_chunk [131/1238] bpb=1.078133 time=48.7s
+ttt_chunk [141/1238] bpb=1.078796 time=51.9s
+ttt_chunk [151/1238] bpb=1.078541 time=55.0s
+ttt_chunk [161/1238] bpb=1.079145 time=58.2s
+ttt_chunk [171/1238] bpb=1.078795 time=61.4s
+ttt_chunk [181/1238] bpb=1.077228 time=65.0s
+ttt_chunk [191/1238] bpb=1.077150 time=68.1s
+ttt_chunk [201/1238] bpb=1.074744 time=71.3s
+ttt_chunk [211/1238] bpb=1.079204 time=74.5s
+ttt_chunk [221/1238] bpb=1.079574 time=77.7s
+ttt_chunk [231/1238] bpb=1.081130 time=80.8s
+ttt_chunk [241/1238] bpb=1.079261 time=84.1s
+ttt_chunk [251/1238] bpb=1.079315 time=87.2s
+ttt_chunk [261/1238] bpb=1.080369 time=90.4s
+ttt_chunk [271/1238] bpb=1.080864 time=93.5s
+ttt_chunk [281/1238] bpb=1.080088 time=96.7s
+ttt_chunk [291/1238] bpb=1.081245 time=99.8s
+ttt_chunk [301/1238] bpb=1.081473 time=103.0s
+ttt_chunk [311/1238] bpb=1.080355 time=106.2s
+ttt_chunk [321/1238] bpb=1.080287 time=109.3s
+ttt_chunk [331/1238] bpb=1.080520 time=112.5s
+ttt_chunk [341/1238] bpb=1.079624 time=115.6s
+ttt_chunk [351/1238] bpb=1.080370 time=118.8s
+ttt_chunk [361/1238] bpb=1.079265 time=121.9s
+ttt_chunk [371/1238] bpb=1.077643 time=125.1s
+ttt_chunk [381/1238] bpb=1.078034 time=128.2s
+ttt_chunk [391/1238] bpb=1.077767 time=131.4s
+ttt_chunk [401/1238] bpb=1.077876 time=134.6s
+ttt_chunk [411/1238] bpb=1.078446 time=137.7s
+ttt_chunk [421/1238] bpb=1.077961 time=140.9s
+ttt_chunk [431/1238] bpb=1.078125 time=144.0s
+ttt_chunk [441/1238] bpb=1.078181 time=147.2s
+ttt_chunk [451/1238] bpb=1.079375 time=150.4s
+ttt_chunk [461/1238] bpb=1.077633 time=153.5s
+ttt_chunk [471/1238] bpb=1.077623 time=156.7s
+ttt_chunk [481/1238] bpb=1.077721 time=159.8s
+ttt_chunk [491/1238] bpb=1.078183 time=163.0s
+ttt_chunk [501/1238] bpb=1.077822 time=166.2s
+ttt_chunk [511/1238] bpb=1.077443 time=169.3s
+ttt_chunk [521/1238] bpb=1.076955 time=172.5s
+ttt_chunk [531/1238] bpb=1.076926 time=175.6s
+ttt_chunk [541/1238] bpb=1.076995 time=178.8s
+ttt_chunk [551/1238] bpb=1.076528 time=181.9s
+ttt_chunk [561/1238] bpb=1.075880 time=185.1s
+ttt_chunk [571/1238] bpb=1.075348 time=188.3s
+ttt_chunk [581/1238] bpb=1.075690 time=191.4s
+ttt_chunk [591/1238] bpb=1.075901 time=194.6s
+ttt_chunk [601/1238] bpb=1.075821 time=197.7s
+ttt_chunk [611/1238] bpb=1.076344 time=200.9s
+ttt_chunk [621/1238] bpb=1.077198 time=204.1s
+ttt_chunk [631/1238] bpb=1.077273 time=207.3s
+ttt_chunk [641/1238] bpb=1.077706 time=210.4s
+ttt_chunk [651/1238] bpb=1.078072 time=213.6s
+ttt_chunk [661/1238] bpb=1.077418 time=216.7s
+ttt_chunk [671/1238] bpb=1.077169 time=219.9s
+ttt_chunk [681/1238] bpb=1.078425 time=223.1s
+ttt_chunk [691/1238] bpb=1.078633 time=226.2s
+ttt_chunk [701/1238] bpb=1.078457 time=229.4s
+ttt_chunk [711/1238] bpb=1.079194 time=232.5s
+ttt_chunk [721/1238] bpb=1.079498 time=235.7s
+ttt_chunk [731/1238] bpb=1.078866 time=238.9s
+ttt_chunk [741/1238] bpb=1.078563 time=242.0s
+ttt_chunk [751/1238] bpb=1.077657 time=245.2s
+ttt_chunk [761/1238] bpb=1.077062 time=248.3s
+ttt_chunk [771/1238] bpb=1.076057 time=251.5s
+ttt_chunk [781/1238] bpb=1.076042 time=254.6s
+ttt_chunk [791/1238] bpb=1.076407 time=257.8s
+ttt_chunk [801/1238] bpb=1.076677 time=260.9s
+ttt_chunk [811/1238] bpb=1.076197 time=264.1s
+ttt_chunk [821/1238] bpb=1.074874 time=267.3s
+ttt_chunk [831/1238] bpb=1.074571 time=270.4s
+ttt_chunk [841/1238] bpb=1.074116 time=273.6s
+ttt_chunk [851/1238] bpb=1.073852 time=276.7s
+ttt_chunk [861/1238] bpb=1.073515 time=279.9s
+ttt_chunk [871/1238] bpb=1.073430 time=283.1s
+ttt_chunk [881/1238] bpb=1.072987 time=286.2s
+ttt_chunk [891/1238] bpb=1.072453 time=289.4s
+ttt_chunk [901/1238] bpb=1.072828 time=292.5s
+ttt_chunk [911/1238] bpb=1.072524 time=295.7s
+ttt_chunk [921/1238] bpb=1.072763 time=298.8s
+ttt_chunk [931/1238] bpb=1.073446 time=302.0s
+ttt_chunk [941/1238] bpb=1.073816 time=305.2s
+ttt_chunk [951/1238] bpb=1.073709 time=308.3s
+ttt_chunk [961/1238] bpb=1.074505 time=311.5s
+ttt_chunk [971/1238] bpb=1.074898 time=314.6s
+ttt_chunk [981/1238] bpb=1.075273 time=317.8s
+ttt_chunk [991/1238] bpb=1.075068 time=320.9s
+ttt_chunk [1001/1238] bpb=1.075015 time=324.1s
+ttt_chunk [1011/1238] bpb=1.075382 time=327.3s
+ttt_chunk [1021/1238] bpb=1.076093 time=330.4s
+ttt_chunk [1031/1238] bpb=1.076514 time=334.0s
+ttt_chunk [1041/1238] bpb=1.076994 time=337.2s
+ttt_chunk [1051/1238] bpb=1.076941 time=340.4s
+ttt_chunk [1061/1238] bpb=1.076924 time=343.6s
+ttt_chunk [1071/1238] bpb=1.077084 time=346.7s
+ttt_chunk [1081/1238] bpb=1.076972 time=349.9s
+ttt_chunk [1091/1238] bpb=1.077183 time=353.1s
+ttt_chunk [1101/1238] bpb=1.077723 time=356.2s
+ttt_chunk [1111/1238] bpb=1.077993 time=359.4s
+ttt_chunk [1121/1238] bpb=1.078152 time=362.6s
+ttt_chunk [1131/1238] bpb=1.077808 time=365.7s
+ttt_chunk [1141/1238] bpb=1.077485 time=368.9s
+ttt_chunk [1151/1238] bpb=1.077534 time=372.1s
+ttt_chunk [1161/1238] bpb=1.077661 time=375.2s
+ttt_chunk [1171/1238] bpb=1.077443 time=378.4s
+ttt_chunk [1181/1238] bpb=1.076976 time=381.6s
+ttt_chunk [1191/1238] bpb=1.077128 time=384.7s
+ttt_chunk [1201/1238] bpb=1.077146 time=388.4s
+ttt_chunk [1211/1238] bpb=1.076824 time=391.5s
+ttt_chunk [1221/1238] bpb=1.076354 time=394.7s
+ttt_chunk [1231/1238] bpb=1.075987 time=397.9s
+ttt_chunk [1238/1238] bpb=1.075982 time=401.8s
+ttt_sliding:done val_loss=2.779570 val_bpb=1.076063 elapsed=402.4s
+val_loss: 2.779570 val_bpb: 1.076063 time: 402.6s
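The final lines of the three logs can be combined to recompute the 3-seed summary quoted in the commit message. In the sketch below, the seed labels are assigned by matching each log's final `val_bpb` to the per-seed values in the commit message. The bytes-per-token figure assumes `val_loss` is a mean cross-entropy in nats per token, which the logs do not state.

```python
import math
import statistics

# (val_loss, val_bpb) copied from the three "ttt_sliding:done" lines;
# seed labels inferred by matching val_bpb to the commit message.
runs = {
    "s1234": (2.775907, 1.074644),
    "s2025": (2.777030, 1.075079),
    "s42": (2.779570, 1.076063),
}

bpbs = [bpb for _, bpb in runs.values()]
mean_bpb = statistics.mean(bpbs)
sample_std = statistics.stdev(bpbs)  # sample (n-1) standard deviation

# Assumption: val_loss is nats per token, so bits/token = loss / ln 2 and
# bytes/token = (bits/token) / (bits/byte).
bytes_per_token = {
    seed: loss / (math.log(2) * bpb) for seed, (loss, bpb) in runs.items()
}

print(f"mean bpb   = {mean_bpb:.6f}")
print(f"sample std = {sample_std:.6f}")
for seed, value in bytes_per_token.items():
    print(f"{seed}: implied bytes/token = {value:.4f}")
```

The mean reproduces the 1.075262 in the commit message. The sample standard deviation of these three rounded values comes out near 0.00073, slightly below the quoted 0.000745, which may have been computed differently. The implied bytes-per-token ratio is essentially identical across seeds, as expected if all three runs score the same validation text with the same tokenizer.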
