Commit 2bd6100

abaybektursun and claude committed
Asymmetric Two-Lane Parallel Routing + TAPIN_V6_CROSS_W=0.12 + MUON_WD=0.12 + Trimmed GPTQ + Wider Loop + Per-Pass Embeddings + Muon 0.98 + Legal TTT
Increases TAPIN_V6_CROSS_W from 0.06 to 0.12. This is the weight on the Tap-In V6 cross-window n-gram rule at eval time; doubling it pushes the cross-window hint harder.

3-seed mean V6+TTT BPB:
  s2025: 1.073313 (-0.000133 vs cross_w=0.06)
  s1234: 1.073801 (-0.000175)
  s42:   1.074701 (+0.000078, noise)
  Mean:  1.073938 (std 0.000704)

2 of 3 seeds improved meaningfully. Cumulative improvement over the original symmetric-init revision (1.075262): -0.001324 BPB.

All 3 seeds under 16 MB (~102 KB headroom each).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
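The summary statistics in the commit message can be re-derived from the three per-seed numbers. The sketch below is not part of the commit; the variable names are illustrative, and it assumes the reported std is the sample standard deviation (n-1), which is the convention that reproduces 0.000704.

```python
from statistics import mean, stdev

# Final V6+TTT val_bpb per seed, copied from the commit message.
bpb = {2025: 1.073313, 1234: 1.073801, 42: 1.074701}
# Per-seed change vs. TAPIN_V6_CROSS_W=0.06, also from the commit message.
delta = {2025: -0.000133, 1234: -0.000175, 42: 0.000078}
# Mean BPB of the original symmetric-init revision, from the commit message.
BASELINE_MEAN = 1.075262

seed_mean = mean(bpb.values())
seed_std = stdev(bpb.values())      # sample std (n-1); an assumption
mean_delta = mean(delta.values())   # average per-seed change
cumulative = seed_mean - BASELINE_MEAN

print(f"mean={seed_mean:.6f} std={seed_std:.6f}")
print(f"mean delta vs cross_w=0.06: {mean_delta:+.6f}")
print(f"cumulative vs symmetric-init: {cumulative:+.6f}")
```

The average per-seed delta comes out near -0.000077 BPB, which is small next to the seed-to-seed std, so the per-seed signs are worth reading alongside the mean.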
1 parent 75700cb commit 2bd6100

15 files changed: 4032 additions & 0 deletions

records/track_10min_16mb/2026-04-10_WiderEmb_TapInV6_TTT/README.md

Lines changed: 323 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+=== V6 + TTT ===
+TTT: lr=0.005 epochs=3 chunk=32768 freeze=0
+ttt_sliding:start chunks=1238 chunk_tokens=32768 total_windows=633440 stride=64 ttt_lr=0.005 ttt_epochs=3 freeze_blocks=0
+ttt_sliding:params unfrozen=35946138 frozen=0
+ttt_chunk [1/1238] bpb=1.115386 time=5.0s
+ttt_chunk [11/1238] bpb=1.067653 time=10.8s
+ttt_chunk [21/1238] bpb=1.104281 time=14.0s
+ttt_chunk [31/1238] bpb=1.097180 time=17.2s
+ttt_chunk [41/1238] bpb=1.090453 time=20.5s
+ttt_chunk [51/1238] bpb=1.083067 time=23.7s
+ttt_chunk [61/1238] bpb=1.074231 time=26.9s
+ttt_chunk [71/1238] bpb=1.081660 time=30.1s
+ttt_chunk [81/1238] bpb=1.074904 time=33.4s
+ttt_chunk [91/1238] bpb=1.071278 time=36.6s
+ttt_chunk [101/1238] bpb=1.071204 time=39.9s
+ttt_chunk [111/1238] bpb=1.069492 time=43.1s
+ttt_chunk [121/1238] bpb=1.072499 time=46.4s
+ttt_chunk [131/1238] bpb=1.076268 time=49.6s
+ttt_chunk [141/1238] bpb=1.076875 time=52.8s
+ttt_chunk [151/1238] bpb=1.076507 time=56.0s
+ttt_chunk [161/1238] bpb=1.077110 time=59.3s
+ttt_chunk [171/1238] bpb=1.076601 time=62.5s
+ttt_chunk [181/1238] bpb=1.075082 time=66.1s
+ttt_chunk [191/1238] bpb=1.074949 time=69.3s
+ttt_chunk [201/1238] bpb=1.072464 time=72.5s
+ttt_chunk [211/1238] bpb=1.076882 time=75.8s
+ttt_chunk [221/1238] bpb=1.077244 time=79.0s
+ttt_chunk [231/1238] bpb=1.078920 time=82.2s
+ttt_chunk [241/1238] bpb=1.076974 time=85.5s
+ttt_chunk [251/1238] bpb=1.076990 time=88.7s
+ttt_chunk [261/1238] bpb=1.078079 time=91.9s
+ttt_chunk [271/1238] bpb=1.078492 time=95.2s
+ttt_chunk [281/1238] bpb=1.077734 time=98.4s
+ttt_chunk [291/1238] bpb=1.078958 time=101.7s
+ttt_chunk [301/1238] bpb=1.079222 time=104.9s
+ttt_chunk [311/1238] bpb=1.078035 time=108.1s
+ttt_chunk [321/1238] bpb=1.077952 time=111.4s
+ttt_chunk [331/1238] bpb=1.078261 time=114.6s
+ttt_chunk [341/1238] bpb=1.077384 time=117.9s
+ttt_chunk [351/1238] bpb=1.078068 time=121.1s
+ttt_chunk [361/1238] bpb=1.077007 time=124.3s
+ttt_chunk [371/1238] bpb=1.075394 time=127.5s
+ttt_chunk [381/1238] bpb=1.075789 time=130.8s
+ttt_chunk [391/1238] bpb=1.075546 time=134.0s
+ttt_chunk [401/1238] bpb=1.075646 time=137.2s
+ttt_chunk [411/1238] bpb=1.076277 time=140.4s
+ttt_chunk [421/1238] bpb=1.075780 time=143.7s
+ttt_chunk [431/1238] bpb=1.075971 time=146.9s
+ttt_chunk [441/1238] bpb=1.075999 time=150.1s
+ttt_chunk [451/1238] bpb=1.077185 time=153.3s
+ttt_chunk [461/1238] bpb=1.075413 time=156.6s
+ttt_chunk [471/1238] bpb=1.075432 time=159.8s
+ttt_chunk [481/1238] bpb=1.075512 time=163.0s
+ttt_chunk [491/1238] bpb=1.076021 time=166.3s
+ttt_chunk [501/1238] bpb=1.075674 time=169.5s
+ttt_chunk [511/1238] bpb=1.075295 time=172.8s
+ttt_chunk [521/1238] bpb=1.074808 time=176.0s
+ttt_chunk [531/1238] bpb=1.074796 time=179.2s
+ttt_chunk [541/1238] bpb=1.074898 time=182.5s
+ttt_chunk [551/1238] bpb=1.074424 time=185.7s
+ttt_chunk [561/1238] bpb=1.073750 time=188.9s
+ttt_chunk [571/1238] bpb=1.073214 time=192.1s
+ttt_chunk [581/1238] bpb=1.073581 time=195.4s
+ttt_chunk [591/1238] bpb=1.073820 time=198.7s
+ttt_chunk [601/1238] bpb=1.073723 time=201.9s
+ttt_chunk [611/1238] bpb=1.074226 time=205.1s
+ttt_chunk [621/1238] bpb=1.075091 time=208.3s
+ttt_chunk [631/1238] bpb=1.075173 time=211.6s
+ttt_chunk [641/1238] bpb=1.075635 time=214.8s
+ttt_chunk [651/1238] bpb=1.075998 time=218.0s
+ttt_chunk [661/1238] bpb=1.075339 time=221.3s
+ttt_chunk [671/1238] bpb=1.075079 time=224.5s
+ttt_chunk [681/1238] bpb=1.076306 time=227.8s
+ttt_chunk [691/1238] bpb=1.076498 time=231.0s
+ttt_chunk [701/1238] bpb=1.076318 time=234.2s
+ttt_chunk [711/1238] bpb=1.077012 time=237.4s
+ttt_chunk [721/1238] bpb=1.077319 time=240.6s
+ttt_chunk [731/1238] bpb=1.076682 time=243.8s
+ttt_chunk [741/1238] bpb=1.076361 time=247.0s
+ttt_chunk [751/1238] bpb=1.075408 time=250.3s
+ttt_chunk [761/1238] bpb=1.074825 time=253.5s
+ttt_chunk [771/1238] bpb=1.073824 time=256.7s
+ttt_chunk [781/1238] bpb=1.073810 time=259.9s
+ttt_chunk [791/1238] bpb=1.074194 time=263.1s
+ttt_chunk [801/1238] bpb=1.074468 time=266.3s
+ttt_chunk [811/1238] bpb=1.074006 time=269.6s
+ttt_chunk [821/1238] bpb=1.072710 time=272.8s
+ttt_chunk [831/1238] bpb=1.072393 time=276.1s
+ttt_chunk [841/1238] bpb=1.071957 time=279.3s
+ttt_chunk [851/1238] bpb=1.071657 time=282.5s
+ttt_chunk [861/1238] bpb=1.071324 time=285.7s
+ttt_chunk [871/1238] bpb=1.071222 time=288.9s
+ttt_chunk [881/1238] bpb=1.070767 time=292.2s
+ttt_chunk [891/1238] bpb=1.070260 time=295.4s
+ttt_chunk [901/1238] bpb=1.070634 time=298.6s
+ttt_chunk [911/1238] bpb=1.070320 time=301.9s
+ttt_chunk [921/1238] bpb=1.070534 time=305.1s
+ttt_chunk [931/1238] bpb=1.071199 time=308.3s
+ttt_chunk [941/1238] bpb=1.071528 time=311.5s
+ttt_chunk [951/1238] bpb=1.071440 time=314.8s
+ttt_chunk [961/1238] bpb=1.072256 time=318.0s
+ttt_chunk [971/1238] bpb=1.072637 time=321.2s
+ttt_chunk [981/1238] bpb=1.073002 time=324.4s
+ttt_chunk [991/1238] bpb=1.072789 time=327.7s
+ttt_chunk [1001/1238] bpb=1.072732 time=330.9s
+ttt_chunk [1011/1238] bpb=1.073090 time=334.1s
+ttt_chunk [1021/1238] bpb=1.073808 time=337.3s
+ttt_chunk [1031/1238] bpb=1.074197 time=341.0s
+ttt_chunk [1041/1238] bpb=1.074684 time=344.2s
+ttt_chunk [1051/1238] bpb=1.074626 time=347.5s
+ttt_chunk [1061/1238] bpb=1.074609 time=350.7s
+ttt_chunk [1071/1238] bpb=1.074772 time=353.9s
+ttt_chunk [1081/1238] bpb=1.074669 time=357.1s
+ttt_chunk [1091/1238] bpb=1.074878 time=360.3s
+ttt_chunk [1101/1238] bpb=1.075424 time=363.6s
+ttt_chunk [1111/1238] bpb=1.075717 time=366.8s
+ttt_chunk [1121/1238] bpb=1.075863 time=370.0s
+ttt_chunk [1131/1238] bpb=1.075530 time=373.2s
+ttt_chunk [1141/1238] bpb=1.075202 time=376.4s
+ttt_chunk [1151/1238] bpb=1.075228 time=379.7s
+ttt_chunk [1161/1238] bpb=1.075370 time=382.9s
+ttt_chunk [1171/1238] bpb=1.075150 time=386.1s
+ttt_chunk [1181/1238] bpb=1.074684 time=389.3s
+ttt_chunk [1191/1238] bpb=1.074831 time=392.5s
+ttt_chunk [1201/1238] bpb=1.074890 time=396.3s
+ttt_chunk [1211/1238] bpb=1.074553 time=399.5s
+ttt_chunk [1221/1238] bpb=1.074096 time=402.7s
+ttt_chunk [1231/1238] bpb=1.073721 time=405.9s
+ttt_chunk [1238/1238] bpb=1.073715 time=410.1s
+ttt_sliding:done val_loss=2.773728 val_bpb=1.073801 elapsed=410.5s
+val_loss: 2.773728 val_bpb: 1.073801 time: 410.7s
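The `ttt_chunk` progress lines in the log above follow a regular pattern, so they can be pulled into tuples for plotting or timing checks. This is a minimal sketch, not code from the commit: `CHUNK_RE` and `parse_chunks` are hypothetical names, and the sample text is four lines copied from this log.

```python
import re

# Matches lines such as: ttt_chunk [11/1238] bpb=1.067653 time=10.8s
CHUNK_RE = re.compile(r"ttt_chunk \[(\d+)/(\d+)\] bpb=([\d.]+) time=([\d.]+)s")

def parse_chunks(text):
    """Return (chunk_index, total_chunks, bpb, seconds) for each progress line."""
    return [
        (int(i), int(n), float(b), float(t))
        for i, n, b, t in CHUNK_RE.findall(text)
    ]

sample = """\
ttt_chunk [1/1238] bpb=1.115386 time=5.0s
ttt_chunk [11/1238] bpb=1.067653 time=10.8s
ttt_chunk [1231/1238] bpb=1.073721 time=405.9s
ttt_chunk [1238/1238] bpb=1.073715 time=410.1s
"""

rows = parse_chunks(sample)
first, last = rows[0], rows[-1]
# Average wall-clock seconds per chunk between the first and last log line.
sec_per_chunk = (last[3] - first[3]) / (last[0] - first[0])
print(f"{len(rows)} rows, about {sec_per_chunk:.3f} s per chunk")
```

The pattern is unanchored, so it also matches lines that carry a leading `+` from the diff view.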
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+=== V6 + TTT ===
+TTT: lr=0.005 epochs=3 chunk=32768 freeze=0
+ttt_sliding:start chunks=1238 chunk_tokens=32768 total_windows=633440 stride=64 ttt_lr=0.005 ttt_epochs=3 freeze_blocks=0
+ttt_sliding:params unfrozen=35946138 frozen=0
+ttt_chunk [1/1238] bpb=1.114186 time=5.0s
+ttt_chunk [11/1238] bpb=1.064973 time=10.6s
+ttt_chunk [21/1238] bpb=1.102304 time=13.8s
+ttt_chunk [31/1238] bpb=1.096712 time=17.0s
+ttt_chunk [41/1238] bpb=1.089766 time=20.2s
+ttt_chunk [51/1238] bpb=1.082314 time=23.4s
+ttt_chunk [61/1238] bpb=1.073629 time=26.6s
+ttt_chunk [71/1238] bpb=1.080710 time=29.8s
+ttt_chunk [81/1238] bpb=1.073928 time=33.0s
+ttt_chunk [91/1238] bpb=1.070642 time=36.1s
+ttt_chunk [101/1238] bpb=1.070466 time=39.3s
+ttt_chunk [111/1238] bpb=1.068797 time=42.5s
+ttt_chunk [121/1238] bpb=1.071793 time=45.7s
+ttt_chunk [131/1238] bpb=1.075534 time=48.9s
+ttt_chunk [141/1238] bpb=1.076219 time=52.1s
+ttt_chunk [151/1238] bpb=1.076032 time=55.2s
+ttt_chunk [161/1238] bpb=1.076674 time=58.4s
+ttt_chunk [171/1238] bpb=1.076227 time=61.6s
+ttt_chunk [181/1238] bpb=1.074718 time=65.2s
+ttt_chunk [191/1238] bpb=1.074609 time=68.4s
+ttt_chunk [201/1238] bpb=1.072227 time=71.5s
+ttt_chunk [211/1238] bpb=1.076601 time=74.7s
+ttt_chunk [221/1238] bpb=1.076988 time=77.9s
+ttt_chunk [231/1238] bpb=1.078672 time=81.1s
+ttt_chunk [241/1238] bpb=1.076704 time=84.3s
+ttt_chunk [251/1238] bpb=1.076727 time=87.5s
+ttt_chunk [261/1238] bpb=1.077891 time=90.6s
+ttt_chunk [271/1238] bpb=1.078413 time=93.8s
+ttt_chunk [281/1238] bpb=1.077577 time=97.0s
+ttt_chunk [291/1238] bpb=1.078790 time=100.2s
+ttt_chunk [301/1238] bpb=1.078994 time=103.4s
+ttt_chunk [311/1238] bpb=1.077866 time=106.5s
+ttt_chunk [321/1238] bpb=1.077737 time=109.7s
+ttt_chunk [331/1238] bpb=1.078002 time=112.9s
+ttt_chunk [341/1238] bpb=1.077131 time=116.1s
+ttt_chunk [351/1238] bpb=1.077897 time=119.2s
+ttt_chunk [361/1238] bpb=1.076812 time=122.4s
+ttt_chunk [371/1238] bpb=1.075178 time=125.6s
+ttt_chunk [381/1238] bpb=1.075564 time=128.8s
+ttt_chunk [391/1238] bpb=1.075227 time=132.0s
+ttt_chunk [401/1238] bpb=1.075302 time=135.2s
+ttt_chunk [411/1238] bpb=1.075879 time=138.3s
+ttt_chunk [421/1238] bpb=1.075404 time=141.5s
+ttt_chunk [431/1238] bpb=1.075556 time=144.7s
+ttt_chunk [441/1238] bpb=1.075601 time=147.9s
+ttt_chunk [451/1238] bpb=1.076812 time=151.0s
+ttt_chunk [461/1238] bpb=1.075035 time=154.2s
+ttt_chunk [471/1238] bpb=1.075004 time=157.4s
+ttt_chunk [481/1238] bpb=1.075107 time=160.6s
+ttt_chunk [491/1238] bpb=1.075607 time=163.7s
+ttt_chunk [501/1238] bpb=1.075260 time=166.9s
+ttt_chunk [511/1238] bpb=1.074875 time=170.1s
+ttt_chunk [521/1238] bpb=1.074394 time=173.3s
+ttt_chunk [531/1238] bpb=1.074395 time=176.5s
+ttt_chunk [541/1238] bpb=1.074489 time=179.7s
+ttt_chunk [551/1238] bpb=1.074001 time=182.8s
+ttt_chunk [561/1238] bpb=1.073348 time=186.0s
+ttt_chunk [571/1238] bpb=1.072802 time=189.2s
+ttt_chunk [581/1238] bpb=1.073171 time=192.4s
+ttt_chunk [591/1238] bpb=1.073397 time=195.6s
+ttt_chunk [601/1238] bpb=1.073302 time=198.8s
+ttt_chunk [611/1238] bpb=1.073765 time=202.0s
+ttt_chunk [621/1238] bpb=1.074625 time=205.2s
+ttt_chunk [631/1238] bpb=1.074697 time=208.3s
+ttt_chunk [641/1238] bpb=1.075157 time=211.5s
+ttt_chunk [651/1238] bpb=1.075536 time=214.7s
+ttt_chunk [661/1238] bpb=1.074895 time=217.9s
+ttt_chunk [671/1238] bpb=1.074635 time=221.0s
+ttt_chunk [681/1238] bpb=1.075908 time=224.2s
+ttt_chunk [691/1238] bpb=1.076111 time=227.4s
+ttt_chunk [701/1238] bpb=1.075928 time=230.6s
+ttt_chunk [711/1238] bpb=1.076659 time=233.7s
+ttt_chunk [721/1238] bpb=1.076966 time=236.9s
+ttt_chunk [731/1238] bpb=1.076340 time=240.1s
+ttt_chunk [741/1238] bpb=1.076030 time=243.3s
+ttt_chunk [751/1238] bpb=1.075103 time=246.5s
+ttt_chunk [761/1238] bpb=1.074506 time=249.6s
+ttt_chunk [771/1238] bpb=1.073523 time=252.8s
+ttt_chunk [781/1238] bpb=1.073503 time=256.0s
+ttt_chunk [791/1238] bpb=1.073855 time=259.2s
+ttt_chunk [801/1238] bpb=1.074136 time=262.3s
+ttt_chunk [811/1238] bpb=1.073654 time=265.5s
+ttt_chunk [821/1238] bpb=1.072350 time=268.7s
+ttt_chunk [831/1238] bpb=1.072040 time=271.9s
+ttt_chunk [841/1238] bpb=1.071562 time=275.1s
+ttt_chunk [851/1238] bpb=1.071274 time=278.3s
+ttt_chunk [861/1238] bpb=1.070928 time=281.4s
+ttt_chunk [871/1238] bpb=1.070837 time=284.6s
+ttt_chunk [881/1238] bpb=1.070372 time=287.8s
+ttt_chunk [891/1238] bpb=1.069836 time=290.9s
+ttt_chunk [901/1238] bpb=1.070225 time=294.1s
+ttt_chunk [911/1238] bpb=1.069924 time=297.3s
+ttt_chunk [921/1238] bpb=1.070119 time=300.5s
+ttt_chunk [931/1238] bpb=1.070811 time=303.7s
+ttt_chunk [941/1238] bpb=1.071176 time=306.9s
+ttt_chunk [951/1238] bpb=1.071092 time=310.0s
+ttt_chunk [961/1238] bpb=1.071912 time=313.2s
+ttt_chunk [971/1238] bpb=1.072280 time=316.4s
+ttt_chunk [981/1238] bpb=1.072649 time=319.6s
+ttt_chunk [991/1238] bpb=1.072437 time=322.7s
+ttt_chunk [1001/1238] bpb=1.072347 time=325.9s
+ttt_chunk [1011/1238] bpb=1.072684 time=329.1s
+ttt_chunk [1021/1238] bpb=1.073381 time=332.3s
+ttt_chunk [1031/1238] bpb=1.073769 time=335.9s
+ttt_chunk [1041/1238] bpb=1.074238 time=339.1s
+ttt_chunk [1051/1238] bpb=1.074170 time=342.3s
+ttt_chunk [1061/1238] bpb=1.074144 time=345.5s
+ttt_chunk [1071/1238] bpb=1.074297 time=348.7s
+ttt_chunk [1081/1238] bpb=1.074180 time=352.3s
+ttt_chunk [1091/1238] bpb=1.074380 time=355.5s
+ttt_chunk [1101/1238] bpb=1.074927 time=358.7s
+ttt_chunk [1111/1238] bpb=1.075219 time=361.9s
+ttt_chunk [1121/1238] bpb=1.075354 time=365.1s
+ttt_chunk [1131/1238] bpb=1.075021 time=368.3s
+ttt_chunk [1141/1238] bpb=1.074688 time=371.5s
+ttt_chunk [1151/1238] bpb=1.074728 time=374.7s
+ttt_chunk [1161/1238] bpb=1.074862 time=377.9s
+ttt_chunk [1171/1238] bpb=1.074637 time=381.0s
+ttt_chunk [1181/1238] bpb=1.074167 time=384.2s
+ttt_chunk [1191/1238] bpb=1.074313 time=387.4s
+ttt_chunk [1201/1238] bpb=1.074374 time=390.7s
+ttt_chunk [1211/1238] bpb=1.074057 time=393.9s
+ttt_chunk [1221/1238] bpb=1.073593 time=397.1s
+ttt_chunk [1231/1238] bpb=1.073222 time=400.2s
+ttt_chunk [1238/1238] bpb=1.073210 time=404.3s
+ttt_sliding:done val_loss=2.772468 val_bpb=1.073313 elapsed=404.8s
+val_loss: 2.772468 val_bpb: 1.073313 time: 405.1s
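The counts on the `ttt_sliding:start` line can be cross-checked with a little arithmetic. The sketch below assumes each sliding window contributes roughly `stride` newly scored tokens and that chunks are filled to `chunk_tokens` except possibly the last; neither assumption is confirmed by the log, so treat the result as a sanity check rather than a description of the eval code.

```python
import math

# Values copied from the ttt_sliding:start and ttt_sliding:done lines above.
chunks = 1238
chunk_tokens = 32768
total_windows = 633440
stride = 64
elapsed_s = 404.8

# Assumption: about `stride` new tokens are scored per window.
approx_tokens = total_windows * stride
# Assumption: chunks are full except possibly the last one.
expected_chunks = math.ceil(approx_tokens / chunk_tokens)
tokens_per_second = approx_tokens / elapsed_s

print(f"approx scored tokens: {approx_tokens:,}")
print(f"expected chunks: {expected_chunks} (log reports {chunks})")
print(f"approx throughput: {tokens_per_second:,.0f} tokens/s")
```

Under these assumptions the estimate lands on the same chunk count the log reports, which suggests the chunking covers the scored tokens without gaps.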
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+=== V6 + TTT ===
+TTT: lr=0.005 epochs=3 chunk=32768 freeze=0
+ttt_sliding:start chunks=1238 chunk_tokens=32768 total_windows=633440 stride=64 ttt_lr=0.005 ttt_epochs=3 freeze_blocks=0
+ttt_sliding:params unfrozen=35946138 frozen=0
+ttt_chunk [1/1238] bpb=1.116585 time=4.6s
+ttt_chunk [11/1238] bpb=1.067518 time=10.2s
+ttt_chunk [21/1238] bpb=1.104229 time=13.4s
+ttt_chunk [31/1238] bpb=1.097890 time=16.6s
+ttt_chunk [41/1238] bpb=1.091082 time=19.8s
+ttt_chunk [51/1238] bpb=1.084090 time=23.0s
+ttt_chunk [61/1238] bpb=1.075198 time=26.2s
+ttt_chunk [71/1238] bpb=1.082459 time=29.4s
+ttt_chunk [81/1238] bpb=1.075977 time=32.6s
+ttt_chunk [91/1238] bpb=1.072530 time=35.8s
+ttt_chunk [101/1238] bpb=1.072174 time=39.0s
+ttt_chunk [111/1238] bpb=1.070342 time=42.5s
+ttt_chunk [121/1238] bpb=1.073529 time=45.7s
+ttt_chunk [131/1238] bpb=1.077313 time=48.9s
+ttt_chunk [141/1238] bpb=1.077898 time=52.1s
+ttt_chunk [151/1238] bpb=1.077755 time=55.3s
+ttt_chunk [161/1238] bpb=1.078290 time=58.5s
+ttt_chunk [171/1238] bpb=1.077799 time=61.7s
+ttt_chunk [181/1238] bpb=1.076278 time=64.9s
+ttt_chunk [191/1238] bpb=1.076087 time=68.1s
+ttt_chunk [201/1238] bpb=1.073689 time=71.2s
+ttt_chunk [211/1238] bpb=1.078085 time=74.4s
+ttt_chunk [221/1238] bpb=1.078471 time=77.6s
+ttt_chunk [231/1238] bpb=1.080047 time=80.8s
+ttt_chunk [241/1238] bpb=1.078184 time=84.1s
+ttt_chunk [251/1238] bpb=1.078241 time=87.2s
+ttt_chunk [261/1238] bpb=1.079250 time=90.5s
+ttt_chunk [271/1238] bpb=1.079621 time=93.7s
+ttt_chunk [281/1238] bpb=1.078849 time=96.9s
+ttt_chunk [291/1238] bpb=1.080048 time=100.1s
+ttt_chunk [301/1238] bpb=1.080373 time=103.3s
+ttt_chunk [311/1238] bpb=1.079291 time=106.4s
+ttt_chunk [321/1238] bpb=1.079146 time=109.6s
+ttt_chunk [331/1238] bpb=1.079442 time=112.8s
+ttt_chunk [341/1238] bpb=1.078556 time=116.0s
+ttt_chunk [351/1238] bpb=1.079270 time=119.2s
+ttt_chunk [361/1238] bpb=1.078211 time=122.4s
+ttt_chunk [371/1238] bpb=1.076612 time=125.6s
+ttt_chunk [381/1238] bpb=1.077005 time=128.8s
+ttt_chunk [391/1238] bpb=1.076733 time=131.9s
+ttt_chunk [401/1238] bpb=1.076827 time=135.1s
+ttt_chunk [411/1238] bpb=1.077364 time=138.3s
+ttt_chunk [421/1238] bpb=1.076884 time=141.5s
+ttt_chunk [431/1238] bpb=1.077060 time=144.7s
+ttt_chunk [441/1238] bpb=1.077152 time=147.9s
+ttt_chunk [451/1238] bpb=1.078318 time=151.1s
+ttt_chunk [461/1238] bpb=1.076572 time=154.3s
+ttt_chunk [471/1238] bpb=1.076559 time=157.5s
+ttt_chunk [481/1238] bpb=1.076595 time=160.7s
+ttt_chunk [491/1238] bpb=1.077081 time=163.9s
+ttt_chunk [501/1238] bpb=1.076718 time=167.1s
+ttt_chunk [511/1238] bpb=1.076294 time=170.3s
+ttt_chunk [521/1238] bpb=1.075767 time=173.4s
+ttt_chunk [531/1238] bpb=1.075752 time=176.6s
+ttt_chunk [541/1238] bpb=1.075841 time=179.8s
+ttt_chunk [551/1238] bpb=1.075377 time=183.1s
+ttt_chunk [561/1238] bpb=1.074672 time=186.2s
+ttt_chunk [571/1238] bpb=1.074110 time=189.4s
+ttt_chunk [581/1238] bpb=1.074488 time=192.6s
+ttt_chunk [591/1238] bpb=1.074716 time=195.8s
+ttt_chunk [601/1238] bpb=1.074644 time=199.0s
+ttt_chunk [611/1238] bpb=1.075133 time=202.2s
+ttt_chunk [621/1238] bpb=1.075999 time=205.4s
+ttt_chunk [631/1238] bpb=1.076071 time=208.6s
+ttt_chunk [641/1238] bpb=1.076503 time=211.8s
+ttt_chunk [651/1238] bpb=1.076862 time=215.0s
+ttt_chunk [661/1238] bpb=1.076216 time=218.2s
+ttt_chunk [671/1238] bpb=1.075961 time=221.4s
+ttt_chunk [681/1238] bpb=1.077220 time=224.6s
+ttt_chunk [691/1238] bpb=1.077408 time=227.8s
+ttt_chunk [701/1238] bpb=1.077228 time=231.0s
+ttt_chunk [711/1238] bpb=1.077922 time=234.2s
+ttt_chunk [721/1238] bpb=1.078237 time=237.4s
+ttt_chunk [731/1238] bpb=1.077582 time=240.6s
+ttt_chunk [741/1238] bpb=1.077272 time=243.8s
+ttt_chunk [751/1238] bpb=1.076338 time=247.0s
+ttt_chunk [761/1238] bpb=1.075762 time=250.1s
+ttt_chunk [771/1238] bpb=1.074753 time=253.3s
+ttt_chunk [781/1238] bpb=1.074757 time=256.5s
+ttt_chunk [791/1238] bpb=1.075094 time=259.7s
+ttt_chunk [801/1238] bpb=1.075360 time=262.9s
+ttt_chunk [811/1238] bpb=1.074878 time=266.1s
+ttt_chunk [821/1238] bpb=1.073574 time=269.3s
+ttt_chunk [831/1238] bpb=1.073259 time=272.5s
+ttt_chunk [841/1238] bpb=1.072814 time=275.7s
+ttt_chunk [851/1238] bpb=1.072541 time=278.9s
+ttt_chunk [861/1238] bpb=1.072197 time=282.1s
+ttt_chunk [871/1238] bpb=1.072104 time=285.3s
+ttt_chunk [881/1238] bpb=1.071666 time=288.5s
+ttt_chunk [891/1238] bpb=1.071125 time=291.7s
+ttt_chunk [901/1238] bpb=1.071507 time=294.9s
+ttt_chunk [911/1238] bpb=1.071204 time=298.1s
+ttt_chunk [921/1238] bpb=1.071435 time=301.2s
+ttt_chunk [931/1238] bpb=1.072102 time=304.4s
+ttt_chunk [941/1238] bpb=1.072472 time=307.6s
+ttt_chunk [951/1238] bpb=1.072390 time=310.8s
+ttt_chunk [961/1238] bpb=1.073213 time=314.0s
+ttt_chunk [971/1238] bpb=1.073595 time=317.2s
+ttt_chunk [981/1238] bpb=1.073963 time=320.4s
+ttt_chunk [991/1238] bpb=1.073749 time=323.6s
+ttt_chunk [1001/1238] bpb=1.073677 time=326.8s
+ttt_chunk [1011/1238] bpb=1.074031 time=330.0s
+ttt_chunk [1021/1238] bpb=1.074728 time=333.2s
+ttt_chunk [1031/1238] bpb=1.075130 time=336.8s
+ttt_chunk [1041/1238] bpb=1.075599 time=340.0s
+ttt_chunk [1051/1238] bpb=1.075539 time=343.2s
+ttt_chunk [1061/1238] bpb=1.075522 time=346.5s
+ttt_chunk [1071/1238] bpb=1.075669 time=349.7s
+ttt_chunk [1081/1238] bpb=1.075565 time=352.9s
+ttt_chunk [1091/1238] bpb=1.075769 time=356.0s
+ttt_chunk [1101/1238] bpb=1.076307 time=359.2s
+ttt_chunk [1111/1238] bpb=1.076581 time=362.4s
+ttt_chunk [1121/1238] bpb=1.076746 time=365.6s
+ttt_chunk [1131/1238] bpb=1.076427 time=368.8s
+ttt_chunk [1141/1238] bpb=1.076094 time=372.0s
+ttt_chunk [1151/1238] bpb=1.076121 time=375.2s
+ttt_chunk [1161/1238] bpb=1.076264 time=378.4s
+ttt_chunk [1171/1238] bpb=1.076057 time=381.6s
+ttt_chunk [1181/1238] bpb=1.075592 time=384.9s
+ttt_chunk [1191/1238] bpb=1.075751 time=388.1s
+ttt_chunk [1201/1238] bpb=1.075785 time=391.8s
+ttt_chunk [1211/1238] bpb=1.075477 time=395.0s
+ttt_chunk [1221/1238] bpb=1.075024 time=398.3s
+ttt_chunk [1231/1238] bpb=1.074638 time=401.5s
+ttt_chunk [1238/1238] bpb=1.074632 time=405.5s
+ttt_sliding:done val_loss=2.776053 val_bpb=1.074701 elapsed=406.0s
+val_loss: 2.776053 val_bpb: 1.074701 time: 406.2s
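Each log ends with both `val_loss` and `val_bpb`, and the two are related by a unit conversion. The sketch below assumes `val_loss` is mean cross-entropy in nats per token and `val_bpb` is bits per byte; under that assumption the ratio of the two implies an average number of bytes per token, which should be the same for every seed because the validation data and tokenizer do not change. `implied_bytes_per_token` is an illustrative helper, not a function from the commit.

```python
import math

# (val_loss, val_bpb) pairs copied from the final line of each of the three logs.
runs = [
    (2.773728, 1.073801),
    (2.772468, 1.073313),
    (2.776053, 1.074701),
]

def implied_bytes_per_token(val_loss, val_bpb):
    """Bytes per token implied by a nats-per-token loss and a bits-per-byte score."""
    bits_per_token = val_loss / math.log(2)  # nats -> bits (assumed units)
    return bits_per_token / val_bpb

ratios = [implied_bytes_per_token(loss, bpb) for loss, bpb in runs]
for (loss, bpb), r in zip(runs, ratios):
    print(f"val_loss={loss:.6f} val_bpb={bpb:.6f} -> {r:.4f} bytes/token")
```

All three runs give roughly 3.727 bytes per token, so the reported `val_loss` and `val_bpb` figures are mutually consistent under the assumed units.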