| 1 | +timestamp,run_id,val_bpb,cadence,cadence_offset,num_unique_layers,num_loops,lr,grad_clip,mlp_mult,model_dim,steps,f_steps,n_steps,avg_ms,time_s,params,reasoning,notes |
| 2 | +2026-03-23T00:09:42.654400,frug2_001,2.254615,3,0,6,2,0.002,5.0,4,0,,,,,,,seed,H100 winner: 6x2 mlp4 |
| 3 | +2026-03-23T00:13:34.069000,frug2_002,2.235808,3,0,6,2,0.002,5.0,3,0,,,,,,,seed,6x2 mlp3 (how much does mlp4 help?) |
| 4 | +2026-03-23T00:17:38.304284,frug2_003,2.205482,3,0,4,3,0.002,5.0,4,0,,,,,,,seed,"4x3 mlp4 (more loops, fewer layers)" |
| 5 | +2026-03-23T00:22:33.028628,frug2_004,2.32917,3,0,8,2,0.002,5.0,4,0,,,,,,,seed,"8x2 mlp4 (more unique, fast)" |
| 6 | +2026-03-23T00:28:34.288819,frug2_005,2.35167,1,0,6,2,0.002,5.0,4,0,,,,,,,seed,6x2 always fractal |
| 7 | +2026-03-23T00:33:10.586660,frug2_006,2.707633,2,0,6,2,0.002,5.0,4,0,,,,,,,seed,6x2 cadence 2 (F/N) |
| 8 | +2026-03-23T00:36:49.993983,frug2_007,2.255619,3,0,5,2,0.002,5.0,4,0,,,,,,,seed,"5x2 mlp4 (faster, more steps)" |
| 9 | +2026-03-23T00:39:57.175446,frug2_008,2.196184,1,0,6,1,0.002,5.0,4,0,,,,,,,seed,6x1 no loops (flat mlp4 control) |
| 10 | +2026-03-23T00:43:38.816871,frug2_009,2.24638,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, L=4x3 with cadence 3 and MLP 4x shows strong performance, while the current best (L=6x2 cad=3 mlp=4) is close behind. The optimal configuration likely balances depth and loop cou","Based on the results, L=4x3 with cadence 3 and MLP 4x shows strong performance, while the current be" |
| 11 | +2026-03-23T00:47:16.360352,frug2_010,2.24475,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, L=5x2 with cadence 3 and MLP 4x shows strong performance (2.2464 BPB), while L=6x2 with cadence 3 is slightly worse (2.2546 BPB). The cadence pattern F/N/N (cadence 3) seems bene","Based on the results, L=5x2 with cadence 3 and MLP 4x shows strong performance (2.2464 BPB), while L" |
| 12 | +2026-03-23T00:50:53.541726,frug2_011,2.228696,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, L=5x2 with cadence=3 and MLP=4 shows strong performance at 2.2447 BPB. The previous best was L=6x2 cad=3 mlp=4 at 2.2546 BPB, so we should explore variations around this winning ","Based on the results, L=5x2 with cadence=3 and MLP=4 shows strong performance at 2.2447 BPB. The pre" |
| 13 | +2026-03-23T00:54:54.538187,frug2_012,2.213174,3,0,4,3,0.002,5.0,4,0,,,,,,,"Based on the results, L=5x2 with cadence 3 and MLP 4x shows strong performance while being more efficient than higher loop counts. The cadence pattern F/N/N with MLP 4x appears optimal. We should test","Based on the results, L=5x2 with cadence 3 and MLP 4x shows strong performance while being more effi" |
| 14 | +2026-03-23T00:58:55.568576,frug2_013,2.210759,3,0,4,3,0.002,5.0,4,0,,,,,,,"Based on the results, L=4x3 with cadence 3 and MLP 4x shows strong performance, but we need to explore if increasing loop count with reduced layers per loop can improve efficiency. The cadence pattern","Based on the results, L=4x3 with cadence 3 and MLP 4x shows strong performance, but we need to explo" |
| 15 | +2026-03-23T01:02:32.929687,frug2_014,2.242233,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, L=4x3 with cadence 3 and MLP 4x shows strong performance, while the optimal configuration appears to be around 5x2 or 6x2 with MLP 4x. The cadence pattern F/N/N (cadence=3) seems","Based on the results, L=4x3 with cadence 3 and MLP 4x shows strong performance, while the optimal co" |
| 16 | +2026-03-23T01:06:10.278314,frug2_015,2.238606,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, 5x2 with MLP 4x shows strong performance around 2.24 BPB, and cadence 3 seems optimal. The 6x2 configuration with MLP 4x is close to SOTA at 1.1233 BPB, so we should test variati","Based on the results, 5x2 with MLP 4x shows strong performance around 2.24 BPB, and cadence 3 seems " |
| 17 | +2026-03-23T01:09:47.108121,frug2_016,2.244032,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, L=5x2 with cadence 3 and MLP 4x shows strong performance around 2.238 BPB. The best configurations seem to favor fewer unique layers with more loops for optimal efficiency. Testi","Based on the results, L=5x2 with cadence 3 and MLP 4x shows strong performance around 2.238 BPB. The" |
| 18 | +2026-03-23T01:13:24.252495,frug2_017,2.23282,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, L=5x2 with cadence 3 and MLP 4x shows strong performance close to the best. The cadence pattern F/N/N (cadence=3) seems effective, but we should test if changing cadence or tryin","Based on the results, L=5x2 with cadence 3 and MLP 4x shows strong performance close to the best. Th" |
| 19 | +2026-03-23T01:17:01.282282,frug2_018,2.231204,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, 5x2 with MLP 4x shows strong performance around 2.23 BPB, and cadence 3 seems optimal. The 8x2 configuration performs poorly, suggesting diminishing returns from increasing loops","Based on the results, 5x2 with MLP 4x shows strong performance around 2.23 BPB, and cadence 3 seems " |
| 20 | +2026-03-23T01:20:14.825450,frug2_019,2.197056,3,0,4,2,0.002,5.0,4,0,,,,,,,"Based on the results, 5x2 with MLP 4x shows strong performance around 2.23-2.24 BPB, while 6x2 with MLP 4x is slightly worse at ~2.25 BPB. The cadence 3 performs better than 1 or 2. We should test if ","Based on the results, 5x2 with MLP 4x shows strong performance around 2.23-2.24 BPB, while 6x2 with " |
| 21 | +2026-03-23T01:23:52.959072,frug2_020,2.258728,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, 5x2 with MLP 4x shows strong performance around 2.23 BPB, while 6x2 with MLP 4x is slightly worse at ~2.25 BPB. The cadence pattern F/N/N (cadence=3) seems optimal based on previ","Based on the results, 5x2 with MLP 4x shows strong performance around 2.23 BPB, while 6x2 with MLP 4" |
| 22 | +2026-03-23T01:29:16.647297,frug2_021,2.273206,3,0,5,4,0.002,5.0,4,0,,,,,,,"Based on the results, 5x2 with MLP 4x shows the best performance so far, but we need to explore if increasing loops to 6 while keeping 5 layers could improve further. The cadence pattern F/N/N (3) see","Based on the results, 5x2 with MLP 4x shows the best performance so far, but we need to explore if i" |
| 23 | +2026-03-23T01:33:51.777113,frug2_022,2.267194,3,0,5,3,0.002,5.0,4,0,,,,,,,"Based on the results, 5x2 with MLP 4x shows the best performance so far, but we need to explore if increasing loop count with 5 layers could yield better results. The cadence pattern F/N/N (3) seems o","Based on the results, 5x2 with MLP 4x shows the best performance so far, but we need to explore if i" |
| 24 | +2026-03-23T01:38:22.362045,frug2_023,2.268142,3,0,5,3,0.002,5.0,4,0,,,,,,,"Based on the results, L=5x2 with MLP=4 shows the best performance so far, but we need to explore if increasing loop count while keeping layers optimal can improve further. The cadence pattern F/N/N (c","Based on the results, L=5x2 with MLP=4 shows the best performance so far, but we need to explore if " |
| 25 | +2026-03-23T01:42:00.656021,frug2_024,2.252595,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, 5x2 with MLP 4x shows strong performance around 2.23-2.24 BPB, and cadence 3 appears optimal. The 6x2 configuration slightly lags behind, suggesting we should explore if 5x2 can ","Based on the results, 5x2 with MLP 4x shows strong performance around 2.23-2.24 BPB, and cadence 3 a" |
| 26 | +2026-03-23T01:45:38.936954,frug2_025,2.237074,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, 5x2 with MLP 4x shows strong performance while maintaining good training stability. The cadence pattern F/N/N (cadence=3) appears optimal from previous findings. We should test i","Based on the results, 5x2 with MLP 4x shows strong performance while maintaining good training stabi" |
| 27 | +2026-03-23T01:49:17.361837,frug2_026,2.258998,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, 5x2 with MLP 4x shows strong performance around 2.23-2.24 BPB, while 6x2 with MLP 4x is slightly worse at ~2.25 BPB. The cadence pattern F/N/N (cadence=3) seems optimal based on ","Based on the results, 5x2 with MLP 4x shows strong performance around 2.23-2.24 BPB, while 6x2 with " |
| 28 | +2026-03-23T01:52:54.696448,frug2_027,2.24007,3,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, 5x2 with MLP 4x shows strong performance around 2.23-2.26 BPB, with cadence 3 being superior to 1 or 2. The optimal configuration appears to be in the 5-6 layers with 2 loops, an","Based on the results, 5x2 with MLP 4x shows strong performance around 2.23-2.26 BPB, with cadence 3 " |
| 29 | +2026-03-23T01:56:41.869660,frug2_028,2.22595,3,0,5,2,0.002,5.0,3,0,,,,,,,"Based on the results, 5x2 with MLP 4x shows strong performance close to the best, while 6x2 with MLP 4x is slightly worse but still competitive. The cadence pattern F/N/N (cadence=3) appears optimal f","Based on the results, 5x2 with MLP 4x shows strong performance close to the best, while 6x2 with MLP" |
| 30 | +2026-03-23T02:00:43.043706,frug2_029,2.214263,3,0,4,3,0.002,5.0,4,0,,,,,,,"Based on the results, L=4x3 with cadence 3 shows strong performance while maintaining lower complexity than L=5x3 or L=5x4. The MLP 4x configuration consistently outperforms MLP 3x, and cadence 3 appe","Based on the results, L=4x3 with cadence 3 shows strong performance while maintaining lower complexi" |
| 31 | +2026-03-23T02:06:00.113189,frug2_030,2.398726,1,0,5,2,0.002,5.0,4,0,,,,,,,"Based on the results, L=5x2 with MLP=4 shows the best performance so far, but we need to explore if increasing loop count with fixed layers or adjusting cadence patterns could yield better results. Th","Based on the results, L=5x2 with MLP=4 shows the best performance so far, but we need to explore if " |
| 32 | +2026-03-23T02:09:41.992206,frug2_031,2.233198,3,0,5,2,0.002,5.0,4,0,,,,,,,"The results show that 5x2 with MLP 4x performs best among the tested configs, with 6x2 showing slight degradation. The cadence pattern significantly impacts performance, with cadence 3 being superior ","The results show that 5x2 with MLP 4x performs best among the tested configs, with 6x2 showing sligh" |
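
Because many rows in this log repeat the same architecture (for example several 5x2 and 4x3 runs), a per-configuration summary is easier to read than the raw rows. The following is a minimal sketch, not part of the commit: the function name `summarize` and the choice to group by `(num_unique_layers, num_loops, cadence, mlp_mult)` are assumptions made for illustration. The embedded sample copies four rows from the diff above verbatim, including the truncated `reasoning` and `notes` text.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Header plus four rows copied verbatim from the diff above (frug2_001, _003, _005, _013).
SAMPLE = '''\
timestamp,run_id,val_bpb,cadence,cadence_offset,num_unique_layers,num_loops,lr,grad_clip,mlp_mult,model_dim,steps,f_steps,n_steps,avg_ms,time_s,params,reasoning,notes
2026-03-23T00:09:42.654400,frug2_001,2.254615,3,0,6,2,0.002,5.0,4,0,,,,,,,seed,H100 winner: 6x2 mlp4
2026-03-23T00:17:38.304284,frug2_003,2.205482,3,0,4,3,0.002,5.0,4,0,,,,,,,seed,"4x3 mlp4 (more loops, fewer layers)"
2026-03-23T00:28:34.288819,frug2_005,2.35167,1,0,6,2,0.002,5.0,4,0,,,,,,,seed,6x2 always fractal
2026-03-23T00:58:55.568576,frug2_013,2.210759,3,0,4,3,0.002,5.0,4,0,,,,,,,"Based on the results, L=4x3 with cadence 3 and MLP 4x shows strong performance, but we need to explore if increasing loop count with reduced layers per loop can improve efficiency. The cadence pattern","Based on the results, L=4x3 with cadence 3 and MLP 4x shows strong performance, but we need to explo"
'''


def summarize(csv_text):
    """Group runs by (layers, loops, cadence, mlp_mult).

    Returns a dict mapping each key to (run count, mean val_bpb, best val_bpb).
    The grouping key is a hypothetical choice, not something the log prescribes.
    """
    groups = defaultdict(list)
    # csv.DictReader handles the quoted reasoning/notes fields that contain commas.
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (
            int(row["num_unique_layers"]),
            int(row["num_loops"]),
            int(row["cadence"]),
            int(row["mlp_mult"]),
        )
        groups[key].append(float(row["val_bpb"]))
    return {k: (len(v), mean(v), min(v)) for k, v in groups.items()}


if __name__ == "__main__":
    # Print configurations ordered by mean val_bpb (lower is better).
    ranked = sorted(summarize(SAMPLE).items(), key=lambda kv: kv[1][1])
    for (layers, loops, cadence, mlp), (n, avg, best) in ranked:
        print(f"{layers}x{loops} cad={cadence} mlp={mlp}: n={n} mean={avg:.4f} best={best:.4f}")
```

To run it on the full log, pass the file contents to `summarize` instead of `SAMPLE`. Averaging over repeats gives a rough sense of run-to-run noise, which matters here because repeated 5x2 runs with identical settings differ in the second decimal place of `val_bpb`.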