parallel branch length optimization #28

matsen · 2024-06-02T20:17:06Z

mmjohn

Cool! Is this the reason for the switch from torch.optim.Adam() to torch.optim.AdamW()? https://stackoverflow.com/questions/64621585/adamw-and-adam-with-weight-decay

matsen · 2024-06-03T21:31:51Z

Yes, exactly.
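For readers following the linked question, here is a minimal sketch of the difference it discusses. It is not code from this PR: the function names (`adam_l2_step`, `adamw_step`, `new_state`) and the hyperparameter values are hypothetical, and it tracks a single scalar parameter in plain Python rather than using torch. Adam with an L2-style `weight_decay` adds the decay term to the gradient, where the adaptive normalization then rescales it. AdamW shrinks the parameter directly and leaves the gradient untouched.

```python
import math


def new_state():
    # Per-parameter optimizer state: step count, first and second moment estimates.
    return {"t": 0, "m": 0.0, "v": 0.0}


def _adam_update(theta, g, state, lr, b1, b2, eps):
    # Standard Adam update with bias correction, applied to gradient g.
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g
    state["v"] = b2 * state["v"] + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


def adam_l2_step(theta, grad, state, lr=1e-2, wd=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Coupled (L2-style) decay: the decay term is folded into the gradient,
    # so it passes through Adam's adaptive normalization.
    return _adam_update(theta, grad + wd * theta, state, lr, b1, b2, eps)


def adamw_step(theta, grad, state, lr=1e-2, wd=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Decoupled decay: shrink the parameter directly, then take an Adam step
    # on the raw gradient.
    theta = theta * (1 - lr * wd)
    return _adam_update(theta, grad, state, lr, b1, b2, eps)


if __name__ == "__main__":
    # With a zero loss gradient, only the weight decay moves the parameter.
    for theta0 in (1.0, 100.0):
        coupled = adam_l2_step(theta0, 0.0, new_state())
        decoupled = adamw_step(theta0, 0.0, new_state())
        print(theta0, coupled, decoupled)
```

In this sketch the coupled variant moves both parameters by roughly `lr` regardless of their size, because the normalization cancels the magnitude of the decay term, while the decoupled variant shrinks each parameter in proportion to its value. In torch the corresponding choice is between `torch.optim.Adam(..., weight_decay=...)` and `torch.optim.AdamW(..., weight_decay=...)`.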

matsen added 4 commits June 2, 2024 10:51

storing dataset rather than loaders

39054b2

starmap working!

58a88d9

working in the burrito

063c5ec

make format

c3bef03

matsen linked an issue Jun 2, 2024 that may be closed by this pull request

parallel branch length optimization #27

Closed

matsen added 6 commits June 3, 2024 02:28

better names and type annots

2af11e1

moving split dataset

96c2595

more cleanup

a9dedbb

docs

4390c47

make format

46b1052

cleanup

d4f604c

matsen marked this pull request as ready for review June 3, 2024 10:01

matsen requested review from ksung25, willdumm and mmjohn June 3, 2024 10:01

matsen added 2 commits June 3, 2024 03:26

Actually saving model; avoiding Torch tensor problem

618ca6c

using AdamW optimizer

c31df84

mmjohn reviewed Jun 3, 2024

mmjohn approved these changes Jun 3, 2024

matsen added 6 commits June 4, 2024 05:16

dropping verbose due to lr_on_plateau

7ee3cd5

deep copying model

40fbd5f

walltime since start of execution time

1c1cdc2

make format

9007682

chore: Ignore UserWarning from torch.nn.modules.transformer

eec950f

make format

c257d14

matsen merged commit c48a52f into main Jun 5, 2024
1 check passed

matsen deleted the 27-parallel-bl branch June 5, 2024 11:56
