Commit
Revert "disable offloaded optimizer state for live reco bc of potentail memory leak"

This reverts commit 56a3499.
samsja committed Nov 11, 2024
1 parent 55714cf commit f6bccd3
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions src/zeroband/train.py
@@ -468,8 +468,7 @@ def train(config: Config):
 monitor.set_stage("outer_loop")

 # todo we could skip this is we don't have live recovery enabled
-# disable because of potential memory leak
-# ckpt_manager.cache_inner_optimizer()
+ckpt_manager.cache_inner_optimizer()

 time_start_inner = time.perf_counter()
 diloco.step(model=model, flag=training_progress.outer_step, num_effective_peers=num_effective_peers)
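
The body of `ckpt_manager.cache_inner_optimizer()` is not part of this diff, so the following is only a rough sketch of the pattern the restored call and the todo comment suggest: take a snapshot of the inner optimizer state before the outer step, and skip the snapshot when live recovery is off. The class `CkptManagerSketch`, the `live_recovery_enabled` flag, and the `cached_state` field are hypothetical names introduced for illustration. Plain Python containers stand in for what would presumably be tensors in the real code.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class CkptManagerSketch:
    """Hypothetical stand-in for the checkpoint manager; not the zeroband class."""

    optimizer_state: dict[str, Any] = field(default_factory=dict)
    live_recovery_enabled: bool = True  # assumed flag, not taken from the source
    cached_state: Optional[dict[str, Any]] = None

    def cache_inner_optimizer(self) -> None:
        # Skip the copy entirely when no peer can ask for it
        # (the optimization hinted at by the todo comment).
        if not self.live_recovery_enabled:
            return
        # Rebinding replaces the previous snapshot, so at most one
        # cached copy is referenced at any time.
        self.cached_state = copy.deepcopy(self.optimizer_state)
```

Holding only one snapshot at a time matters in a sketch like this because a cache that accumulates copies across outer steps would grow without bound, which is the kind of behaviour the reverted commit was worried about.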