Hey there! A small nitpick about the last code block on this docs page: https://docs.pytorch.org/tutorials/recipes/distributed_async_checkpoint_recipe.html
The `checkpoint_future` variable is never assigned in the last block, so the future returned by `dcp.async_save` is never stored.
Perhaps the intent was to have this instead?
```python
checkpoint_future = dcp.async_save(state_dict, storage_writer=writer, checkpoint_id=f"{CHECKPOINT_DIR}_step{step}")
```
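Why the assignment matters, assuming the last block follows the same pattern as the recipe's earlier async example (the loop calls `checkpoint_future.result()` before issuing the next save): if the future is never assigned, that wait is silently skipped and saves can overlap. Below is a minimal sketch of the effect using `concurrent.futures` as a stand-in; `fake_async_save` and `train` are hypothetical helpers, not part of `torch.distributed.checkpoint`.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import threading
import time

# Hypothetical stand-in for dcp.async_save: runs a "save" on a background
# thread and returns a Future, like the real call does.
_executor = ThreadPoolExecutor(max_workers=4)
_lock = threading.Lock()
_in_flight = 0
max_in_flight = 0  # highest number of saves observed running at once


def fake_async_save(step: int) -> Future:
    def _save() -> int:
        global _in_flight, max_in_flight
        with _lock:
            _in_flight += 1
            max_in_flight = max(max_in_flight, _in_flight)
        time.sleep(0.01)  # simulate writing the checkpoint to storage
        with _lock:
            _in_flight -= 1
        return step

    return _executor.submit(_save)


def train(num_steps: int, assign_future: bool) -> int:
    """Return the peak number of concurrent saves during the loop."""
    global max_in_flight
    max_in_flight = 0
    checkpoint_future = None
    all_futures = []
    for step in range(num_steps):
        # ... forward/backward/optimizer step would go here ...

        # Wait for the previous save to finish. This only has an effect
        # if checkpoint_future was actually assigned below.
        if checkpoint_future is not None:
            checkpoint_future.result()

        fut = fake_async_save(step)
        all_futures.append(fut)
        if assign_future:
            checkpoint_future = fut  # the fix suggested above

    for f in all_futures:  # drain outstanding saves before returning
        f.result()
    return max_in_flight


print(train(5, assign_future=True))   # at most one save in flight
print(train(5, assign_future=False))  # saves may overlap
```

With the assignment, each save completes before the next one starts, so the peak is exactly one; without it, several saves can be queued at once.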
cc @LucasLLC @MeetVadakkanchery @mhorowitz @pradeepfn @ekr0 @haochengsong @Saiteja64