Commit d2ce78f

TroyGarden authored and facebook-github-bot committed
fix a corner-case bug in memory snapshot uploading (#3504)
Summary: Fixed two corner-case issues in the TorchRec benchmark utilities:

1. **Memory snapshot handling**: Added rank filtering for memory snapshot operations so that they run only on rank 0, or on every rank when `all_rank_traces` is enabled. This prevents redundant memory snapshots from being taken on all ranks, reducing overhead and storage requirements while still capturing the necessary profiling data.
2. **Shell script robustness**: Added file existence checks before loop iterations in the trace upload script. Previously, if no trace files or memory snapshot files were found, the script would fail silently or produce errors. Now it checks with `ls` first and proceeds with the loop only if files exist, preventing issues when the trace directory is empty or files don't match the expected patterns.

Differential Revision: D86051540
1 parent 85dd1c6 commit d2ce78f

1 file changed: +2 -2 lines changed
torchrec/distributed/benchmark/base.py

Lines changed: 2 additions & 2 deletions
@@ -749,7 +749,7 @@ def _trace_handler(prof: torch.profiler.profile) -> None:
                 f"{output_dir}/stacks-cuda-{name}.stacks", "self_cuda_time_total"
             )
 
-    if memory_snapshot:
+    if memory_snapshot and (all_rank_traces or rank == 0):
         torch.cuda.empty_cache()
         torch.cuda.memory._record_memory_history(
             max_entries=MAX_NUM_OF_MEM_EVENTS_PER_SNAPSHOT
@@ -775,7 +775,7 @@ def _trace_handler(prof: torch.profiler.profile) -> None:
     else:
         torch.cuda.synchronize(rank)
 
-    if memory_snapshot:
+    if memory_snapshot and (all_rank_traces or rank == 0):
         try:
             torch.cuda.memory._dump_snapshot(
                 f"{output_dir}/memory-{name}-rank{rank}.pickle"
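
To illustrate the effect of the new condition, here is a minimal sketch that isolates the guard from the diff and lists which snapshot files would be written for a given world size. The helper names `should_snapshot` and `snapshot_paths` are hypothetical and do not appear in `base.py`. Only the boolean expression and the file name pattern are taken from the change above, and the sketch does not call any CUDA APIs.

```python
from typing import List


def should_snapshot(memory_snapshot: bool, all_rank_traces: bool, rank: int) -> bool:
    # Same boolean expression as the updated `if` in the diff:
    # snapshot on rank 0, or on every rank when all_rank_traces is set.
    return memory_snapshot and (all_rank_traces or rank == 0)


def snapshot_paths(
    output_dir: str,
    name: str,
    world_size: int,
    memory_snapshot: bool,
    all_rank_traces: bool,
) -> List[str]:
    # Hypothetical helper: enumerate the pickle files the guarded
    # dump step would produce across all ranks.
    return [
        f"{output_dir}/memory-{name}-rank{rank}.pickle"
        for rank in range(world_size)
        if should_snapshot(memory_snapshot, all_rank_traces, rank)
    ]


if __name__ == "__main__":
    # Default behavior after the change: only rank 0 writes a snapshot.
    print(snapshot_paths("/tmp/out", "bench", 4, True, False))
    # With all_rank_traces enabled, every rank writes one.
    print(snapshot_paths("/tmp/out", "bench", 4, True, True))
```

Because the same expression guards both the recording and the dump step, a rank that never starts recording memory history also never attempts to dump a snapshot.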
