RLsys-Foundation/torch_memory_saver
Torch Memory Saver

A PyTorch library that allows tensor memory to be temporarily released and resumed later.

Please refer to sgl-project/sglang#2542 (comment) for details.

Examples and Features

Basic Example

import torch
from torch_memory_saver import torch_memory_saver

# 1. Create the tensors that you want to be pauseable inside `region`
with torch_memory_saver.region():
    pauseable_tensor = torch.full((1_000_000_000,), 100, dtype=torch.uint8, device='cuda')

# 2. After `pause`, CUDA memory is released for those tensors.
# For example, check `nvidia-smi`'s memory usage to verify.
torch_memory_saver.pause()

# 3. After `resume`, CUDA memory is re-occupied for those tensors.
torch_memory_saver.resume()

While paused, the physical memory backing those tensors is released, but their virtual address ranges are preserved. On resume, the virtual addresses stay unchanged while physical memory is re-allocated and mapped back.
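The pause/resume lifecycle can be sketched in pure Python. This is a conceptual mock only: the real library works at the CUDA driver level, and the `MockRegion` class below is not part of its API.

```python
class MockRegion:
    """Models one allocation: a fixed virtual address plus detachable physical backing."""

    def __init__(self, size):
        self.size = size
        self.virtual_address = id(self)   # constant for the region's lifetime
        self.physical = bytearray(size)   # stand-in for physical GPU memory
        self.paused = False

    def pause(self):
        self.physical = None              # physical memory released
        self.paused = True

    def resume(self):
        self.physical = bytearray(self.size)  # physical memory re-allocated
        self.paused = False

region = MockRegion(1024)
addr_before = region.virtual_address

region.pause()
assert region.physical is None                # physical backing is gone

region.resume()
assert region.virtual_address == addr_before  # virtual address unchanged
```

Because the virtual address never changes, pointers held by existing tensors (and by captured CUDA graphs) remain valid across a pause/resume cycle.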

Multiple Tags

Please refer to sgl-project/sglang#7009 for details.

# 1. Create tensors with different tags
with torch_memory_saver.region(tag="type1"):
    tensor1 = torch.full((5_000_000_000,), 100, dtype=torch.uint8, device='cuda')

with torch_memory_saver.region(tag="type2"):
    tensor2 = torch.full((5_000_000_000,), 100, dtype=torch.uint8, device='cuda')

# 2. Pause and resume with different tags selectively
torch_memory_saver.pause("type1")
torch_memory_saver.pause("type2")

torch_memory_saver.resume("type2")
torch_memory_saver.resume("type1")

torch_memory_saver.pause("type1")
torch_memory_saver.resume("type1")
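Conceptually, each tag selects a subset of regions, so pause and resume only touch matching allocations. A minimal pure-Python mock of that bookkeeping (illustrative names only, not the library's internals):

```python
class MockTaggedSaver:
    def __init__(self):
        self.sizes = {}         # tag -> total bytes allocated under that tag
        self.paused_bytes = {}  # tag -> bytes currently released

    def allocate(self, tag, size):
        self.sizes[tag] = self.sizes.get(tag, 0) + size
        self.paused_bytes.setdefault(tag, 0)

    def pause(self, tag=None):
        # tag=None mimics pausing everything; a tag pauses only that subset
        for t in ([tag] if tag else self.sizes):
            self.paused_bytes[t] = self.sizes[t]

    def resume(self, tag=None):
        for t in ([tag] if tag else self.sizes):
            self.paused_bytes[t] = 0

saver = MockTaggedSaver()
saver.allocate("type1", 5_000_000_000)
saver.allocate("type2", 5_000_000_000)

saver.pause("type1")                      # only type1's memory is released
assert saver.paused_bytes["type1"] == 5_000_000_000
assert saver.paused_bytes["type2"] == 0

saver.resume("type1")
assert saver.paused_bytes["type1"] == 0
```

This per-tag selectivity is what lets, e.g., a KV cache and model weights be released and restored on independent schedules.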

Release Memory in CUDA Graph

Not only does torch_memory_saver make tensors compatible with CUDA graphs; it can also release the memory held by the CUDA graph itself (i.e., its intermediate tensors).

API: Change torch.cuda.graph(...) to torch_memory_saver.cuda_graph(...)
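The idea can be illustrated with a pure-Python mock (the real `cuda_graph` wraps CUDA graph capture; everything below is a stand-in, not the library's implementation): allocations made while the graph context is active are registered with the saver, so a later pause can release them just like region-created tensors.

```python
from contextlib import contextmanager

class MockSaver:
    """Illustrative stand-in: tracks which allocations are pauseable."""

    def __init__(self):
        self.tracked = []          # allocations the saver may release

    @contextmanager
    def cuda_graph(self):
        # While capturing, the graph's intermediate buffers are allocated
        # through the saver, so they become pauseable too.
        buffers = [bytearray(64) for _ in range(3)]  # mock intermediates
        self.tracked.extend(buffers)
        yield

    def pause(self):
        released = len(self.tracked)
        self.tracked.clear()       # mock: release the physical memory
        return released

saver = MockSaver()
with saver.cuda_graph():
    pass                           # graph capture would happen here
print(saver.pause())               # prints 3: the mock buffers are released
```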

CPU Backup

By default, to save time, tensor contents are discarded on pause. This is useful, for example, for a KV cache that is about to become stale, or model weights that are about to be updated.

If you want the tensor content to be kept unchanged, use enable_cpu_backup.

with torch_memory_saver.region(enable_cpu_backup=True):
    tensor1 = torch.full((5_000_000_000,), 42, dtype=torch.uint8, device='cuda')

torch_memory_saver.pause()
torch_memory_saver.resume()

assert tensor1[0] == 42, "content is kept unchanged"
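The difference between the default discard behavior and `enable_cpu_backup` can be mocked in pure Python (conceptual only; the real backup copies device memory to host memory, and the class below is not the library's API):

```python
class MockBackupRegion:
    def __init__(self, data, enable_cpu_backup=False):
        self.size = len(data)
        self.data = bytearray(data)             # stand-in for GPU memory
        self.enable_cpu_backup = enable_cpu_backup
        self._host_copy = None

    def pause(self):
        if self.enable_cpu_backup:
            self._host_copy = bytes(self.data)  # device -> host copy
        self.data = None                        # physical memory released

    def resume(self):
        self.data = bytearray(self.size)        # fresh physical memory
        if self._host_copy is not None:
            self.data[:] = self._host_copy      # host -> device restore
            self._host_copy = None

kept = MockBackupRegion(b"\x2a" * 8, enable_cpu_backup=True)
kept.pause(); kept.resume()
assert kept.data[0] == 42                       # content survives

dropped = MockBackupRegion(b"\x2a" * 8)         # default: no backup
dropped.pause(); dropped.resume()
assert dropped.data[0] == 0                     # zeros in this mock; undefined in reality
```

The trade-off is the extra host memory and copy time on every pause/resume, which is why discarding is the default.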

Hook Modes

There are two hook modes:

  • preload: Use LD_PRELOAD to hook the CUDA malloc/free APIs and change allocation behavior.
  • torch: Use PyTorch's custom (pluggable) allocator API to change allocation behavior.

The mode can be chosen by:

torch_memory_saver.hook_mode = "torch"

Example of RL with CUDA Graph

Please refer to rl_example.py for details.

Development

make reinstall

You can use this command for local testing:

pytest /path/to/torch_memory_saver/test

Or this one to test a single case (e.g. the simple one here):

pytest /path/to/torch_memory_saver/test/test_examples.py::test_simple -s
