Memory Hint: Improved Memory Management #3638

@nathanielsimard

Description

I believe we should improve the current memory management strategy for CubeCL backends. The goal is to reduce total memory usage by decreasing the amount of padding in our memory pools, and to give those pools more information about allocations without requiring code changes from users.

The key idea is to distinguish long-lived tensors from short-lived tensors. Parameters could be tagged as long-lived, which would create a category of allocations served by fixed-size memory pools. Because these pools are only used by fixed tensors, they could deallocate automatically based on the amount of padding they contain. In most training and inference scenarios model parameters are fixed, so such deallocations would never actually happen. For highly dynamic workloads, this hint-based optimization could be disabled to reduce the number of deallocations.
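
To make the proposal concrete, here is a rough sketch of what a hint-aware allocator could look like. It is not CubeCL's actual API: `MemoryHint`, `PoolStats`, `HintedAllocator`, and the two granularities are all hypothetical names invented for illustration. The sketch only does bookkeeping (no real buffers) and assumes that long-lived allocations are rounded up to the alignment only, while short-lived allocations are rounded up to a coarser slice size, which is where most padding would come from.

```rust
/// Hypothetical lifetime hint attached to an allocation request.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MemoryHint {
    /// Fixed tensors such as model parameters.
    LongLived,
    /// Temporaries such as activations and intermediate results.
    ShortLived,
}

/// Bookkeeping for one category of allocations (illustrative only).
#[derive(Debug, Default)]
pub struct PoolStats {
    pub requested: usize,
    pub reserved: usize,
}

impl PoolStats {
    /// Bytes reserved but never requested by the user.
    pub fn padding(&self) -> usize {
        self.reserved - self.requested
    }

    /// Fraction of reserved memory that is padding. A pool could use this
    /// to decide when a deallocation is worthwhile.
    pub fn padding_ratio(&self) -> f64 {
        if self.reserved == 0 {
            0.0
        } else {
            self.padding() as f64 / self.reserved as f64
        }
    }
}

/// Rounds `size` up to the next multiple of `multiple` (must be non-zero).
fn round_up(size: usize, multiple: usize) -> usize {
    (size + multiple - 1) / multiple * multiple
}

/// Hypothetical allocator that picks a sizing policy from the hint.
pub struct HintedAllocator {
    alignment: usize,
    slice_size: usize,
    pub long_lived: PoolStats,
    pub short_lived: PoolStats,
}

impl HintedAllocator {
    pub fn new(alignment: usize, slice_size: usize) -> Self {
        Self {
            alignment,
            slice_size,
            long_lived: PoolStats::default(),
            short_lived: PoolStats::default(),
        }
    }

    /// Records a reservation and returns the number of bytes reserved.
    pub fn reserve(&mut self, size: usize, hint: MemoryHint) -> usize {
        let (pool, granularity) = match hint {
            // Assumption: long-lived tensors get a tight fit.
            MemoryHint::LongLived => (&mut self.long_lived, self.alignment),
            // Assumption: short-lived tensors share coarse, reusable slices.
            MemoryHint::ShortLived => (&mut self.short_lived, self.slice_size),
        };
        let reserved = round_up(size, granularity);
        pool.requested += size;
        pool.reserved += reserved;
        reserved
    }
}
```

With an alignment of 256 bytes and a slice size of 1 MiB, a 1000-byte parameter tagged `LongLived` would reserve 1024 bytes, whereas the same request tagged `ShortLived` would reserve a full 1 MiB slice. In practice the hint could be set automatically when a tensor is registered as a module parameter, so users would not need to change their code.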
