Description
I believe we should improve the current memory management strategy for CubeCL backends. The goal is to reduce total memory usage by decreasing the amount of padding in our memory pools and by giving the pools more information about allocation lifetimes, all without requiring code changes from users.
The key idea would be to introduce the notions of long-lived and short-lived tensors. Parameters could be tagged as long-lived, creating a category of allocations backed by a fixed-size memory pool. Since such a pool only serves fixed tensors, it could deallocate automatically when the padding it accumulates grows too large. In most training and inference scenarios, model parameters never change shape, so this deallocation would essentially never trigger. For highly dynamic workloads, the hint-based optimization could be disabled to avoid frequent deallocations.
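Below is a minimal sketch of what such a hint-based allocator could look like. This is not CubeCL's actual API: `MemoryHint`, `Pool`, and `MemoryManager` are hypothetical names, and the padding-ratio trigger is one possible reading of the deallocation heuristic described above.

```rust
/// Hypothetical lifetime hint attached to an allocation request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryHint {
    /// Fixed tensors such as model parameters: routed to a
    /// fixed-size pool that is rarely (ideally never) torn down.
    LongLived,
    /// Activations and temporaries: routed to the existing
    /// dynamic pools, which round sizes up for easy reuse.
    ShortLived,
}

/// Minimal stand-in for a backend memory pool.
pub struct Pool {
    /// Total bytes handed out, including padding.
    allocated: u64,
    /// Bytes wasted to padding, used to decide when to rebuild.
    padding: u64,
}

impl Pool {
    fn new() -> Self {
        Self { allocated: 0, padding: 0 }
    }

    /// Rounds the request up to `alignment` and records the waste.
    fn alloc(&mut self, size: u64, alignment: u64) {
        let padded = size.div_ceil(alignment) * alignment;
        self.allocated += padded;
        self.padding += padded - size;
    }

    /// Fraction of allocated bytes lost to padding; this is the
    /// signal that would trigger automatic deallocation.
    fn padding_ratio(&self) -> f64 {
        if self.allocated == 0 {
            0.0
        } else {
            self.padding as f64 / self.allocated as f64
        }
    }
}

/// Routes allocations to a pool based on the caller's hint.
pub struct MemoryManager {
    long_lived: Pool,
    short_lived: Pool,
    /// Padding ratio above which the long-lived pool would be
    /// rebuilt (deallocated and reallocated tightly).
    padding_threshold: f64,
}

impl MemoryManager {
    pub fn alloc(&mut self, size: u64, hint: MemoryHint) {
        match hint {
            MemoryHint::LongLived => {
                // Parameters are allocated once and never resized,
                // so a tight alignment keeps padding near zero.
                self.long_lived.alloc(size, 64);
                if self.long_lived.padding_ratio() > self.padding_threshold {
                    // A real backend would compact the pool here; with
                    // fixed parameters this branch essentially never runs.
                }
            }
            MemoryHint::ShortLived => {
                // Temporaries keep coarse rounding so freed buffers
                // can be reused across iterations without resizing.
                self.short_lived.alloc(size, 4096);
            }
        }
    }
}

fn main() {
    let mut manager = MemoryManager {
        long_lived: Pool::new(),
        short_lived: Pool::new(),
        padding_threshold: 0.10,
    };
    manager.alloc(1_000_003, MemoryHint::LongLived); // e.g. a weight tensor
    manager.alloc(32_768, MemoryHint::ShortLived); // e.g. an activation
    println!(
        "long-lived padding ratio: {:.6}",
        manager.long_lived.padding_ratio()
    );
}
```

One nice property of this design is that the tagging could happen entirely at the framework level (for example, when module parameters are registered), so user code would never need to see or set the hint explicitly.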