Description
I believe we should improve the current memory management strategy for CubeCL backends. The goal is to reduce total memory usage by decreasing the amount of padding in our memory pools and by giving the pools more information about allocation lifetimes, all without requiring code changes from users.
The key idea would be to introduce the notions of long-lived and short-lived tensors. Parameters could be tagged as long-lived, creating a category of allocations backed by a fixed-size memory pool. Since such a pool only serves fixed tensors, it could deallocate automatically when the padding it accumulates grows too large. In most training and inference scenarios, model parameters never change shape, so this deallocation would essentially never trigger. For highly dynamic workloads, the hint-based optimization could be disabled to avoid frequent deallocations.
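Below is a minimal sketch of what such a hint-based allocator could look like. This is not CubeCL's actual API: `MemoryHint`, `Pool`, and `MemoryManager` are hypothetical names, and the padding-ratio trigger is one possible reading of the deallocation heuristic described above.

```rust
/// Hypothetical lifetime hint attached to an allocation request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryHint {
    /// Fixed tensors such as model parameters: routed to a
    /// fixed-size pool that is rarely (ideally never) torn down.
    LongLived,
    /// Activations and temporaries: routed to the existing
    /// dynamic pools, which round sizes up for easy reuse.
    ShortLived,
}

/// Minimal stand-in for a backend memory pool.
pub struct Pool {
    /// Total bytes handed out, including padding.
    allocated: u64,
    /// Bytes wasted to padding, used to decide when to rebuild.
    padding: u64,
}

impl Pool {
    fn new() -> Self {
        Self { allocated: 0, padding: 0 }
    }

    /// Rounds the request up to `alignment` and records the waste.
    fn alloc(&mut self, size: u64, alignment: u64) {
        let padded = size.div_ceil(alignment) * alignment;
        self.allocated += padded;
        self.padding += padded - size;
    }

    /// Fraction of allocated bytes lost to padding; this is the
    /// signal that would trigger automatic deallocation.
    fn padding_ratio(&self) -> f64 {
        if self.allocated == 0 {
            0.0
        } else {
            self.padding as f64 / self.allocated as f64
        }
    }
}

/// Routes allocations to a pool based on the caller's hint.
pub struct MemoryManager {
    long_lived: Pool,
    short_lived: Pool,
    /// Padding ratio above which the long-lived pool would be
    /// rebuilt (deallocated and reallocated tightly).
    padding_threshold: f64,
}

impl MemoryManager {
    pub fn alloc(&mut self, size: u64, hint: MemoryHint) {
        match hint {
            MemoryHint::LongLived => {
                // Parameters are allocated once and never resized,
                // so a tight alignment keeps padding near zero.
                self.long_lived.alloc(size, 64);
                if self.long_lived.padding_ratio() > self.padding_threshold {
                    // A real backend would compact the pool here; with
                    // fixed parameters this branch essentially never runs.
                }
            }
            MemoryHint::ShortLived => {
                // Temporaries keep coarse rounding so freed buffers
                // can be reused across iterations without resizing.
                self.short_lived.alloc(size, 4096);
            }
        }
    }
}

fn main() {
    let mut manager = MemoryManager {
        long_lived: Pool::new(),
        short_lived: Pool::new(),
        padding_threshold: 0.10,
    };
    manager.alloc(1_000_003, MemoryHint::LongLived); // e.g. a weight tensor
    manager.alloc(32_768, MemoryHint::ShortLived); // e.g. an activation
    println!(
        "long-lived padding ratio: {:.6}",
        manager.long_lived.padding_ratio()
    );
}
```

One nice property of this design is that the tagging could happen entirely at the framework level (for example, when module parameters are registered), so user code would never need to see or set the hint explicitly.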