
[FEA]: Matrix-view cache. #46

@tugrul512bit

Description


Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request?

Medium

Please provide a clear description of problem this feature solves

Some algorithms may access a tile more than once within a CUDA block without knowing it is the same tile. The same tile should not be loaded twice, nor stored twice. An optional mechanism with a bounded, dedicated smem cache size could help with these redundancy issues.

For example, suppose I'm developing an open-world video game where a player looks around and sees the world: it needs the tiles around the player (assuming a 2D world map). When computing things for the player, access to those tiles could be optimized either by explicit caching on the developer's side or automatically by cuTile. Why not? If the game is multiplayer, then 8 players could share the same cluster and use multicasting too (assuming cloud gaming on a B200 GPU).
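A host-side sketch (not cuTile API; the names `TILE` and `VIEW_RADIUS` are illustrative assumptions) of the redundancy described above: it counts how many tile loads a naive per-access scheme would issue versus how many unique tiles a shared cache would actually need, for nearby players with overlapping view windows.

```python
# Hypothetical model of redundant tile loads in a 2D world map.
TILE = 16          # tile edge length in world cells (assumed)
VIEW_RADIUS = 2    # tiles visible in each direction around a player (assumed)

def visible_tiles(player_xy):
    """Tile coordinates that a player's view window touches."""
    tx, ty = player_xy[0] // TILE, player_xy[1] // TILE
    return [(tx + dx, ty + dy)
            for dx in range(-VIEW_RADIUS, VIEW_RADIUS + 1)
            for dy in range(-VIEW_RADIUS, VIEW_RADIUS + 1)]

def load_counts(players):
    """Naive loads vs. unique tiles if one cache served the whole cluster."""
    accesses = [t for p in players for t in visible_tiles(p)]
    return len(accesses), len(set(accesses))

# 8 nearby players (one "cluster"): heavy view-window overlap.
players = [(100 + 8 * i, 100) for i in range(8)]
naive, cached = load_counts(players)
print(naive, cached)  # → 200 40: a shared cache cuts loads by 5x here
```

The gap between the two numbers is exactly what an automatic smem cache (or cluster multicast) would eliminate.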

Feature Description

Read caching, write caching, and perhaps automatic cluster-based multicasting.

Describe your ideal solution

LRU, LFU, direct-mapped, or even multiple layers (block-level L1 -> cluster-level L2 -> TMA); anything with an eviction policy works.
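As a minimal sketch of the LRU option (not cuTile API; `LRUTileCache` and `load_fn` are hypothetical names), here is the eviction behavior such a cache would need, with a fixed capacity standing in for a bounded smem budget:

```python
from collections import OrderedDict

class LRUTileCache:
    """Fixed-capacity tile cache with least-recently-used eviction."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity    # max tiles held (smem budget / tile size)
        self.load_fn = load_fn      # fallback loader, e.g. a TMA copy
        self.tiles = OrderedDict()  # tile_id -> tile data, in LRU order
        self.hits = self.misses = 0

    def get(self, tile_id):
        if tile_id in self.tiles:
            self.tiles.move_to_end(tile_id)     # mark most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.tiles) >= self.capacity:
                self.tiles.popitem(last=False)  # evict least recently used
            self.tiles[tile_id] = self.load_fn(tile_id)
        return self.tiles[tile_id]

# Repeated accesses to a small working set hit after the first load.
cache = LRUTileCache(capacity=4, load_fn=lambda t: f"tile{t}")
for t in [0, 1, 2, 0, 1, 2, 3, 4, 0]:
    cache.get(t)
print(cache.hits, cache.misses)  # → 3 6
```

In a real kernel the bookkeeping would live in shared memory and `load_fn` would be an async bulk copy, but the hit/miss/evict logic is the same.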

Describe any alternatives you have considered

I searched Google for "cuda TMA cache" but it returned no relevant results.

Additional context

Maybe the Blackwell architecture's tensor memory could be used as scratchpad memory for this instead of shared memory?

Contributing Guidelines

  • I agree to follow cuTile Python's contributing guidelines
  • I have searched the open feature requests and have found no duplicates for this feature request
