kfunca is a minimalist, high-performance automatic differentiation framework that runs on the GPU. Its operator set is deliberately limited to what multimodal transformers need. Supported features:
- GPU Launcher
- Caching Allocator
- Tensor Implementation
- Tensor Iterator
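
To illustrate what a caching allocator does in general, here is a minimal plain-Python sketch. It is not kfunca's implementation: the class name `CachingAllocator`, the `backend_alloc` callback (a stand-in for a device malloc), and the 512-byte rounding granularity are all assumptions made for this example. The idea is that freed blocks are kept in per-size free lists and reused, so repeated allocations of similar sizes avoid going back to the expensive backend.

```python
from collections import defaultdict


class CachingAllocator:
    """Toy caching allocator: reuse freed blocks instead of calling the backend again."""

    def __init__(self, backend_alloc, granularity=512):
        self.backend_alloc = backend_alloc      # hypothetical stand-in for a device malloc
        self.granularity = granularity          # assumed rounding step, in bytes
        self.free_blocks = defaultdict(list)    # rounded size -> cached buffers
        self.backend_calls = 0                  # counts real (uncached) allocations

    def _round_up(self, nbytes):
        # Round the request up to a multiple of the granularity so that
        # requests of similar size can share cached blocks.
        g = self.granularity
        return max(g, (nbytes + g - 1) // g * g)

    def malloc(self, nbytes):
        size = self._round_up(nbytes)
        if self.free_blocks[size]:
            # Cache hit: hand back a previously freed buffer of this size.
            return size, self.free_blocks[size].pop()
        # Cache miss: fall through to the backend.
        self.backend_calls += 1
        return size, self.backend_alloc(size)

    def free(self, block):
        # Do not release memory to the backend; keep it for reuse.
        size, buf = block
        self.free_blocks[size].append(buf)


# Usage: bytearray plays the role of the backend allocator here.
allocator = CachingAllocator(backend_alloc=bytearray)
a = allocator.malloc(1000)   # rounded up to 1024 bytes, one backend call
allocator.free(a)
b = allocator.malloc(700)    # also rounds to 1024, served from the cache
```

A real GPU allocator would also have to deal with streams, block splitting, and fragmentation, none of which this sketch attempts.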
Basic operators:
- from_numpy/to_numpy
- add/sub/mul/div
- permute/contiguous/copy
- sum/mean
- sort/topk
- slice/view
- concat/split
Neural network operators:
- rms_norm
- causal attention
- embedding
- matmul
- qkv_linear
Supported data types:
- float32/float64
- float16
- bfloat16
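
For readers unfamiliar with the rms_norm and causal attention operators listed above, the following plain-Python reference shows what these operations conventionally compute. It does not use or reflect kfunca's API: the function names, the `eps` default, and the single-head list-of-lists layout are assumptions chosen for readability, and a GPU implementation would look very different.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale vector x by the reciprocal of its root mean square, then by weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def causal_attention(q, k, v):
    """Single-head attention over T positions; position t only sees positions <= t.

    q, k, v are lists of T vectors. Scores are scaled by 1/sqrt(d) and
    normalized with a numerically stable softmax.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Only keys 0..t are visible to query t (the causal mask).
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Weighted sum of the visible value vectors.
        out.append([
            sum(p * v[s][j] for s, p in enumerate(probs))
            for j in range(len(v[0]))
        ])
    return out


# Usage on tiny inputs.
normed = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
attn = causal_attention(
    q=[[1.0, 0.0], [0.0, 1.0]],
    k=[[1.0, 0.0], [0.0, 1.0]],
    v=[[1.0, 0.0], [0.0, 1.0]],
)
```

Because of the causal mask, the first output row depends only on the first key and value, so it equals the first value vector regardless of what follows.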
Feel free to reach out about collaboration: [email protected]