A high-performance inference engine project.
See motivation and MVP goals at #1
TL;DR of goals (README-driven development):
- High-performance: concurrent queries, 1M+ context per query, highly-tuned kernels, fused kernels, SOTA CPU threadpool and kernels
- Embeddable: Single-binary, callable from C, C++, Rust, Python, ...
- Multi-hardware: Currently CUDA, OpenCL, Vulkan, WebGPU. Future: HIP and Metal, and why not DX12
- Multi-modality: Audio and Image input AND generation
- Maintainable and easy to extend
- Cryptography-inspired engineering practices: Lean4 formalization of complex state management
The project is still in its infancy. For now, we present the key differentiators that will hopefully snowball into a unique product in the landscape.
- Embeddable, single dependency on drivers + libTorch C++. The libTorch dependency will be removed in the future.
- Bidirectional Python, C, C++ integration:
  - Nim can call Python and Python can call Nim:
    https://github.com/mratsim/tattletale/blob/9975f37/workspace/libtorch/tests/python_integration/test_tensor_bridge.nim#L25-L85
  - Nim can call C/C++ and C/C++ can call Nim (by virtue of compiling to C/C++ as an intermediate language).
- Nim -> CUDA, OpenCL, Vulkan, WebGPU compiler implemented in Nim macros.
  Build-time or runtime portable code generation on any accelerator:
  https://github.com/mratsim/tattletale/tree/dbb44dd/workspace/positron/src/codegen
- IntrusiveAttention, a PagedRadixTrie implemented on top of an intrusive WAVL tree for guaranteed worst-case latency.
  No rebuilding, rehashing or tombstones, unlike hashmaps.
  ~50ns + O(memory bandwidth) for prefix matching, whatever the fan-out or the depth,
  which allows handling 100K+ cached requests on a single machine (for example for a router).
  Partial formal verification in Lean4.
- EXL3 quant support, currently the highest-quality quantization scheme, using random Hadamard rotations, trellis and lattice codebooks.
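
To make the prefix-matching idea behind the IntrusiveAttention item more concrete, here is a minimal Python sketch of a paged trie over token ids. It is not the project's implementation: the names `Node`, `pages`, `insert`, `match_prefix` and the `PAGE_SIZE` value are hypothetical, and a sorted list searched with `bisect` stands in for the intrusive WAVL tree (list insertion is O(n), so it does not give the same worst-case guarantees). It only illustrates that lookups walk one page per level and need no hashing.

```python
from bisect import bisect_left

PAGE_SIZE = 4  # hypothetical page size (tokens per page), for illustration only


class Node:
    """One trie node; each outgoing edge is labeled with one full page of tokens."""

    def __init__(self):
        self.keys = []      # sorted page labels (tuples of token ids)
        self.children = []  # children[i] is reached through keys[i]

    def find(self, page):
        # Ordered search, standing in for a balanced-tree lookup.
        i = bisect_left(self.keys, page)
        if i < len(self.keys) and self.keys[i] == page:
            return self.children[i]
        return None

    def find_or_add(self, page):
        i = bisect_left(self.keys, page)
        if i < len(self.keys) and self.keys[i] == page:
            return self.children[i]
        child = Node()
        self.keys.insert(i, page)
        self.children.insert(i, child)
        return child


def pages(tokens):
    """Split a token sequence into full pages; a trailing partial page is dropped."""
    full = len(tokens) - len(tokens) % PAGE_SIZE
    return [tuple(tokens[i:i + PAGE_SIZE]) for i in range(0, full, PAGE_SIZE)]


def insert(root, tokens):
    """Record every full page of `tokens` as a path from the root."""
    node = root
    for page in pages(tokens):
        node = node.find_or_add(page)


def match_prefix(root, tokens):
    """Return how many leading tokens of `tokens` are already cached."""
    node, matched = root, 0
    for page in pages(tokens):
        node = node.find(page)
        if node is None:
            break
        matched += PAGE_SIZE
    return matched


root = Node()
insert(root, [1, 2, 3, 4, 5, 6, 7, 8, 9])
print(match_prefix(root, [1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13]))  # 8
```

In this sketch a new request reuses the two cached pages and only the remaining tokens would need to be recomputed. Matching at page granularity is an assumption of the sketch, chosen to mirror the "Paged" part of the name.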
- [WIP] Porting CuteDSL/Cutlass/TileLang to Nim and enabling them across hardware vendors
- No more libTorch dependency: let's write yet another tensor library in Nim
  - Previous large: https://github.com/mratsim/Arraymancer
  - Previous mini: https://github.com/mratsim/nim-julia-challenge/blob/master/src/tensor.nim
  - Previous compiler-based: https://github.com/mratsim/laser/blob/master/laser/lux_compiler/lux_dsl.nim