mratsim/tattletale

Tattletale

A high-performance inference engine project.

See motivation and MVP goals at #1

TL;DR of goals (README-driven development):

  • High-performance: concurrent queries, 1M+ context per query, highly tuned and fused kernels, SOTA CPU threadpool and kernels
  • Embeddable: Single-binary, callable from C, C++, Rust, Python, ...
  • Multi-hardware: currently CUDA, OpenCL, Vulkan, WebGPU; HIP and Metal in the future, and possibly DX12
  • Multi-modality: Audio and Image input AND generation
  • Maintainable and easy to extend
  • Cryptography-inspired engineering practices: Lean4 formalization of complex state management

Highlights

The project is still in its infancy. For now, we present the key differentiators that will hopefully snowball into a unique product in the landscape.

Future highlights

About

Stealth LLM inference engine
