FastMLX is a high-performance, production-ready API for hosting MLX models.
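As a rough sketch of what querying a FastMLX server looks like: it exposes an OpenAI-style chat endpoint, so a plain HTTP request is enough. The port, endpoint path, and model id below are assumptions based on a typical local setup, not guaranteed defaults.

```python
# Hypothetical query against a locally running FastMLX server; the port,
# path, and model id are assumptions and may differ from your setup.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mlx-community/gemma-2-9b-it-4bit",  # example MLX model id
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 100,
    },
    timeout=60,
)
print(resp.json())
```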
The all-in-one desktop & Docker AI application with built-in RAG, AI agents, a no-code agent builder, and more.
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
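A minimal sketch of talking to a local Ollama instance through its official Python client, assuming the daemon is running and `ollama pull llama3.3` has already fetched the model:

```python
# Chat with a locally pulled model via the `ollama` Python client.
import ollama

response = ollama.chat(
    model="llama3.3",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```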
Python SDK and proxy server (LLM gateway) to call 100+ LLM APIs in the OpenAI format: Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, SageMaker, HuggingFace, Replicate, Groq, and more.
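The point of LiteLLM is that one call shape covers every provider. A minimal sketch, assuming the relevant provider API key (e.g. `OPENAI_API_KEY`) is already set in the environment:

```python
# The same `completion` call routes to different providers based only on
# the model string; responses come back in the OpenAI format.
from litellm import completion

messages = [{"role": "user", "content": "Hello, how are you?"}]
resp = completion(model="gpt-4o-mini", messages=messages)
# Swapping providers is just a different model string, e.g.:
# resp = completion(model="claude-3-5-sonnet-20240620", messages=messages)
print(resp.choices[0].message.content)
```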
ModelScope brings the notion of Model-as-a-Service to life.
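A minimal sketch of the Model-as-a-Service idea: a single `pipeline` call that downloads and runs a hosted model behind a uniform interface. The task name and model id mirror the project's own word-segmentation example:

```python
# Load a hosted model through ModelScope's pipeline abstraction; weights
# are fetched from the ModelScope hub on first use.
from modelscope.pipelines import pipeline

word_segmentation = pipeline(
    "word-segmentation",
    model="damo/nlp_structbert_word-segmentation_chinese-base",
)
print(word_segmentation("今天天气不错，适合出去游玩"))
```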
A high-throughput and memory-efficient inference and serving engine for LLMs
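For offline batched inference, vLLM's core API is just `LLM` plus `SamplingParams`. A minimal sketch, assuming a CUDA-capable GPU and access to the Hugging Face Hub:

```python
# Batched offline generation with vLLM; a small model keeps the example cheap.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```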
Production-ready LLM compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
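A sketch of the quantize-then-save flow, assuming GPTQModel's `load`/`quantize`/`save` interface and using a toy calibration list (real runs want a few hundred representative in-domain samples):

```python
# Quantize a small model to 4-bit GPTQ; the model id, calibration text,
# and output path are illustrative placeholders.
from gptqmodel import GPTQModel, QuantizeConfig

calibration = [
    "GPTQ calibrates per-layer quantization against sample activations.",
    "Use a few hundred in-domain samples for real quantization runs.",
]

config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", config)
model.quantize(calibration, batch_size=1)
model.save("Llama-3.2-1B-Instruct-gptq-4bit")
```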
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
Distribute and run LLMs with a single file.
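Once a llamafile is running (e.g. `./Llama-3.2-1B-Instruct.llamafile`; the filename is illustrative), it serves an OpenAI-compatible endpoint on localhost:8080, so the standard `openai` client works against it. A minimal sketch:

```python
# Talk to a running llamafile through its OpenAI-compatible local server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required",  # the local server does not check the key
)
resp = client.chat.completions.create(
    model="LLaMA_CPP",  # name is accepted as-is by the local server
    messages=[{"role": "user", "content": "Write a haiku about portability."}],
)
print(resp.choices[0].message.content)
```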