A curated list of battle-tested, production-proven open-source AI models, libraries, infrastructure, and developer tools. Only elite-tier projects make this list.
by Boring Dystopia Development
- 🧬 1. Core Frameworks & Libraries
- 🧠 2. Open Foundation Models
- ⚡ 3. Inference Engines & Serving
- 🤖 4. Agentic AI & Multi-Agent Systems
- 🔍 5. Retrieval-Augmented Generation (RAG) & Knowledge
- 🎨 6. Generative Media Tools
- 🛠️ 7. Training & Fine-tuning Ecosystem
- 📊 8. MLOps / LLMOps & Production
- 📈 9. Evaluation, Benchmarks & Datasets
- 🛡️ 10. AI Safety, Alignment & Interpretability
- 🧩 11. Specialized Domains
- 🖥️ 12. User Interfaces & Self-hosted Platforms
- 🧪 13. Developer Tools & Integrations
- 📚 14. Resources & Learning
## 🧬 1. Core Frameworks & Libraries

Core libraries and frameworks used to build, train, and run AI and machine learning systems.
- PyTorch
- Dynamic computation graphs, Pythonic API, dominant in research and production. The current standard for most frontier AI work.
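  A minimal sketch of the define-by-run style: the graph is recorded as ordinary Python executes, so autograd simply replays it backward.

  ```python
  import torch

  # Dynamic graph: operations are traced as they run, so plain
  # Python control flow works inside models.
  x = torch.randn(3, requires_grad=True)
  y = (x ** 2).sum()
  y.backward()  # reverse-mode autodiff populates x.grad

  # d/dx of sum(x^2) is 2x
  print(torch.allclose(x.grad, 2 * x))
  ```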
- TensorFlow
- End-to-end platform with excellent production deployment, TPU support, and large-scale serving tools.
- JAX + Flax
- High-performance numerical computing with composable transformations (JIT, vmap, grad). Rising favorite for research and scientific ML.
- NumPyro
- Probabilistic programming with NumPy powered by JAX for autograd and JIT compilation. Bayesian modeling and inference at scale.
- Keras
- High-level, beginner-friendly API that now runs on multiple backends (TensorFlow, JAX, PyTorch). Perfect for rapid experimentation.
- tinygrad
- Minimalist deep learning framework with tiny code footprint. The "you like pytorch? you like micrograd? you love tinygrad!" philosophy - simple yet powerful.
- PyTorch Geometric
- Library for deep learning on irregular input data such as graphs, point clouds, and manifolds. Part of the PyTorch ecosystem.
- Burn
- Next-generation deep learning framework in Rust. Backend-agnostic with CPU, GPU, WebAssembly support.
- Candle (Hugging Face)
- Minimalist ML framework for Rust. PyTorch-like API with focus on performance and simplicity.
- linfa
- Comprehensive Rust ML toolkit with classical algorithms. scikit-learn equivalent for Rust with clustering, regression, and preprocessing.
- Flux.jl
- 100% pure-Julia ML stack with lightweight abstractions on top of native GPU and AD support. Elegant, hackable, and fully integrated with Julia's scientific computing ecosystem.
- Transformers (Hugging Face)
- The de facto standard library for pretrained models across text, vision, and audio. 1M+ models on the Hugging Face Hub: BERT, GPT, Llama, Qwen, and hundreds more architectures.
- sentence-transformers
- Classic library for sentence and image embeddings.
- tokenizers (Hugging Face)
- Fast state-of-the-art tokenizers for training and inference.
- Pandas
- The gold standard for data analysis and manipulation in Python.
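  A small sketch of the idiomatic split-apply-combine pattern (the values here are made up for illustration):

  ```python
  import pandas as pd

  # Per-group aggregation in one chained expression.
  df = pd.DataFrame({
      "model": ["llama", "llama", "qwen", "qwen"],
      "latency_ms": [120, 80, 95, 105],
  })
  summary = df.groupby("model")["latency_ms"].agg(["mean", "max"])
  print(summary)
  ```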
- Polars
- Blazing-fast DataFrame library (Rust backend) - modern alternative to pandas for large-scale workloads.
- cuDF
- GPU DataFrame library from RAPIDS. Accelerates pandas workflows on NVIDIA GPUs with zero code changes using cuDF.pandas accelerator mode.
- Modin
- Parallel pandas DataFrames. Scale pandas workflows by changing a single line of code - distributes data and computation automatically.
- Dask
- Parallel computing for big data - scales pandas/NumPy/scikit-learn to clusters.
- NumPy
- Fundamental array computing library that powers almost every AI stack.
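  Broadcasting is the rule that makes NumPy arrays composable without copies; a minimal sketch:

  ```python
  import numpy as np

  # A (3, 1) column and a (4,) row broadcast to a (3, 4) grid
  # without materializing intermediate copies.
  col = np.arange(3).reshape(3, 1)
  row = np.arange(4)
  grid = col * 10 + row
  print(grid.shape)
  ```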
- SciPy
- Scientific computing algorithms (optimization, linear algebra, statistics, signal processing).
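  A minimal optimization sketch with `scipy.optimize.minimize` on a toy quadratic (the objective is illustrative):

  ```python
  from scipy.optimize import minimize

  # BFGS needs only the objective; gradients are estimated
  # by finite differences when none are supplied.
  result = minimize(lambda x: (x[0] - 3.0) ** 2 + 1.0,
                    x0=[0.0], method="BFGS")
  print(result.x, result.fun)
  ```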
- NetworkX
- Creation, manipulation, and study of complex networks. The foundational graph analysis library for Python data science.
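  A minimal sketch of a weighted shortest-path query (toy graph for illustration):

  ```python
  import networkx as nx

  # Dijkstra picks the two-hop route because its total weight (2.0)
  # beats the direct edge (5.0).
  G = nx.Graph()
  G.add_weighted_edges_from([("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 5.0)])
  path = nx.shortest_path(G, "a", "c", weight="weight")
  print(path)
  ```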
- scikit-learn
- Industry-standard library for traditional machine learning (classification, regression, clustering, pipelines).
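  A minimal pipeline sketch showing why preprocessing and model belong in one estimator: the scaler is fit only on training data, so there is no leakage at predict time.

  ```python
  from sklearn.datasets import load_iris
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split
  from sklearn.pipeline import make_pipeline
  from sklearn.preprocessing import StandardScaler

  X, y = load_iris(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  # Scaler and classifier travel together through fit/predict.
  clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
  clf.fit(X_train, y_train)
  acc = clf.score(X_test, y_test)
  print(acc)
  ```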
- XGBoost
- Scalable, high-performance gradient boosting library. Still dominates Kaggle and tabular competitions.
- LightGBM
- Microsoft's ultra-fast gradient boosting framework, optimized for speed and memory.
- CatBoost
- Gradient boosting that handles categorical features natively with great out-of-the-box performance.
- sktime
- Unified framework for machine learning with time series. Scikit-learn compatible API for forecasting, classification, clustering, and anomaly detection.
- StatsForecast
- Lightning-fast statistical forecasting with ARIMA, ETS, CES, and Theta models. Optimized for high-performance time series workloads.
- Optuna
- Modern, define-by-run hyperparameter optimization with pruning and visualizations. Extremely popular in 2026.
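  A minimal define-by-run sketch on a toy quadratic: the search space is declared inline as the objective executes, not up front.

  ```python
  import optuna

  optuna.logging.set_verbosity(optuna.logging.WARNING)

  # The trial object samples hyperparameters on the fly.
  def objective(trial):
      x = trial.suggest_float("x", -10.0, 10.0)
      return (x - 2.0) ** 2

  study = optuna.create_study(direction="minimize")
  study.optimize(objective, n_trials=50)
  print(study.best_params)
  ```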
- AutoGluon
- AWS AutoML toolkit for tabular, image, text, and multimodal data - state-of-the-art with almost zero code.
- FLAML
- Microsoft's fast & lightweight AutoML focused on efficiency and low compute.
- AutoKeras
- Neural architecture search on top of Keras.
- Hugging Face Accelerate
- Simple API to make training scripts run on any hardware (multi-GPU, TPU, mixed precision) with minimal code changes.
- DeepSpeed
- Microsoft's deep learning optimization library for extreme-scale training (ZeRO, offloading, MoE).
- FlashAttention
- Fast exact attention kernels that reduce memory usage and accelerate transformer training and inference.
- xFormers
- Optimized transformer building blocks and attention operators for PyTorch.
- PyTorch Lightning
- High-level wrapper for PyTorch that removes boilerplate and adds best practices.
- ONNX Runtime
- High-performance inference and training for ONNX models across hardware.
- einops
- Flexible, powerful tensor operations for readable and reliable code. Supports PyTorch, JAX, TensorFlow, NumPy, MLX.
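  A minimal sketch of named-axis reshaping, e.g. flattening image patches into tokens:

  ```python
  import numpy as np
  from einops import rearrange

  # (batch, height, width, channels) -> (batch, height*width, channels),
  # with the axis names documenting the intent.
  images = np.zeros((2, 4, 4, 3))
  tokens = rearrange(images, "b h w c -> b (h w) c")
  print(tokens.shape)
  ```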
- safetensors
- Simple, safe way to store and distribute tensors. Fast, secure alternative to pickle for model serialization.
- torchmetrics
- Machine learning metrics for distributed, scalable PyTorch applications. 80+ metrics with built-in distributed synchronization.
- torchao
- PyTorch native quantization and sparsity for training and inference. Drop-in optimizations for production deployment.
- SHAP
- Game theoretic approach to explain the output of any machine learning model. Industry standard for model interpretability.
## 🧠 2. Open Foundation Models

Pretrained language, multimodal, speech, and video models with publicly available weights.
- Qwen3.6-Plus (Alibaba)
- Latest flagship series released April 2026 with 1M context window, agentic coding performance competitive with Claude 4.5 Opus, and enhanced multimodal capabilities.
- Gemma 4 (Google)
- Released April 2026 in four sizes (E2B, E4B, 26B MoE, 31B dense). First major update in a year, Apache 2.0 licensed and tuned for complex logic and agentic workflows.
- Kimi K2.5 (Moonshot AI)
- Frontier open-weight MoE model with 256K context, strong coding and reasoning performance, and native multimodal + tool-use support for agentic workflows.
- Phi-4 (Microsoft)
- Small but highly capable models optimized for reasoning, edge devices, and on-device inference. Includes Phi-4-reasoning variants with thinking capabilities.
- GLM-5 (Zhipu AI)
- Strong open model line with solid coding, reasoning, and agentic-task performance.
- OLMo 2 (Allen AI)
- Fully open-source LLMs (1B–32B) with complete transparency: models, data, training code, and logs. Designed by scientists, for scientists.
- Llama 4 (Meta)
- First native multimodal MoE open-source models (Scout: 10M context, Maverick: 400B+ params). Released April 2025 with enterprise-grade capabilities.
- DeepSeek-Coder-V2 / R1-Coder
- Best-in-class open coding model (236B MoE). Outperforms closed models on many code benchmarks.
- Qwen3-Coder-Next (Alibaba)
- Leading open coding model. Strong Pareto frontier for cost-effective agent deployment.
- Qwen3-VL (Alibaba)
- Latest flagship VLM with native 256K context (expandable to 1M), visual agent capabilities, 3D grounding, and superior multimodal reasoning. Major leap over Qwen2.5-VL.
- GLM-4.5V / GLM-4.1V-Thinking (Zhipu AI)
- Strong multimodal reasoning with scalable reinforcement learning. Compares favorably with Gemini-2.5-Flash on benchmarks.
- MiniCPM-V 2.6
- Handles images up to 1.8M pixels with top-tier OCR performance. Excellent for on-device deployment.
- Gemma 4 (Google)
- Multimodal model supporting vision-language input, optimized for efficiency, complex logic, and on-device use.
- Whisper (OpenAI → community forks)
- The gold-standard open speech-to-text model. Massive community fine-tunes available.
- OuteTTS / CosyVoice 2
- High-quality open TTS with natural prosody and multilingual support.
- Fish Speech / StyleTTS 2
- Zero-shot TTS with excellent voice cloning. Extremely popular in 2026.
- MusicGen / AudioCraft (Meta)
- Open music and audio generation models.
- VibeVoice (Microsoft)
- Open-source frontier voice AI with expressive, longform conversational speech synthesis. 7B parameter TTS with streaming support.
- Chatterbox (Resemble AI)
- State-of-the-art open TTS family with 350M parameter Turbo variant. Single-step generation with native paralinguistic tags for realistic dialogue.
- Dia (Nari Labs)
- 1.6B parameter TTS generating ultra-realistic dialogue in one pass with nonverbal communications (laughter, coughing). Emotion and tone control via audio conditioning.
- Step-Audio (StepFun)
- 130B-parameter production-ready audio LLM for intelligent speech interaction. Supports multilingual conversations (Chinese, English, Japanese), emotional tones, regional dialects (Cantonese, Sichuanese), adjustable speech rates, and prosodic styles including rap. Apache 2.0 licensed.
- Voxtral TTS (Mistral)
- 4B parameter state-of-the-art TTS with zero-shot voice cloning, 9-language support, and ~90ms time-to-first-audio for voice agents.
- CogVideoX (Zhipu AI / community)
- High-quality open text-to-video model (5B-12B).
- Mochi 1 (Genmo)
- 10B open video model with impressive motion and consistency.
## ⚡ 3. Inference Engines & Serving

Inference runtimes, serving systems, and optimization tools for running models locally or in production.
- llama.cpp
- Pure C/C++ inference engine with GGUF format support. The gold standard for CPU/GPU/Apple Silicon on-device running. Includes llama-server for OpenAI-compatible API.
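  llama-server speaks the OpenAI chat-completions wire format; a sketch of the request body (the port, model name, and prompt below are placeholder assumptions, not llama.cpp defaults):

  ```python
  import json

  # Request body for llama-server's OpenAI-compatible
  # POST /v1/chat/completions endpoint (server assumed at localhost:8080).
  payload = {
      "model": "gguf-model",  # placeholder name
      "messages": [{"role": "user", "content": "Say hello in one word."}],
      "temperature": 0.7,
      "max_tokens": 16,
  }
  body = json.dumps(payload).encode("utf-8")

  # Send with any HTTP client, e.g. urllib.request.Request(
  #     "http://localhost:8080/v1/chat/completions",
  #     data=body, headers={"Content-Type": "application/json"})
  print(len(body))
  ```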
- Ollama
- Dead-simple local LLM runner with a one-line install, model registry, and OpenAI-compatible API.
- MLX (Apple)
- High-performance array framework + LLM inference optimized for Apple Silicon.
- MLC-LLM
- Deployment engine that compiles and runs LLMs across browsers, mobile devices, and local hardware.
- WebLLM
- High-performance in-browser LLM inference engine. Runs models directly in the browser with WebGPU acceleration.
- llama-cpp-python
- Official Python bindings for llama.cpp.
- KoboldCpp
- User-friendly llama.cpp fork focused on role-playing and creative writing.
- llm-d
- Kubernetes-native distributed LLM inference framework. Donated to the CNCF by Red Hat, Google, and IBM. Intelligent scheduling, KV-cache optimization, and state-of-the-art performance across accelerators.
- LMDeploy
- Toolkit for compressing, deploying, and serving LLMs from OpenMMLab. 4-bit inference with 2.4x higher performance than FP16, distributed multi-model serving across machines.
- vLLM
- State-of-the-art serving engine with PagedAttention and continuous batching. Currently the fastest production-grade LLM server.
- SGLang
- Next-gen serving framework with RadixAttention. Powers xAI's production workloads at 100K+ GPUs scale.
- TensorRT-LLM
- NVIDIA's official high-performance inference backend.
- Aphrodite Engine
- vLLM fork optimized for role-play and creative writing.
- Triton Inference Server
- NVIDIA's production-grade open-source inference serving software. Supports multiple frameworks (TensorRT, PyTorch, ONNX) with optimized cloud and edge deployment.
- mistral.rs
- Fast, flexible Rust-native LLM inference engine built on Candle. Supports text, vision, audio, image generation, and embeddings with hardware-aware auto-tuning.
- KTransformers
- Flexible framework for heterogeneous CPU-GPU LLM inference and fine-tuning. Enables running large MoE models by offloading experts to CPU with BF16/FP8 precision support.
- llamafile
- Mozilla's single-file distributable LLM solution. Bundle model weights, inference engine, and runtime into one portable executable that runs on six OSes without installation.
- Xinference
- Unified, production-ready inference API for LLMs, speech, and multimodal models. Drop-in GPT replacement with single-line code changes. Supports thousands of models with auto-batching and distributed inference.
- LightLLM
- Pure Python-based LLM inference and serving framework with lightweight design, easy extensibility, and high-speed performance. Integrates optimizations from FasterTransformer, TGI, vLLM, and SGLang.
- TabbyAPI
- FastAPI-based API server for ExLlamaV2/V3 backends. OpenAI-compatible API with support for model loading/unloading, embeddings, speculative decoding, multi-LoRA, and streaming.
- GGUF (part of llama.cpp)
- Modern quantized format that powers most local inference.
- bitsandbytes
- 8-bit and 4-bit optimizers + quantization.
- ExLlamaV2
- Highly optimized CUDA kernels for 4-bit/8-bit inference.
- Optimum
- Hardware-specific acceleration and quantization.
## 🤖 4. Agentic AI & Multi-Agent Systems

Frameworks and platforms for building agent-based systems and multi-agent workflows.
- LangGraph
- Stateful, controllable agent orchestration.
- CrewAI
- Role-based agent framework.
- AutoGen (AG2)
- Flexible multi-agent conversation framework.
- DSPy
- Framework for programming language model pipelines with modules, optimizers, and evaluation loops.
- Semantic Kernel
- SDK for building and orchestrating AI agents and workflows across multiple programming languages.
- smolagents
- Lightweight agent framework centered on tool use and code-executing workflows.
- LangChain
- Foundational library for agents, chains, and memory.
- Hermes Agent (NousResearch)
- The agent that grows with you. Autonomous server-side agent with persistent memory that learns and improves over time.
- Agno
- Build, run, and manage agentic software at scale. High-performance framework for multi-agent systems with memory, knowledge, and tools.
- Upsonic
- Agent framework for fintech and banking with built-in MCP support, guardrails, and tool server architecture.
- VoltAgent
- TypeScript-first AI agent engineering platform with memory, RAG, workflows, MCP integration, and voice support.
- MetaGPT
- Simulates an entire "AI software company".
- CAMEL
- One of the first multi-agent frameworks for building scalable agent systems. Apache 2.0 licensed with extensive tooling for agent communication and task automation.
- Swarms
- Bleeding-edge enterprise multi-agent orchestration.
- Llama-Agents
- Async-first multi-agent system.
- Mastra
- TypeScript-first agent framework with built-in RAG, workflows, tool integrations, observability and observational memory.
- Deer-Flow (ByteDance)
- Open-source long-horizon SuperAgent harness that researches, codes, and creates. Handles tasks from minutes to hours with sandboxes, memories, tools, skills, subagents, and message gateway.
- OpenAI Agents SDK
- Production-ready lightweight framework for multi-agent workflows. The evolution of Swarm with enhanced orchestration capabilities and enterprise-grade features.
- AgentScope
- Alibaba's production-ready multi-agent framework with 23K+ stars. Features built-in MCP and A2A support, message hub for flexible orchestration, and AgentScope Runtime for production deployment.
- Microsoft Agent Framework
- Microsoft's official framework combining AutoGen's agent abstractions with Semantic Kernel's enterprise features. Supports Python and .NET with graph-based workflows.
- Agency Swarm
- Reliable multi-agent orchestration framework built on top of the OpenAI Assistants API with organizational structure modeling.
- OpenHands (ex-OpenDevin)
- Full-featured open-source AI software engineer.
- Goose
- Extensible on-machine AI agent for development tasks.
- OpenCode
- Terminal-native autonomous coding agent.
- Aider
- Command-line pair-programming agent.
- Pi (badlogic)
- Terminal coding agent with hash-anchored edits, LSP integration, subagents, MCP support, and package ecosystem.
- Mistral-Vibe (Mistral)
- Minimal CLI coding agent by Mistral. Lightweight, fast, and designed for local development workflows.
- Nanocoder (Nano-Collective)
- Beautiful local-first coding agent running in your terminal. Built for privacy and control with support for multiple AI providers via OpenRouter.
- Gemini CLI (Google)
- Open-source AI agent that brings Gemini's power directly into your terminal. Supports code generation, shell execution, and file editing with full Apache 2.0 licensing.
- Langflow
- Visual low-code platform for agentic workflows.
- Dify
- Production-ready agentic workflow platform.
- OWL (camel-ai/owl)
- Advanced multi-agent collaboration system.
- AI-Scientist-v2 (SakanaAI)
- Workshop-level automated scientific discovery via agentic tree search. Generates novel research ideas, runs experiments, and writes papers.
- PraisonAI
- 24/7 AI employee team for automating complex challenges. Low-code multi-agent framework with handoffs, guardrails, memory, RAG, and 100+ LLM providers.
- Agent-S (Simular AI)
- Open agentic framework that uses computers like a human. SOTA on OSWorld benchmark (72.6%) for GUI automation and computer control.
- Letta (ex-MemGPT)
- Platform for building stateful agents with advanced memory that learn and self-improve over time.
- Mem0
- Universal memory layer for AI agents. Persistent, multi-session memory across models and environments.
- Hindsight
- State-of-the-art long-term memory for AI agents by Vectorize. Fully self-hosted, MIT-licensed, with integrations for LangChain, CrewAI, LlamaIndex, Vercel AI SDK, and more.
## 🔍 5. Retrieval-Augmented Generation (RAG) & Knowledge

Retrieval systems, vector databases, embedding models, and related tooling for RAG pipelines.
- Chroma
- Most popular open-source embedding database.
- Qdrant
- High-performance vector search engine in Rust.
- Weaviate
- GraphQL-native vector search engine.
- Milvus
- Scalable cloud-native vector database.
- Faiss
- Similarity search and clustering library for dense vectors with CPU and GPU implementations.
- LanceDB
- Serverless vector DB optimized for multimodal data.
- Vespa
- AI + Data platform with hybrid search (vector + keyword) and real-time indexing at scale. Battle-tested serving billions of queries daily.
- pgvector
- PostgreSQL extension for vector similarity search.
- BGE (FlagEmbedding)
- BAAI's best-in-class embedding family.
- E5 (Microsoft)
- High-performance text embeddings for retrieval.
- MTEB
- Massive Text Embedding Benchmark covering 1000+ languages and diverse tasks. The industry standard for evaluating and comparing embedding models.
- LlamaIndex
- Full-featured RAG pipeline with advanced indexing.
- Haystack
- End-to-end NLP and RAG framework.
- RAGFlow
- Deep-document-understanding RAG engine.
- GraphRAG (Microsoft)
- Knowledge-graph-based RAG.
- Docling
- Document processing toolkit for turning PDFs and other files into structured data for GenAI workflows.
- Unstructured
- Best-in-class document preprocessing.
- MinerU
- High-accuracy document parsing for LLM and RAG workflows. Converts PDFs, Word, PPTs, and images into structured Markdown/JSON with VLM+OCR dual engine.
- ColPali / ColQwen
- Vision-language models for document retrieval.
- LightRAG
- Graph-based RAG with dual-level retrieval system. Simple and fast with comprehensive knowledge discovery (EMNLP 2025).
- RAG-Anything
- All-in-One Multimodal RAG system for seamless processing of text, images, tables, and equations. Built on LightRAG.
- txtai
- All-in-one AI framework for semantic search, LLM orchestration and language model workflows. Embeddings database with customizable pipelines.
- Infinity
- High-throughput, low-latency serving engine for text-embeddings, reranking, CLIP, and ColPali. OpenAI-compatible API.
- Crawl4AI
- LLM-friendly web crawler that turns websites into clean Markdown for RAG and agentic workflows.
- Lightpanda
- Machine-first headless browser in Zig; rendering-free and ultra-lightweight for AI agent browsing.
- Paperless-AI
- Automated document analyzer for Paperless-ngx with RAG-powered semantic search across your document archive.
- Firecrawl
- Web Data API for AI - search, scrape, and interact with the web at scale. Clean markdown/JSON output with proxy rotation and JS-blocking handled automatically.
## 🎨 6. Generative Media Tools

Open-source models and applications for image, video, audio, and 3D generation and editing.
- ComfyUI
- Node-based visual workflow editor for Stable Diffusion, FLUX, etc.
- Stable Diffusion WebUI Forge - Neo
- Actively maintained Forge-based Stable Diffusion web UI with the familiar extension-driven workflow.
- Fooocus
- Midjourney-style UI with beautiful out-of-the-box results.
- Diffusers
- PyTorch library for diffusion pipelines spanning image, video, and audio generation.
- InvokeAI
- Full-featured creative studio.
- PowerPaint (OpenMMLab)
- Versatile image inpainting model supporting text-guided inpainting, object removal, and outpainting (ECCV 2024).
- Wan2.2 (Alibaba)
- Leading open Mixture-of-Experts text-to-video model.
- HunyuanVideo (Tencent)
- 13B-parameter systematic video generation framework. Leading quality among open models.
- SkyReels V2/V3 (Skywork)
- First open-source infinite-length film generative model using AutoRegressive Diffusion-Forcing.
- Mochi 1 (Genmo)
- 10B-parameter open video model.
- LTX-Video (Lightricks)
- Fast native 4K video generation.
- Stable Video Diffusion (Stability AI)
- Official image-to-video and text-to-video implementation within Stability AI's generative models repository.
- Helios (PKU-YuanGroup)
- Efficient long-video generation framework with 24GB VRAM support for up to 10,000 frames (5+ minutes) and 1280×768 resolution. Apache 2.0 licensed.
- AudioCraft / MusicGen (Meta)
- Controllable text-to-music and audio models.
- ACE-Step 1.5
- Local-first music generation model with broad hardware support across Mac, AMD, Intel, and CUDA devices.
- Fish Speech
- Zero-shot TTS and voice cloning.
- CosyVoice 2
- Natural multilingual TTS with emotional control.
- OuteTTS
- High-quality open TTS.
- Amphion
- Comprehensive toolkit for Audio, Music, and Speech Generation (9.7K stars).
- Hunyuan3D-2 (Tencent)
- State-of-the-art open image-to-3D and text-to-3D.
- Trellis (Microsoft)
- Structured 3D latents for high-quality generation.
- gsplat (3D Gaussian Splatting tools)
- High-performance 3D Gaussian Splatting library.
- LichtFeld-Studio
- Native application for training, editing, and exporting 3D Gaussian Splatting scenes with MCMC optimization and timelapse generation. GPL-3.0 licensed.
## 🛠️ 7. Training & Fine-tuning Ecosystem

Tools for model training, fine-tuning, synthetic data generation, and distributed training.
- LLaMA-Factory
- One-stop unified framework for SFT, DPO, ORPO, KTO with web UI.
- Axolotl
- YAML-driven full pipeline for SFT, DPO, GRPO.
- ms-swift
- Unified training framework for 600+ LLMs and 300+ MLLMs with CPT/SFT/DPO/GRPO (AAAI 2025).
- Unsloth
- 2× faster, 70% less memory fine-tuning.
- LitGPT
- Clean from-scratch implementations of 20+ LLMs.
- LLM Foundry
- Databricks' training framework for composable LLM training with StreamingDataset and Composer.
- torchtune
- PyTorch-native library for post-training, fine-tuning, and experimentation with LLMs.
- TRL (Transformers Reinforcement Learning)
- Official library for RLHF, SFT, DPO, ORPO.
- verl
- Volcano Engine Reinforcement Learning for LLMs with PPO, GRPO, REINFORCE++, DAPO (EuroSys 2025).
- NeMo-RL
- Scalable toolkit for efficient model reinforcement with DTensor and Megatron backends.
- PEFT (Parameter-Efficient Fine-Tuning)
- Official library with LoRA, QLoRA, DoRA, etc.
- Liger Kernel
- Ultra-fast custom kernels for training speedup.
- MergeKit
- Advanced model merging tools.
- distilabel
- End-to-end pipeline for synthetic instruction data.
- Data-Juicer
- High-performance data processing for LLM training.
- Argilla
- Open-source data labeling + synthetic data platform.
- SDV (Synthetic Data Vault)
- High-fidelity tabular and relational synthetic data.
- DeepSpeed
- Extreme-scale training optimizations.
- Colossal-AI
- Unified system for 100B+ models.
- Megatron-LM
- Distributed training framework and reference codebase for large transformer models at scale.
- Composer
- MosaicML's PyTorch library for scalable, efficient neural network training with algorithmic speedups.
- Ray Train
- Scalable distributed training.
## 📊 8. MLOps / LLMOps & Production

Tooling for tracking, deploying, monitoring, and operating AI systems in production.
- MLflow
- End-to-end open platform for the ML/LLM lifecycle.
- DVC (Data Version Control)
- Git-like versioning for data and models.
- ClearML
- Open-source platform for experiment tracking, orchestration, data management, and model serving.
- Weights & Biases Weave
- Open-source tracing and experiment tracking.
- Feast
- Open source feature store for ML. Manages offline/online feature storage with point-in-time correctness to prevent data leakage. Apache 2.0 licensed.
- BentoML
- Unified framework to build, ship, and scale AI apps.
- Ray Serve
- Scalable model serving library.
- ZenML
- Pipeline and orchestration framework for taking ML and LLM systems from development to production.
- Kubeflow
- Kubernetes-native ML/LLM platform.
- KServe
- Kubernetes-based model serving.
- Metaflow
- Netflix's ML platform for building and managing real-world AI systems. Powers thousands of projects at Netflix, Amazon, and DoorDash. Apache 2.0 licensed.
- Flyte
- Kubernetes-native workflow orchestration platform for AI/ML pipelines. Dynamic, resilient orchestration with strong type safety and reproducibility. Used by Lyft, Spotify, and Gojek. Apache 2.0 licensed.
- Langfuse
- #1 open-source LLM observability platform.
- Phoenix (Arize)
- AI observability & evaluation platform.
- Evidently
- ML & LLM monitoring framework.
- Opik (Comet)
- Production-ready LLM evaluation platform.
- LiteLLM
- AI Gateway to call 100+ LLM APIs in OpenAI format with unified cost tracking, guardrails, load balancing, and logging.
- OpenLIT
- OpenTelemetry-native LLM observability platform with GPU monitoring, evaluations, prompt management, and guardrails.
- OpenLLMetry (Traceloop)
- Open-source observability for GenAI/LLM applications based on OpenTelemetry with 25+ integration backends.
- Agenta
- Open-source LLMOps platform combining prompt playground, prompt management, LLM evaluation, and observability.
- Helicone
- Open-source LLM observability with request logging, caching, rate limiting, and cost analytics.
- Giskard
- Open-source evaluation and testing library for LLM agents. Red teaming, vulnerability scanning, RAG evaluation, and safety testing with modular architecture. Apache 2.0 licensed.
- Portkey Gateway
- Blazing fast AI Gateway to route 200+ LLMs with unified API. Integrated guardrails, load balancing, fallbacks, and cost tracking. MIT licensed.
- NVIDIA NeMo Guardrails
- Programmable guardrails toolkit for LLM-based conversational systems. Uses Colang to define dialog flows with input/output rails, jailbreak detection, fact-checking, and hallucination detection. Apache 2.0 licensed.
- Guardrails AI
- Python framework for adding input/output guardrails to LLM applications. Detects and mitigates risks like PII leakage, toxic language, competitor mentions, with 50+ validators in Guardrails Hub. Apache 2.0 licensed.
- LLM Guard
- Comprehensive security toolkit for LLM interactions with input/output scanners for prompt injection, PII anonymization, toxic content, secrets detection, and adversarial attack prevention. MIT licensed.
- LlamaGuard (Meta)
- Open safety classifier models.
- Garak
- LLM vulnerability scanner.
- Promptfoo
- LLM testing and red-teaming framework.
## 📈 9. Evaluation, Benchmarks & Datasets

Benchmarks, evaluation frameworks, datasets, and supporting tools for model assessment.
- lm-evaluation-harness (EleutherAI)
- De-facto standard for generative model evaluation.
- HELM (Stanford)
- Holistic Evaluation of Language Models.
- SWE-bench
- Evaluates LLMs on real-world GitHub issues drawn from popular Python repositories.
- GAIA
- Real-world multi-step agentic benchmark.
- OpenCompass
- Evaluation platform for benchmarking language and multimodal models across large benchmark suites.
- MLPerf Inference
- Industry-standard ML inference benchmarks with reference implementations for AI accelerators.
- SWE-rebench (Nebius)
- Continuously updated benchmark with 21,000+ real-world SWE tasks for evaluating agentic LLMs. Decontaminated, mined from GitHub.
- AgentBench (THUDM)
- Comprehensive benchmark to evaluate LLMs as agents across 8 diverse environments including household, web shopping, OS interaction, and database tasks. ICLR 2024. Apache 2.0 licensed.
- DeepEval
- The "Pytest for LLMs".
- Inspect AI
- Framework for large language model evaluations from the UK AI Security Institute.
- RAGAs
- End-to-end RAG evaluation framework.
- Lighteval
- Evaluation toolkit for LLMs across multiple backends with reusable tasks, metrics, and result tracking.
- Hugging Face Evaluate
- Standardized evaluation metrics.
- OpenAI Evals
- Framework for evaluating LLMs and LLM systems with an open-source registry of 100+ community-contributed benchmarks. MIT licensed.
- Hugging Face Datasets
- Largest open repository of datasets.
- FineWeb / FineWeb-2 (Hugging Face)
- Curated 15T+ token web dataset for pre-training.
- OSWorld
- Multimodal agent benchmark dataset.
## 🛡️ 10. AI Safety, Alignment & Interpretability

Tools for alignment, interpretability, safety evaluation, and adversarial testing.
- Inspect AI
- Framework for large language model evaluations from the UK AI Security Institute. Systematic capability and safety assessments with built-in scaffolding for multi-turn dialog, tool use, and adversarial testing. MIT licensed.
- DeepEval
- LLM evaluation framework with built-in safety metrics including hallucination detection, bias detection, toxicity evaluation, and prompt alignment checking. Apache 2.0 licensed.
- Safe-RLHF
- Safe reinforcement learning from human feedback.
- Alignment Handbook
- Complete recipes for full-stack alignment.
- OpenRLHF
- High-performance distributed RLHF framework.
- TransformerLens
- Gold-standard for mechanistic interpretability.
- SAELens
- Sparse autoencoders for interpretable features.
- Captum
- PyTorch's official interpretability library.
- SHAP
- Game theoretic approach to explain the output of any machine learning model. Industry standard for model interpretability.
- XAI
- eXplainability toolbox for machine learning with bias evaluation and production monitoring tools.
- AI Fairness 360
- Comprehensive toolkit for detecting, understanding, and mitigating unwanted algorithmic bias in datasets and ML models.
- Garak
- Automated LLM vulnerability scanner.
- Promptfoo
- Systematic prompt testing and red-teaming.
- LLM Guard
- Input/output scanner for LLMs.
- Adversarial Robustness Toolbox
- Python library for machine learning security (evasion, poisoning, extraction, inference attacks).
- DeepTeam
- Framework to red team LLMs and LLM systems.
## 🧩 11. Specialized Domains
- Boltz
- Open-source biomolecular interaction prediction models. Boltz-1 was the first fully open source model to approach AlphaFold3 accuracy; Boltz-2 adds binding affinity prediction for drug discovery. MIT licensed.
- OpenFold
- Trainable PyTorch reproduction of AlphaFold2. Complete open-source pipeline for protein structure prediction with competitive accuracy to the original. Apache 2.0 licensed.
- MONAI
- Medical Open Network for AI. End-to-end framework for healthcare imaging with state-of-the-art, production-ready training workflows. Apache 2.0 licensed.
- Unity ML-Agents
- Toolkit for training intelligent agents in games and simulations using deep reinforcement learning. Enables NPC behavior control, automated testing, and game design evaluation. Apache 2.0 licensed.
- OpenSpiel
- Collection of environments and algorithms for research in general reinforcement learning and search/planning in games from Google DeepMind. Apache 2.0 licensed.
- OpenBB
- Financial data platform for analysts, quants and AI agents. Open-source investment research infrastructure with extensive data integrations. AGPL-3.0 licensed.
- FinGPT
- Open-source financial large language models. Democratizing financial AI with data-centric training pipeline and multiple model releases for trading, analysis, and robo-advising. MIT licensed.
- FinRL
- Financial reinforcement learning framework for quantitative trading. Deep RL library for stock trading, portfolio allocation, and market execution with pre-built environments and benchmarks. MIT licensed.
- OpenCV
- World's most widely used computer vision library.
- Ultralytics YOLO
- State-of-the-art real-time object detection.
- Detectron2
- High-performance object detection library.
- SAM 2
- Promptable image and video segmentation model with released checkpoints and training code.
- Kornia
- Differentiable computer vision library.
- MediaPipe
- Cross-platform multimodal pipelines.
- Stable-Baselines3
- Production-ready RL algorithms.
- Isaac Lab
- GPU-accelerated robot learning framework.
- MuJoCo
- General-purpose physics simulator for robotics, biomechanics, and ML research. High-fidelity contact dynamics with native Python and C++ bindings. Apache 2.0 licensed.
- Gymnasium (formerly OpenAI Gym)
- Standard RL environment API.
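That standard API is simple enough to sketch in plain Python. The hypothetical toy environment below follows the Gymnasium signatures, `reset() -> (obs, info)` and `step(action) -> (obs, reward, terminated, truncated, info)`; real environments also declare `action_space` and `observation_space`, omitted here for brevity.

```python
import random

class CoinFlipEnv:
    """Toy env following the Gymnasium API convention: the agent is
    rewarded for echoing back the observation it was just shown."""
    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.obs = self.rng.randint(0, 1)
        return self.obs, {}

    def step(self, action):
        reward = 1.0 if action == self.obs else 0.0
        self.t += 1
        self.obs = self.rng.randint(0, 1)
        terminated = False                # this task has no natural end state
        truncated = self.t >= self.max_steps
        return self.obs, reward, terminated, truncated, {}

env = CoinFlipEnv()
obs, info = env.reset()
total, done = 0.0, False
while not done:
    # trivial policy: echo the current observation back as the action
    obs, reward, terminated, truncated, info = env.step(obs)
    total += reward
    done = terminated or truncated
```

Because any environment exposing this interface plugs into Stable-Baselines3 and the rest of the ecosystem, the convention is what makes RL tooling interoperable.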
- Time Series Library (TSLib)
- Comprehensive benchmark for time-series models.
- Chronos (Amazon)
- Pretrained foundation models for time-series forecasting.
- Darts
- Easy-to-use time-series forecasting library.
- AutoTS
- Automated time series forecasting with broad model selection, ensembling, anomaly detection, and holiday effects. Designed for production deployment with minimal setup.
- TensorFlow Lite (now LiteRT)
- Lightweight runtime for on-device ML inference.
- ONNX Runtime
- Cross-platform high-performance inference.
- ExecuTorch
- PyTorch runtime and toolchain for deploying AI models on mobile, embedded, and edge devices.
- OpenVINO
- Intel's toolkit for edge deployment.
- MicroTVM (Apache TVM)
- Compiler stack for microcontrollers.
- OpenContracts
- Self-hosted document annotation platform for legal AI. Semantic search, contract analysis, version control, and MCP integration for building legal knowledge bases. AGPL-3.0 licensed.
- OpenClaw
- Local-first personal AI assistant with multi-channel integrations and full agentic task execution.
- Open WebUI
- Most popular self-hosted ChatGPT-style interface.
- text-generation-webui
- Web UI for running local LLMs with multiple backends, extensions, and model formats.
- LobeChat
- Sleek modern chat UI.
- LibreChat
- Feature-packed multi-LLM interface.
- HuggingChat (self-hosted)
- Official open-source codebase (chat-ui) powering Hugging Face's HuggingChat.
- Khoj
- Self-hostable personal AI assistant for search, chat, automation, and workflows over local and web data.
- Newelle
- GNOME/Linux desktop virtual assistant with integrated file editor, global hotkeys, and profile manager.
- NextChat
- Light and fast AI assistant supporting Web, iOS, macOS, Android, Linux, and Windows. One-click deploy with multi-model support. MIT licensed.
- big-AGI
- AI suite for power users with multi-model "Beam" chats, AI personas, voice, text-to-image, code execution, and PDF import. MIT licensed.
- Leon
- Your open-source personal assistant. Built around tools, context, memory, and agentic execution. Self-hosted, privacy-focused, and extensible. MIT licensed.
- AnythingLLM
- All-in-one RAG + agents platform.
- Dify
- Complete AI application platform with visual builder.
- Langflow
- Visual low-code platform for LangChain flows.
- Flowise
- Drag-and-drop LLM app builder.
- LocalAI
- Open-source AI engine running LLMs, vision, voice, image, and video models on any hardware. Self-hosted OpenAI-compatible API. MIT licensed.
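Because LocalAI exposes an OpenAI-compatible API, a standard chat-completions request works against it unchanged. The sketch below builds (but does not send) such a request with only the standard library; the base URL and model name are assumptions for a typical local deployment.

```python
import json
import urllib.request

# Assumed local endpoint -- adjust host, port, and model for your deployment.
BASE_URL = "http://localhost:8080/v1"

payload = {
    "model": "llama-3.2-1b-instruct",   # whichever model LocalAI has loaded
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in five words."},
    ],
    "temperature": 0.2,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would return the familiar
# {"choices": [{"message": {...}}]} response shape.
```

The same compatibility means existing OpenAI client SDKs work by simply pointing their base URL at the LocalAI server.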
- Onyx
- Full-featured AI platform with Chat, RAG, Agents, and Actions. 40+ document connectors and support for every major LLM. MIT licensed (Community Edition).
- AI Chatbot Framework
- Open-source, self-hosted DIY chatbot building platform with visual conversation builder and NLU capabilities. MIT licensed.
- Jan
- Local-first AI app framework.
- SillyTavern
- Highly customizable role-playing frontend.
- Chatbox
- Powerful desktop AI client for ChatGPT, Claude, and other LLMs. Cross-platform with modern UI. GPLv3 licensed (Community Edition).
- Maid
- Free and open-source Android app for interfacing with llama.cpp models locally and remote APIs (Anthropic, DeepSeek, Mistral, Ollama, OpenAI). MIT licensed.
- Pipecat
- Open-source framework for voice and multimodal conversational AI. Build real-time voice agents with support for speech-to-text, LLMs, text-to-speech, and live video. BSD-2-Clause licensed.
- Agent Chat UI
- Web app for interacting with any LangGraph agent (Python & TypeScript) via a chat interface. Stream messages, handle interruptions, and view agent state. MIT licensed.
- Continue
- Open-source AI coding autopilot for VS Code & JetBrains; the most installed open-source AI coding extension.
- Tabby
- Self-hosted AI coding assistant.
- Cline
- Open-source IDE coding agent that can edit files, run commands, and use tools with user approval.
- Open Interpreter
- Lets LLMs run code locally.
- Roo Code
- Open-source editor-based coding agent with multiple modes and tool integrations.
- Aider
- Terminal-based AI pair programmer.
- llama.vim
- Local LLM-powered code completion plugin for Vim/Neovim using llama.cpp. Fast, privacy-first, no API key needed.
- CodeCompanion.nvim
- AI-powered coding assistant for Neovim. Inline code generation, chat, actions, and tool use with support for multiple LLM providers.
- Jupyter AI
- Chat and code generation inside notebooks.
- Assistant UI
- React/TypeScript library for building production-grade AI chat interfaces. Drop-in components for streaming messages, tool calls, and multi-modal inputs.
- Promptfoo
- Systematic LLM testing framework.
- DeepEval
- LLM unit-testing framework.
- Garak
- LLM vulnerability scanner.
- Phoenix (Arize)
- AI observability for development.
- Papers with Code - Definitive database linking papers to open code and datasets.
- Hugging Face Papers - Daily-updated feed of the latest arXiv papers with open weights.
- Open LLM Leaderboard (Hugging Face) - Real-time ranking of open models.
- Hugging Face Discussions - Largest open AI forum.
- r/LocalLLaMA - Go-to subreddit for local/open-source LLM topics.
- Hugging Face Course - Free hands-on courses using only open models.
- Fast.ai - Legendary practical deep learning course.
- LangChain Academy - Free courses on agents and RAG.
- TensorFlow Tutorials - Official guides for beginners to advanced users.
- Hugging Face Transformers Notebooks - Run Transformers, Datasets, and more in Colab.
Contributions are highly welcome! Please read CONTRIBUTING.md for guidelines (quality standards, formatting, license requirements, etc.).
- Only OSI-approved licenses
- Projects must be actively maintained (commits in last 6 months)
- High-quality, well-documented, real adoption
This list itself is licensed under CC0 1.0 Universal. Feel free to use it for any purpose.
Made with ❤️ for the open-source AI community. Star the repo if you find it useful - it helps more people discover the best open tools!