NexaSDK lets you build the smartest and fastest on-device AI with minimal energy use. It is a high-performance local inference framework that runs the latest multimodal AI models on NPU, GPU, and CPU across Android, Windows, Linux, macOS, and iOS devices with a few lines of code.
NexaSDK supports the latest models weeks or months before anyone else: Qwen3-VL, DeepSeek-OCR, Gemma3n (Vision), and more.
⭐ Star this repo to keep up with exciting updates and new releases on the latest on-device AI capabilities.
- Qualcomm has featured us three times in its official blogs.
- Qwen featured us for Day-0 Qwen3-VL support on NPU, GPU, and CPU. We were 3 weeks ahead of Ollama and llama.cpp on GGUF support, and no one else supports it on NPU to date.
- IBM featured our NexaML inference engine alongside vLLM, llama.cpp, and MLX in an official IBM blog, and again for Day-0 Granite 4.0 support.
- Google featured us for EmbeddingGemma Day-0 NPU support.
- AMD featured us for enabling SDXL-turbo image generation on AMD NPU.
- NVIDIA featured Hyperlink, a viral local AI app powered by NexaSDK, in their official blog.
- Microsoft presented us on stage at Microsoft Ignite 2025 as an official partner.
- Intel featured us for Intel NPU support in NexaSDK.
| Platform | Links |
|---|---|
| 🖥️ CLI | Quick Start | Docs |
| 🐍 Python | Quick Start | Docs |
| 🤖 Android | Quick Start | Docs |
| 🐳 Linux Docker | Quick Start | Docs |
| 🍎 iOS | Quick Start | Docs |
Download:
| Windows | macOS | Linux |
|---|---|---|
| arm64 (Qualcomm NPU) | arm64 (Apple Silicon) | arm64 |
| x64 (Intel/AMD NPU) | x64 | x64 |
Run your first model:

```bash
# Chat with Qwen3
nexa infer ggml-org/Qwen3-1.7B-GGUF

# Multimodal: drag images into the CLI
nexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF

# NPU (Windows arm64 with Snapdragon X Elite)
nexa infer NexaAI/OmniNeural-4B
```

- Models: LLM, Multimodal, ASR, OCR, Rerank, Object Detection, Image Generation, Embedding
- Formats: GGUF, MLX, NEXA
- NPU Models: Model Hub
- 📖 CLI Reference Docs
```bash
pip install nexaai
```

```python
from nexaai import LLM, GenerationConfig, ModelConfig, LlmChatMessage

llm = LLM.from_(model="NexaAI/Qwen3-0.6B-GGUF", config=ModelConfig())
conversation = [
    LlmChatMessage(role="user", content="Hello, tell me a joke")
]
prompt = llm.apply_chat_template(conversation)
for token in llm.generate_stream(prompt, GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)
```

- Models: LLM, Multimodal, ASR, OCR, Rerank, Object Detection, Image Generation, Embedding
- Formats: GGUF, MLX, NEXA
- NPU Models: Model Hub
- 📖 Python SDK Docs
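The streaming loop above composes naturally into multi-turn chat: collect the streamed tokens, append them to the history as an assistant message, and re-apply the chat template. A minimal sketch reusing only the API shown above; treating `role="assistant"` as accepted by `apply_chat_template` is an assumption based on standard chat-template conventions:

```python
from nexaai import LLM, GenerationConfig, ModelConfig, LlmChatMessage

llm = LLM.from_(model="NexaAI/Qwen3-0.6B-GGUF", config=ModelConfig())
conversation = [LlmChatMessage(role="user", content="Name three prime numbers.")]

# First turn: stream the reply while collecting it for the history
reply = ""
for token in llm.generate_stream(llm.apply_chat_template(conversation),
                                 GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)
    reply += token

# Second turn: append the assistant reply, then ask a follow-up
# (role="assistant" assumed per standard chat-template conventions)
conversation.append(LlmChatMessage(role="assistant", content=reply))
conversation.append(LlmChatMessage(role="user", content="Double the largest one."))
for token in llm.generate_stream(llm.apply_chat_template(conversation),
                                 GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)
```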
Add to your app/AndroidManifest.xml:

```xml
<application android:extractNativeLibs="true">
```

Add to your build.gradle.kts:

```kotlin
dependencies {
    implementation("ai.nexa:core:0.0.15")
}
```

```kotlin
// Initialize SDK
NexaSdk.getInstance().init(this)

// Load and run model
VlmWrapper.builder()
    .vlmCreateInput(VlmCreateInput(
        model_name = "omni-neural",
        model_path = "/data/data/your.app/files/models/OmniNeural-4B/files-1-1.nexa",
        plugin_id = "npu",
        config = ModelConfig()
    ))
    .build()
    .onSuccess { vlm ->
        vlm.generateStreamFlow("Hello!", GenerationConfig()).collect { print(it) }
    }
```

- Requirements: Android minSdk 27, Qualcomm Snapdragon 8 Gen 4 chip
- Models: LLM, Multimodal, ASR, OCR, Rerank, Embedding
- NPU Models: Supported Models
- 📖 Android SDK Docs
```bash
docker pull nexa4ai/nexasdk:latest

export NEXA_TOKEN="your_token_here"
docker run --rm -it --privileged \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest infer NexaAI/Granite-4.0-h-350M-NPU
```

- Requirements: Qualcomm Dragonwing IQ9, ARM64 systems
- Models: LLM, VLM, ASR, CV, Rerank, Embedding
- NPU Models: Supported Models
- 📖 Linux Docker Docs
Download NexaSdk.xcframework and add it to your Xcode project.

```swift
import NexaSdk

// Example: Speech Recognition
let asr = try Asr(plugin: .ane)
try await asr.load(from: modelURL)
let result = try await asr.transcribe(options: .init(audioPath: "audio.wav"))
print(result.asrResult.transcript)
```

- Requirements: iOS 17.0+ / macOS 15.0+, Swift 5.9+
- Models: LLM, ASR, OCR, Rerank, Embedding
- ANE Models: Apple Neural Engine Models
- 📖 iOS SDK Docs
| Features | NexaSDK | Ollama | llama.cpp | LM Studio |
|---|---|---|---|---|
| NPU support | ✅ NPU-first | ❌ | ❌ | ❌ |
| Android/iOS SDK support | ✅ NPU/GPU/CPU support | ❌ | | |
| Linux support (Docker image) | ✅ | ✅ | ✅ | ❌ |
| Day-0 model support in GGUF, MLX, NEXA | ✅ | ❌ | ❌ | |
| Full multimodality support | ✅ Image, Audio, Text, Embedding, Rerank, ASR, TTS | | | |
| Cross-platform support | ✅ Desktop, Mobile (Android, iOS), Automotive, IoT (Linux) | | | |
| One line of code to run | ✅ | ✅ | ✅ | |
| OpenAI-compatible API + Function calling (example below) | ✅ | ✅ | ✅ | ✅ |

Legend: ✅ Supported
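Because NexaSDK exposes an OpenAI-compatible API (last row of the table), existing OpenAI clients can target a locally running NexaSDK server directly. A minimal sketch using the official `openai` Python package; the base URL and port below are assumptions, so start the local server per the CLI docs and substitute the address it prints:

```python
# Requires: pip install openai
from openai import OpenAI

# base_url is an assumption; replace with the host/port your local
# NexaSDK server actually reports on startup.
client = OpenAI(base_url="http://127.0.0.1:18181/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="NexaAI/Qwen3-0.6B-GGUF",  # any model tag your local server has loaded
    messages=[{"role": "user", "content": "Hello, tell me a joke"}],
    max_tokens=100,
)
print(resp.choices[0].message.content)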
Nexa Wishlist — Request and vote for the models you want to run on-device.
Drop a Hugging Face repo ID, pick your preferred backend (GGUF, MLX, or Nexa format for Qualcomm + Apple NPUs), and watch the community's top requests go live in NexaSDK.
👉 Vote now at sdk.nexa.ai/wishlist
Earn up to 1,500 USD for building with NexaSDK.
Learn more in our Participant Details.
We would like to thank the following projects:
NexaSDK uses a dual licensing model:
- Open source: Licensed under Apache License 2.0.
- Personal Use (NPU): Free license key available from the Nexa AI Model Hub. Each key activates 1 device for NPU usage.
- Commercial Use (NPU): Contact [email protected] for licensing.
For model launch partnerships, business inquiries, or any other questions, please schedule a call with us here.
Want more model support, backend support, device support, or other features? We'd love to hear from you!
Feel free to submit an issue on our GitHub repository with your requests, suggestions, or feedback. Your input helps us prioritize what to build next.
Join our community:
- Join us on Discord
- Follow us on X