
Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more.


Nexa AI Banner

Simplified Chinese | English

🤝 Supported chipmakers


Documentation | Vote for Next Models | X account | Join us on Discord | Join us on Slack

NexaSDK

NexaSDK lets you build the smartest and fastest on-device AI with minimal energy use. It is a high-performance local inference framework that runs the latest multimodal AI models locally on NPU, GPU, and CPU across Android, Windows, Linux, macOS, and iOS devices with a few lines of code.

NexaSDK supports the latest models weeks or months before other frameworks, including Qwen3-VL, DeepSeek-OCR, Gemma3n (Vision), and more.

Star this repo to keep up with updates and new releases covering the latest on-device AI capabilities.

🏆 Recognized Milestones

NexaSDK for Mobile: #1 Product of the Day | NexaAI/nexa-sdk: #1 Repository of the Day

🚀 Quick Start

| Platform | Links |
|---|---|
| 🖥️ CLI | Quick Start · Docs |
| 🐍 Python | Quick Start · Docs |
| 🤖 Android | Quick Start · Docs |
| 🐳 Linux Docker | Quick Start · Docs |
| 🍎 iOS | Quick Start · Docs |

🖥️ CLI

Download:

| Windows | macOS | Linux |
|---|---|---|
| arm64 (Qualcomm NPU) | arm64 (Apple Silicon) | arm64 |
| x64 (Intel/AMD NPU) | x64 | x64 |

Run your first model:

# Chat with Qwen3
nexa infer ggml-org/Qwen3-1.7B-GGUF

# Multimodal: drag images into the CLI
nexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF

# NPU (Windows arm64 with Snapdragon X Elite)
nexa infer NexaAI/OmniNeural-4B

  • Models: LLM, Multimodal, ASR, OCR, Rerank, Object Detection, Image Generation, Embedding
  • Formats: GGUF, MLX, NEXA
  • NPU Models: Model Hub
  • 📖 CLI Reference Docs

🐍 Python SDK

pip install nexaai

from nexaai import LLM, GenerationConfig, ModelConfig, LlmChatMessage

llm = LLM.from_(model="NexaAI/Qwen3-0.6B-GGUF", config=ModelConfig())

conversation = [
    LlmChatMessage(role="user", content="Hello, tell me a joke")
]
prompt = llm.apply_chat_template(conversation)
for token in llm.generate_stream(prompt, GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)

  • Models: LLM, Multimodal, ASR, OCR, Rerank, Object Detection, Image Generation, Embedding
  • Formats: GGUF, MLX, NEXA
  • NPU Models: Model Hub
  • 📖 Python SDK Docs
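
A multi-turn chat works the same way: append each reply to the conversation and re-apply the chat template before the next generation. Below is a minimal sketch that uses only the API shown above; the assumption that LlmChatMessage accepts role="assistant" for model turns is ours, so check the Python SDK docs:

from nexaai import LLM, GenerationConfig, ModelConfig, LlmChatMessage

llm = LLM.from_(model="NexaAI/Qwen3-0.6B-GGUF", config=ModelConfig())

conversation = []
for user_input in ["Hello, tell me a joke", "Now explain why it is funny"]:
    # Add the user turn, then rebuild the prompt from the full history
    conversation.append(LlmChatMessage(role="user", content=user_input))
    prompt = llm.apply_chat_template(conversation)

    # Stream tokens while accumulating the reply for the history
    reply = ""
    for token in llm.generate_stream(prompt, GenerationConfig(max_tokens=100)):
        print(token, end="", flush=True)
        reply += token
    print()

    # Assumption: assistant turns use role="assistant"
    conversation.append(LlmChatMessage(role="assistant", content=reply))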

🤖 Android SDK

Add to your app/AndroidManifest.xml:

<application android:extractNativeLibs="true">

Add to your build.gradle.kts:

dependencies {
    implementation("ai.nexa:core:0.0.15")
}

// Initialize SDK
NexaSdk.getInstance().init(this)

// Load and run model
VlmWrapper.builder()
    .vlmCreateInput(VlmCreateInput(
        model_name = "omni-neural",
        model_path = "/data/data/your.app/files/models/OmniNeural-4B/files-1-1.nexa",
        plugin_id = "npu",
        config = ModelConfig()
    ))
    .build()
    .onSuccess { vlm ->
        vlm.generateStreamFlow("Hello!", GenerationConfig()).collect { print(it) }
    }

  • Requirements: Android minSdk 27, Qualcomm Snapdragon 8 Gen 4 chip
  • Models: LLM, Multimodal, ASR, OCR, Rerank, Embedding
  • NPU Models: Supported Models
  • 📖 Android SDK Docs

🐳 Linux Docker

docker pull nexa4ai/nexasdk:latest

export NEXA_TOKEN="your_token_here"
docker run --rm -it --privileged \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest infer NexaAI/Granite-4.0-h-350M-NPU

🍎 iOS SDK

Download NexaSdk.xcframework and add to your Xcode project.

import NexaSdk

// Example: Speech Recognition
let asr = try Asr(plugin: .ane)
try await asr.load(from: modelURL)

let result = try await asr.transcribe(options: .init(audioPath: "audio.wav"))
print(result.asrResult.transcript)

⚙️ Features & Comparisons

| Features | NexaSDK | Ollama | llama.cpp | LM Studio |
|---|---|---|---|---|
| NPU support | ✅ NPU-first | | | |
| Android/iOS SDK support | ✅ NPU/GPU/CPU support | ⚠️ | ⚠️ | |
| Linux support (Docker image) | ✅ | | | |
| Day-0 model support in GGUF, MLX, NEXA | ✅ | ⚠️ | | |
| Full multimodality support | ✅ Image, Audio, Text, Embedding, Rerank, ASR, TTS | ⚠️ | ⚠️ | ⚠️ |
| Cross-platform support | ✅ Desktop, Mobile (Android, iOS), Automotive, IoT (Linux) | ⚠️ | ⚠️ | ⚠️ |
| One line of code to run | ✅ | ⚠️ | | |
| OpenAI-compatible API + Function calling | ✅ | | | |

Legend: ✅ Supported   |   ⚠️ Partial or limited support   |   ❌ No
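
Because the server speaks the OpenAI API, any standard OpenAI client can talk to a locally running NexaSDK instance. Here is a minimal sketch with the official openai Python package; the base URL, port, and model identifier below are illustrative assumptions, not documented values, so check the server docs for the real endpoint:

from openai import OpenAI

# Hypothetical local endpoint; the actual host/port depends on how you
# start the NexaSDK server (see the CLI reference docs)
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="NexaAI/Qwen3-0.6B-GGUF",  # assumed model identifier
    messages=[{"role": "user", "content": "Hello, tell me a joke"}],
    max_tokens=100,
)
print(response.choices[0].message.content)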

🎯 You Decide What Model We Support Next

Nexa Wishlist — Request and vote for the models you want to run on-device.

Drop a Hugging Face repo ID, pick your preferred backend (GGUF, MLX, or Nexa format for Qualcomm + Apple NPUs), and watch the community's top requests go live in NexaSDK.

👉 Vote now at sdk.nexa.ai/wishlist

💰 Join Builder Bounty Program

Earn up to 1,500 USD for building with NexaSDK.

Developer Bounty

Learn more in our Participant Details.

🙏 Acknowledgements

We would like to thank the following projects:

📄 License

NexaSDK uses a dual licensing model:

CPU/GPU Components

Licensed under Apache License 2.0.

NPU Components

🤝 Contact & Community Support

Business Inquiries

For model launch partnerships, business inquiries, or any other questions, please schedule a call with us here.

Community & Support

Want more model support, backend support, device support, or other features? We'd love to hear from you!

Feel free to submit an issue on our GitHub repository with your requests, suggestions, or feedback. Your input helps us prioritize what to build next.

Join our community:
