A web-based chat application that leverages BitNet b1.58-2B-4T for inference, deployed on Azure App Service with a sidecar container architecture.
This application provides a simple web interface to interact with BitNet, a 1-bit large language model designed for efficient inference. The app uses:
- FastAPI: High-performance web framework for building APIs
- BitNet b1.58-2B-4T: Official 2B-parameter 1-bit LLM
- Azure App Service: PaaS hosting with sidecar container support
- Bicep & Azure Developer CLI (azd): Infrastructure as Code and deployment pipeline
The application uses a sidecar container pattern on Azure App Service:
- Main Container: Runs the FastAPI application
- BitNet Sidecar: Runs the BitNet model inference service
- Communication: Main app connects to the sidecar via localhost, as sketched below
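The call path is short enough to sketch. The following is a minimal illustration, not the repository's actual app.py: the `/chat` route, the request model, and the use of httpx are assumptions; only the ENDPOINT and MODEL settings and the localhost wiring come from this README. It also assumes the sidecar exposes an OpenAI-compatible chat completions API on port 11434.

```python
# Minimal sketch of the main-container app calling the BitNet sidecar.
# Assumes the sidecar exposes an OpenAI-compatible chat completions API
# at http://localhost:11434/v1 (the ENDPOINT app setting).
import os

import httpx
from fastapi import FastAPI
from pydantic import BaseModel

ENDPOINT = os.getenv("ENDPOINT", "http://localhost:11434/v1")
MODEL = os.getenv("MODEL", "bitnet-b1.58-2b-4t-gguf")

app = FastAPI()


class ChatRequest(BaseModel):
    message: str


@app.post("/chat")
async def chat(req: ChatRequest):
    # Forward the user message to the sidecar over localhost.
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(
            f"{ENDPOINT}/chat/completions",
            json={
                "model": MODEL,
                "messages": [{"role": "user", "content": req.message}],
            },
        )
        resp.raise_for_status()
        data = resp.json()
    return {"reply": data["choices"][0]["message"]["content"]}
```

Because both containers run side by side on the same App Service instance, the call to localhost reaches the sidecar with no extra service discovery.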
To run the application with BitNet inference locally, first follow the setup instructions in the BitNet repository and make sure the inference server is listening on the port this API expects (11434 by default).
Then run this API:
- Clone and run the FastAPI application:

```bash
git clone https://github.com/yourusername/bitnet-fastapi-chat.git
cd bitnet-fastapi-chat

# Set up virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the app
ENDPOINT=http://localhost:11434/v1 uvicorn app:app --reload
```
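Before launching uvicorn, it can be worth confirming that the local BitNet server is actually answering on the expected port. A small hedged check (the `/models` path assumes the server exposes an OpenAI-compatible API; adjust if your local BitNet setup differs):

```python
# Quick sanity check: is the local BitNet server answering on port 11434?
# The /models path assumes an OpenAI-compatible API (an assumption here).
import httpx

ENDPOINT = "http://localhost:11434/v1"

try:
    resp = httpx.get(f"{ENDPOINT}/models", timeout=5.0)
    resp.raise_for_status()
    print("BitNet server is up:", resp.json())
except httpx.HTTPError as exc:
    print("BitNet server not reachable:", exc)
```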
- Log in to Azure:

```bash
az login
```

- Initialize Azure Developer CLI:

```bash
azd init
```

- Deploy the application:

```bash
azd up
```
Key application settings used by the app:
- `ENDPOINT`: URL of the BitNet inference API (`http://localhost:11434/v1`)
- `MODEL`: Name of the BitNet model (`bitnet-b1.58-2b-4t-gguf`)
- `SIDECAR_PORT`: Port for the BitNet sidecar (`11434`)
- `WEBSITES_ENABLE_APP_SERVICE_STORAGE`: Enables persistent storage (set to `true`)
- `WEBSITE_ENABLE_SIDECAR`: Enables sidecar containers (set to `true`)
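In code, these settings typically surface as environment variables. A hedged sketch of how the app might read them (the defaults here are assumptions for local development; in Azure the values come from App Service application settings):

```python
import os

# Port the sidecar listens on; ENDPOINT should point at the same port.
SIDECAR_PORT = int(os.getenv("SIDECAR_PORT", "11434"))

# Base URL of the BitNet inference API and the model name to request.
ENDPOINT = os.getenv("ENDPOINT", f"http://localhost:{SIDECAR_PORT}/v1")
MODEL = os.getenv("MODEL", "bitnet-b1.58-2b-4t-gguf")
```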
This application uses the BitNet b1.58-2B-4T model, which is a 1-bit large language model with 2.4B parameters. Key characteristics:
- Model size: 1.10 GiB (3.91 bits per weight; see the quick check below)
- Fast CPU inference (optimized with bitnet.cpp)
- Energy-efficient operation (up to 82% reduction in energy consumption compared to full-precision models)
- Suitable for edge and resource-constrained environments
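The size figure above checks out with quick arithmetic (the 2.4B parameter count is taken from this README; the result lands near the quoted 3.91 bits per weight):

```python
# Back-of-envelope check of bits per weight: 1.10 GiB over ~2.4B weights.
size_bits = 1.10 * (1024 ** 3) * 8  # model file size in bits
params = 2.4e9                      # approximate parameter count
print(f"{size_bits / params:.2f} bits per weight")  # ~3.94
```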
This deployment uses the P1V3 SKU for the App Service plan.
See the Azure pricing calculator for cost details.