Skip to content

madebygps/bitnet-fastapi-appservice

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BitNet FastAPI Chat App

A web-based chat application that leverages BitNet b1.58-2B-4T for inference, deployed on Azure App Service with a sidecar container architecture.

Overview

This application provides a simple web interface to interact with BitNet, a 1-bit large language model designed for efficient inference. The app uses:

  • FastAPI: High-performance web framework for building APIs
  • BitNet b1.58-2B-4T: Official 2B parameter 1-bit LLM model
  • Azure App Service: PaaS hosting with sidecar container support
  • Bicep & Azure Developer CLI (azd): Infrastructure as Code and deployment pipeline

Architecture

The application uses a sidecar container pattern on Azure App Service:

  • Main Container: Runs the FastAPI application
  • BitNet Sidecar: Runs the BitNet model inference service
  • Communication: Main app connects to the sidecar via localhost

Prerequisites

Local Development

To run the application with BitNet inference locally, you'll need to follow the setup instructions from the BitNet repository and make sure it is running on the port your api is working with.

Then you can run this API.

  1. clone and run the FastAPI application:

    git clone https://github.com/yourusername/bitnet-fastapi-chat.git
    cd bitnet-fastapi-chat
    
    # Set up virtual environment
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
    # Install dependencies
    pip install -r requirements.txt
    
    # Run the app
    ENDPOINT=http://localhost:11434/v1 uvicorn app:app --reload

Deployment to Azure

  1. Log in to Azure:

    az login
  2. Initialize Azure Developer CLI:

    azd init
  3. Deploy the application:

    azd up

Application Configuration

Key application settings used by the app:

  • ENDPOINT: URL for BitNet inference API (http://localhost:11434/v1)
  • MODEL: Name of BitNet model (bitnet-b1.58-2b-4t-gguf)
  • SIDECAR_PORT: Port for BitNet sidecar (11434)
  • WEBSITES_ENABLE_APP_SERVICE_STORAGE: Enables persistent storage (set to 'true')
  • WEBSITE_ENABLE_SIDECAR: Enables sidecar containers (set to 'true')

BitNet Model Information

This application uses the BitNet b1.58-2B-4T model, which is a 1-bit large language model with 2.4B parameters. Key characteristics:

  • Model size: 1.10 GiB (3.91 Bytes Per Weight)
  • Fast CPU inference (optimized with bitnet.cpp)
  • Energy-efficient operation (up to 82% reduction compared to full-precision models)
  • Suitable for edge and resource-constrained environments

License

MIT License

Notes

This uses the P1V3 SKU for app service.

See the pricing calculator for details.

About

Bitnet inference framework on App Service

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published