# LLaSA WebUI

A simple web interface for LLaSA using ExLlamaV2 with an OpenAI compatible FastAPI server.

## Installation

Clone the repo:

```
git clone https://github.com/zuellni/llasa-webui
cd llasa-webui
```

Create a conda/mamba/python env:

```
conda create -n llasa-webui python=3.12
conda activate llasa-webui
```

Install the dependencies, ignoring any `xcodec2` dependency-resolution errors:

```
pip install -r requirements.txt
pip install xcodec2 --no-deps
```

If you want to use `torch+cu126`, keep in mind that you'll need to compile `exllamav2` (and, optionally, `flash-attn`) yourself, and on `python=3.13` you may also need to compile `sentencepiece`.

## Usage

```
python server.py --model <path or repo id>
```

You can use the HF models or EXL2 quants from here. Add `--cache q4 --dtype bf16` to reduce VRAM usage.
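Because the FastAPI server is OpenAI-compatible, any OpenAI-style HTTP client can talk to it. Below is a minimal sketch using only the Python standard library. The host, port, route, model name, and payload fields are assumptions based on the general OpenAI speech API shape, not confirmed from `server.py` — check the server's startup output or `--help` for the real values.

```python
import json
import urllib.request

# All values below are assumptions -- adjust to match your server settings.
url = "http://127.0.0.1:8000/v1/audio/speech"  # hypothetical endpoint
payload = {
    "model": "llasa",          # hypothetical model name
    "input": "Hello, world!",  # text to synthesize
    "response_format": "wav",
}

# Build the POST request with a JSON body.
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running to save the synthesized audio:
# with urllib.request.urlopen(request) as response:
#     open("output.wav", "wb").write(response.read())
```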

## Preview
