YALS

Python 3.10, 3.11, and 3.12 License: AGPL v3 Discord Server

Support on Ko-Fi

Note

Need help? Join the Discord Server and get the Tabby role. Please be nice when asking questions.

Welcome to YALS, also known as Yet Another Llamacpp Server.

YALS is a friendly OpenAI-compatible API server built with Deno, Hono, and Zod, designed to facilitate LLM text generation via the llama.cpp backend.

Disclaimer

This project is in an alpha state. There may be bugs, possibly even ones that could cause thermonuclear war. Please note that commits happen frequently, and builds are distributed via CI.

YALS is a hobby project made for a small number of users. It is not meant to run on production servers. For that, please look at other solutions that support those workloads.

Why?

The AI space is full of backend projects that wrap llama.cpp, but I felt that something was missing. This led me to create my own backend, one which is extensible, speedy, and as elegant as TabbyAPI, but specifically for llama.cpp and GGUF.

What about TabbyAPI?

Here are the reasons why I decided to create a separate project instead of integrating llama.cpp support into TabbyAPI:

  1. Separation of concerns: I want TabbyAPI to stay focused on ExLlama, not become a monolithic backend.
  2. Distribution patterns: Unlike TabbyAPI, llama.cpp backends are often distributed as binaries. Deno’s compile command is vastly superior to PyInstaller, making binary distribution easier.
  3. Dependency hell: Python’s dependency system is a mess. Adding another layer of abstractions would confuse users further.
  4. New technologies: Since C++ (via C bindings) can be called from almost any language through an FFI, I wanted to try something new instead of struggling with Python. The main reason for using Deno is that it augments an easy-to-learn language (TypeScript) with built-in tooling and a robust FFI system.

Getting Started

To get started, download the latest zip from releases that corresponds to your setup.

The currently supported builds via CI are:

  • macOS: Metal
  • Windows/Linux: CPU
  • Windows/Linux: CUDA (built for Turing architectures and newer)

Note

If your specific setup is not available via CI, you can build locally via the building guide, or request a certain architecture in issues.

Then follow these steps:

  1. Extract the zip file
  2. Copy config_sample.yml to a file called config.yml
  3. Edit config.yml to configure model loading, networking, and other parameters.
    1. All options are commented: if you're unsure about an option, it's best to leave it unchanged.
    2. You can also use CLI arguments, similar to TabbyAPI (e.g. --flash-attention true).
  4. Download a .gguf model into the models directory (or whatever you set your directory to)
    1. If the model is split into multiple parts (00001-of-0000x.gguf), set model_name in config.yml to the first part (ending in 00001). Other parts will load automatically.
  5. Start YALS:
    1. Windows: Double click YALS.exe or run .\YALS.exe from the terminal (recommended)
    2. macOS/Linux: Open a terminal and run ./YALS
  6. Navigate to http://<your URL>/docs (e.g. http://localhost:5000/docs) to view the YALS Scalar API documentation.
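
The rule for split models in step 4 can be sketched in TypeScript. This is a hypothetical helper, not part of YALS: firstPartName is an illustrative name, and the sketch assumes the five-digit, zero-padded -0000N-of-0000x.gguf suffix shown above.

```typescript
// Hypothetical helper (not part of YALS): given any part of a split GGUF
// model, return the file name of the first part, which is the value that
// model_name in config.yml should point to.
// Assumption: parts are named <name>-NNNNN-of-NNNNN.gguf (five digits each).
const SPLIT_SUFFIX = /-\d{5}-of-(\d{5})\.gguf$/;

function isSplitPart(fileName: string): boolean {
  return SPLIT_SUFFIX.test(fileName);
}

function firstPartName(fileName: string): string {
  // Rewrite the part index to 00001 and keep the total count unchanged.
  // File names that do not match the split pattern are returned as-is.
  return fileName.replace(SPLIT_SUFFIX, "-00001-of-$1.gguf");
}
```

For example, firstPartName("model-00003-of-00005.gguf") yields "model-00001-of-00005.gguf", while an unsplit "model.gguf" is returned unchanged.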

Features

  • OpenAI compatible API
  • Loading/unloading models
  • Flexible Jinja2 template engine for chat completions that conforms to HuggingFace conventions
  • String banning
  • Concurrent inference with Hono + async TypeScript
  • Robust validation with Zod

More features will be added as the project matures. If something is missing here, PR it in!
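
As an illustration of the OpenAI compatible API, here is a minimal TypeScript client sketch. Everything in it is an assumption rather than confirmed YALS behavior: the /v1/chat/completions path and the response shape follow the OpenAI convention, the base URL is taken from the localhost:5000 example above, and the optional Authorization header is only a guess at how a key might be passed. Check the /docs page of your own instance for the actual routes and authentication scheme.

```typescript
// Minimal sketch of a chat completion request against an OpenAI-style server.
// Assumptions (not confirmed by the YALS docs): the route is
// /v1/chat/completions and an API key, if required, is sent as a Bearer token.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,
  messages: ChatMessage[],
  apiKey?: string,
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: baseUrl.replace(/\/+$/, "") + "/v1/chat/completions",
    init: { method: "POST", headers, body: JSON.stringify({ messages, stream: false }) },
  };
}

async function chat(baseUrl: string, prompt: string, apiKey?: string): Promise<string> {
  const { url, init } = buildChatRequest(baseUrl, [{ role: "user", content: prompt }], apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  const data = await res.json();
  // OpenAI-style responses carry the reply in choices[0].message.content.
  return data.choices[0].message.content;
}
```

With a model loaded and the server running, await chat("http://localhost:5000", "Hello!") should return the generated reply as a string.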

Supported Model Types

Since YALS uses llama.cpp for inference, the only supported model format is GGUF.

If you want to use other model formats such as Exl2, try TabbyAPI.

Contributing

Use the template when creating issues or pull requests; otherwise, the developers may not look at your post.

If you have issues with the project:

  • Describe the issue in detail
  • If you have a feature request, please indicate it as such.

If you have a Pull Request:

  • Describe the pull request in detail: what you are changing and why.

Developers and Permissions

Creators/Developers:

Acknowledgements

YALS would not exist without the work of other contributors and FOSS projects: