Note
Need help? Join the Discord Server and get the Tabby
role. Please be nice when asking questions.
Welcome to YALS, also known as Yet Another Llamacpp Server.
YALS is a friendly OAI-compatible API server built with Deno, Hono, and Zod, designed to facilitate LLM text generation via the llama.cpp backend.
This project is in an alpha state. There may be bugs, possibly even ones that could cause thermonuclear war. Please note that commits happen frequently, and builds are distributed via CI.
YALS is a hobby project made for a small amount of users. It is not meant to run on production servers. For that, please look at other solutions that support those workloads.
The AI space is full of backend projects that wrap llama.cpp, but I felt that something was missing. This led me to create my own backend, one which is extensible, speedy, and as elegant as TabbyAPI, but specifically for llama.cpp and GGUF.
Here are the reasons why I decided to create a separate project instead of integrating llama.cpp support into TabbyAPI:
- Separation of concerns: I want TabbyAPI to stay focused on ExLlama, not become a monolithic backend.
- Distribution patterns: Unlike TabbyAPI, llama.cpp backends are often distributed as binaries. Deno’s compile command is vastly superior to PyInstaller, making binary distribution easier.
- Dependency hell: Python’s dependency system is a mess. Adding another layer of abstractions would confuse users further.
- New technologies: Since C++ (through C bindings) is universally accessible over an FFI interface, I wanted to try something new instead of struggling with Python. The main reason for using Deno is that it augments an easy-to-learn language (TypeScript) with built-in tooling and a robust FFI system (sketched below).
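For a taste of what that FFI surface looks like, here is a minimal sketch using `Deno.dlopen`. The library path and the single symbol are placeholders for illustration; YALS's actual bindings cover far more of llama.cpp's C API.

```ts
// Minimal Deno FFI sketch. The library path is a placeholder; YALS's real
// bindings load the llama.cpp shared library and expose many more symbols.
const lib = Deno.dlopen("./libllama.so", {
  // void llama_backend_init(void) from llama.cpp's C API
  llama_backend_init: { parameters: [], result: "void" },
} as const);

lib.symbols.llama_backend_init();
lib.close();
```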
To get started, download the latest zip from releases that corresponds to your setup.
The currently supported builds via CI are:
- macOS: Metal
- Windows/Linux: CPU
- Windows/Linux: CUDA (built for Turing architectures and newer)
Note
If your specific setup is not available via CI, you can build locally via the building guide, or request a certain architecture in issues.
Then follow these steps:

1. Extract the zip file
2. Copy `config_sample.yml` to a file called `config.yml`
3. Edit `config.yml` to configure model loading, networking, and other parameters
   - All options are commented: if you're unsure about an option, it's best to leave it unchanged.
   - You can also use CLI arguments, similar to TabbyAPI (ex. `--flash-attention true`).
4. Download a `.gguf` model into the `models` directory (or whatever you set your directory to)
   - If the model is split into multiple parts (`00001-of-0000x.gguf`), set `model_name` in `config.yml` to the first part (ending in `00001`). Other parts will load automatically.
5. Start YALS:
   - Windows: Double click `YALS.exe` or run `.\YALS.exe` from the terminal (recommended)
   - macOS/Linux: Open a terminal and run `./YALS`
6. Navigate to `http://<your URL>/docs` (ex. `http://localhost:5000/docs`) to view the YALS Scalar API documentation (a sample request follows below).
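Once the server is running, you can sanity-check it from any HTTP client. Here is a minimal sketch in TypeScript, assuming the default `http://localhost:5000` address from the example above, the conventional OpenAI-style chat completions route, and no API key; consult the Scalar docs at `/docs` for the exact request schema.

```ts
// Hedged example: the route shape follows the OpenAI chat completions
// convention; check http://localhost:5000/docs for YALS's exact schema.
const response = await fetch("http://localhost:5000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Hello there!" }],
    max_tokens: 64,
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```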
Current features include:

- OpenAI-compatible API
- Loading/unloading models
- Flexible Jinja2 template engine for chat completions that conforms to HuggingFace's chat template format
- String banning
- Concurrent inference with Hono + async TypeScript
- Robust validation with Zod (see the Hono + Zod sketch below)
More features will be added as the project matures. If something is missing here, PR it in!
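To illustrate how the Hono + Zod combination above fits together, here is a hedged sketch of a validated route. The route path and schema are assumptions for demonstration, not YALS's actual code:

```ts
import { Hono } from "jsr:@hono/hono";
import { z } from "npm:zod";

// Hypothetical request schema; YALS's real schemas are more extensive.
const CompletionRequest = z.object({
  prompt: z.string(),
  max_tokens: z.number().int().positive().default(128),
});

const app = new Hono();

app.post("/v1/completions", async (c) => {
  const parsed = CompletionRequest.safeParse(await c.req.json());
  if (!parsed.success) {
    // Zod reports every failed field, which makes for clear API errors
    return c.json({ error: parsed.error.issues }, 422);
  }
  // Hand parsed.data off to the inference backend here
  return c.json({ choices: [{ text: "..." }] });
});

Deno.serve(app.fetch);
```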
Since YALS uses llama.cpp for inference, the only supported model format is GGUF.
If you want to use other model formats, such as Exl2, try TabbyAPI.
Use the template when creating issues or pull requests, otherwise the developers may not look at your post.
If you have issues with the project:
- Describe the issue in detail
- If you have a feature request, please indicate it as such.
If you have a Pull Request:
- Describe the pull request in detail: what you are changing and why
Creators/Developers:
- kingbri - TypeScript, Deno, and some C++
- CoffeeVampire - Main C++ developer
YALS would not exist without the work of other contributors and FOSS projects: