llamarunner

A Linux CLI tool to manage the installation and launching of llama.cpp with configurable presets.
- Automated Installation: Download and build llama.cpp automatically with system-specific optimizations (CUDA support detection).
- Preset Management: Create, list, and manage multiple model configurations with custom settings.
- Flexible Configuration: Define flags like context size, threads, quantization, prompt files, and server parameters (host/port).
- Quick Execution: Run models directly by preset name or execute specific llama.cpp binaries.
- Settings Management: Configure default paths, force CPU builds, and manage global settings.
- Self-Updating: Update llamarunner itself to the latest version from GitHub releases.
You can install llamarunner using one of the following methods:
This method automatically installs llamarunner with default settings:
curl -fsSL https://raw.githubusercontent.com/GGrassia/llamarunner/main/install.sh | sh

If you prefer to customize installation options or want to be prompted for each step:

FORCE_INTERACTIVE=1 bash <(curl -fsSL https://raw.githubusercontent.com/GGrassia/llamarunner/main/install.sh)

Both methods will download and install the appropriate binary for your system to ~/.local/bin/.
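Since the binary lands in ~/.local/bin/, make sure that directory is on your PATH if your shell cannot find llamarunner afterwards (a generic shell fix, not something the install script is guaranteed to do for you):

```bash
# Add ~/.local/bin to PATH for the current session; persist it in ~/.bashrc or ~/.zshrc
export PATH="$HOME/.local/bin:$PATH"
```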
Alternatively, you can build from source:
git clone https://github.com/GGrassia/llamarunner.git
cd llamarunner
go build -o llamarunner

Commands:

- help: Show this help message.
- install: Downloads and builds llama.cpp with optimizations. Optionally build immediately with -b or --build.
- build [directory]: Builds llama.cpp in the specified directory (or the default from settings) with CUDA detection.
- init: Initialize a new preset configuration interactively.
- list: List all available presets.
- run <preset-name>: Load and run a model using the specified preset. Can also directly execute llama.cpp binaries if the preset name matches a binary path.
- set <target>: Manage configuration settings.
  - set d: Set default settings (paths, host, port, etc.).
  - set e: Edit the settings file (shows current settings, manual edit required).
- update [options]: Updates llamarunner to the latest version from GitHub.
  - --check: Check for updates without installing.
  - --force: Force update even if already on the latest version.
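For example, to check whether a newer release is available without installing it:

```bash
llamarunner update --check
```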
Presets are stored as .cfg files in ~/.llama-presets/. Each preset file contains command-line arguments for llama-server.
Example my-model.cfg:
model=/path/to/model.gguf
threads=8
n_predict=200
ctx_size=2048
When run, llamarunner automatically enhances this with:
llama-server --host localhost --port 8080 --model /path/to/model.gguf --threads 8 --n_predict 200 --ctx_size 2048
Global settings are stored in ~/.llama-presets/settings.toml and include:
- llama_cpp_path: Default directory for llama.cpp installation.
- model_path: Default directory for model files.
- config_path: Directory for preset configurations.
- host: Default server host (default: "localhost").
- port: Default server port (default: "8080").
- force_cpu: Force CPU builds even if CUDA is available (default: false).
- version: Current llamarunner version.
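A settings.toml with these fields might look like the following; the directory paths and version number are illustrative placeholders, not values llamarunner is guaranteed to write:

```toml
llama_cpp_path = "/home/user/llama.cpp"    # where llama.cpp is cloned and built (illustrative)
model_path = "/home/user/models"           # where .gguf model files live (illustrative)
config_path = "/home/user/.llama-presets"  # preset .cfg directory (illustrative)
host = "localhost"
port = "8080"
force_cpu = false
version = "0.0.0"                          # placeholder; managed by llamarunner itself
```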
- Automated llama.cpp Installation: Downloads and compiles llama.cpp from source with optimizations.
- CUDA Detection: Automatically detects CUDA availability and prompts for a CPU-only build if CUDA is not found. Can force CPU builds via settings.
- Preset Creation & Management: Interactive init command to create presets, list command to view available presets.
- Model Execution: run command loads presets and executes llama-server with all specified parameters (model path, threads, context size, predictions, host, port).
- Settings Persistence: Saves and loads global settings (paths, build preferences) in ~/.llama-presets/settings.toml.
- Binary Management: Builds the llama-cli, llama-gguf-split, and llama-server binaries and keeps them in the build/bin directory within the llama.cpp installation (see the check after this list).
- Self-Updating: update command checks GitHub for new releases and re-runs the installation script to update llamarunner itself.
- Command-Line Interface: Supports -h/--help for individual commands, argument parsing, and direct execution of preset names.
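As a quick sanity check that the build produced everything, list that directory (the ~/llama.cpp path here is illustrative; use whatever your llama_cpp_path setting points to):

```bash
ls ~/llama.cpp/build/bin
# expected to include: llama-cli  llama-gguf-split  llama-server
```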
- Preset Editing: The set e command currently displays the current settings but does not open an editor for inline editing. Manual file editing is required.
- Advanced Server Features: While llama-server is executed, advanced server configurations (e.g., different API endpoints beyond basic host/port) would require manual preset editing or direct binary execution.
- Model Management: No built-in model downloading or management features beyond specifying paths in presets. Users must handle model file acquisition and placement.
- Error Handling for Missing Binaries: If llama-server (or other binaries) fails to build, the error message is generic. More specific feedback on build failures could be added.
- Cross-Platform Testing: While designed for Linux, broader platform compatibility (beyond the provided Linux binary names) would require additional testing and potentially conditional logic.
- Configuration Validation: Limited validation of preset configurations beyond file existence. Invalid parameter combinations might lead to runtime errors from llama.cpp itself.
- Install llamarunner (choose one method):
  - Non-Interactive Mode (Recommended): curl -fsSL https://raw.githubusercontent.com/GGrassia/llamarunner/main/install.sh | sh
  - Interactive Mode (for custom options): FORCE_INTERACTIVE=1 bash <(curl -fsSL https://raw.githubusercontent.com/GGrassia/llamarunner/main/install.sh)
- Install llama.cpp (if not already present):
llamarunner install
Follow the prompts to set the installation directory and optionally build immediately.
- Create a Preset:
llamarunner init
Enter a name, model path, and desired parameters (threads, context size, etc.).
- Run Your Model:
llamarunner run <your-preset-name>
- List Available Presets:
llamarunner list
- Update llamarunner:
llamarunner update
Default paths and settings are managed in ~/.llama-presets/settings.toml. You can edit this file manually or use llamarunner set d to reset defaults.
The tool automatically detects CUDA availability during the build process. If CUDA is not found, it will prompt you to build with CPU support only and optionally update your settings to force CPU builds in the future.
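If you want to confirm beforehand that CUDA is visible on your system, the standard NVIDIA tools are enough (this is a manual check, not necessarily the exact probe llamarunner runs internally):

```bash
# Shows the driver version and visible GPUs; fails if the NVIDIA driver is missing
nvidia-smi

# Shows the CUDA compiler version; fails if the CUDA toolkit is not on PATH
nvcc --version
```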
A basic preset:

model=/path/to/model.gguf
threads=8
n_predict=200
ctx_size=2048
Note: Host and port are automatically added by llamarunner when running the preset. To change these values, modify your global settings with llamarunner set d.
For CPU-only systems:
model=/path/to/model.gguf
threads=-1
ctx_size=2048
For GPU-accelerated systems (with CUDA):
model=/path/to/model.gguf
ngl=35 # Number of layers to offload to GPU
threads=4 # Fewer threads when using GPU
ctx_size=4096
CUDA not detected during build:
- If CUDA is installed but not detected, ensure you have the NVIDIA drivers and CUDA toolkit properly installed.
- You can force a CPU-only build by running llamarunner set d and setting force_cpu = true.
Model loading fails:
- Verify the model path in your preset is correct.
- Ensure the model file exists and has read permissions.
- Check that the model format is compatible with llama.cpp.
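A quick sanity check before digging deeper (use the path from your preset; the last command assumes a valid GGUF file starts with the ASCII magic "GGUF"):

```bash
# Confirm the file exists, is readable, and has a plausible size
ls -lh /path/to/model.gguf

# Print the first four bytes; a valid GGUF model starts with "GGUF"
head -c 4 /path/to/model.gguf; echo
```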
Port already in use:
- Change the port in your preset or settings to a different value.
- Kill any process using the desired port (e.g., sudo fuser -k 8080/tcp).
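To see what is holding the port before killing anything (8080 matches the default; substitute your configured port):

```bash
# Either command lists the owning process
ss -ltnp | grep ':8080'
lsof -i :8080
```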
If you encounter issues not covered here:
- Check the llama.cpp repository for model-specific issues.
- Open an issue in the llamarunner GitHub repository.
- Include your system information, llamarunner version (llamarunner set d), and preset configuration when reporting issues.
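One possible way to collect that information in one go (the preset filename is a placeholder; per the note above, set d shows the current settings, including the version):

```bash
uname -a                                 # system information
llamarunner set d                        # current settings and llamarunner version
cat ~/.llama-presets/<your-preset>.cfg   # the preset you were running
```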
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
To uninstall llamarunner:
- Remove the binary from your system: rm ~/.local/bin/llamarunner
- Optionally, remove all configurations and presets: rm -rf ~/.llama-presets
- I'd like to make a preset repo with different model families and their corresponding optimal parameters, to further simplify the process of running or testing a new model.
- ik_llama support: it technically works today if you download ik_llama, build it with llama-server, point the default llama.cpp folder in settings.toml at the ik_llama directory, and then run models, but it would be nice to implement a command that pulls and builds ik_llama just as the normal llama.cpp install does. A sketch of the current manual workaround follows.
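A rough sketch of that manual workaround, assuming ik_llama builds with the same CMake workflow as upstream llama.cpp; the repository URL, directory, and CMake options below are assumptions to adapt, not commands llamarunner runs for you:

```bash
# Clone and build the fork (assumed to follow the upstream llama.cpp CMake layout)
git clone https://github.com/ikawrakow/ik_llama.cpp ~/ik_llama.cpp
cd ~/ik_llama.cpp
cmake -B build                                 # add the CUDA option here if you build with CUDA
cmake --build build --target llama-server -j

# Point llamarunner at the fork by editing the default directory in
# ~/.llama-presets/settings.toml, for example:
#   llama_cpp_path = "/home/user/ik_llama.cpp"
# After that, presets run through ik_llama's llama-server instead of upstream's.
```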