Local-first agentic coding tool powered by local LLMs.
Animus is a local-first agentic coding assistant that runs entirely on your machine. It uses local GGUF models via llama-cpp-python — no API keys, no data leaving your machine. Inspired by claw-code but designed from the ground up for local models, Animus adapts its behavior to the capability of the model you load: a 7-tier system scales planner complexity, grammar enforcement, tool availability, and turn budget to match what the model can reliably handle. Small models get tight GBNF grammar constraints and a decomposing planner; large models get full tool access and free-form generation. The result is a tool that works well across the full spectrum of local hardware.
```bash
# Clone the repository
git clone https://github.com/crussella0129/Animus.git
cd Animus

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# Install Animus and its dependencies
pip install -e ".[dev]"

# Install llama-cpp-python (choose one):
pip install llama-cpp-python   # CPU only
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124   # CUDA 12.4
```

Animus works with any GGUF model. A few recommended starting points:
```bash
# ~4GB, good for most machines (Small tier)
huggingface-cli download Qwen/Qwen2.5-Coder-7B-Instruct-GGUF \
  qwen2.5-coder-7b-instruct-q4_k_m.gguf --local-dir ~/models

# ~8GB, better quality (Medium tier)
huggingface-cli download Qwen/Qwen2.5-Coder-14B-Instruct-GGUF \
  qwen2.5-coder-14b-instruct-q4_k_m.gguf --local-dir ~/models
```

Run a one-shot prompt:

```bash
animus "Explain the structure of this project" --model ~/models/qwen2.5-coder-7b-instruct-q4_k_m.gguf
```

Or start the interactive REPL:

```bash
animus --model ~/models/qwen2.5-coder-7b-instruct-q4_k_m.gguf
```

Once in the REPL, type your prompt and press Enter. Use `/help` to see all commands.
```text
animus [PROMPT] [OPTIONS]

Arguments:
  PROMPT                  One-shot prompt. Omit to enter the interactive REPL.

Options:
  -m, --model TEXT        Path to a GGUF model file, or model name.
  -p, --permission TEXT   Permission mode: read-only, standard, full, prompt.
                          Default: standard.
  -w, --workspace TEXT    Workspace root directory. Default: current directory.
  -c, --config TEXT       Path to an additional config YAML file (local tier).
  --help                  Show this message and exit.
```
Animus detects the tier automatically from the model's parameter count embedded in its metadata. Tier controls planner behavior, grammar enforcement, max turns, and which tools are available.
| Tier | Params | Planner | Grammar Mode | Max Turns | Tools Available | Example Models |
|---|---|---|---|---|---|---|
| Nano | < 4B | Yes (2 steps) | full | 6 | 4 | Qwen2.5-Coder-1.5B, Phi-3-mini |
| Small | 4 – 13B | Yes (3 steps) | first_turn | 15 | 6 | Qwen2.5-Coder-7B, Mistral-7B |
| Medium | 13 – 30B | No | off | 20 | 8 | Qwen2.5-Coder-14B, DeepSeek-Coder-V2-Lite |
| Large | 30 – 70B | No | off | 15 | 10 | Qwen2.5-Coder-32B, CodeLlama-34B |
| XL | 70 – 200B | No | off | 25 | 10 | Qwen2.5-Coder-72B, Llama-3.1-70B |
| Ultra | > 200B | No | off | 30 | 11 | Llama-3.1-405B, DeepSeek-V3 |
**Planner** — For the Nano and Small tiers, tasks are decomposed into sub-steps before execution. Each step has its own scoped tool list and turn budget.

**Grammar Mode** — GBNF grammar constraints are applied to force structured JSON output from the model. `full` = every turn; `first_turn` = only the first generation; `off` = free-form.
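The tier boundaries in the table can be sketched as a simple threshold lookup. This is an illustrative sketch, not the project's actual code (the real detection lives in `src/core/tiers.py` and reads the parameter count from GGUF metadata):

```python
def detect_tier(params_billions: float) -> str:
    """Map a model's parameter count (in billions) to a capability tier."""
    if params_billions < 4:
        return "nano"
    if params_billions < 13:
        return "small"
    if params_billions < 30:
        return "medium"
    if params_billions < 70:
        return "large"
    if params_billions < 200:
        return "xl"
    return "ultra"

# A 7B model lands in the Small tier; a 32B model in the Large tier.
print(detect_tier(7), detect_tier(32))
```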
All tools are confined to the workspace boundary. Attempting to access paths outside the workspace is blocked at the security layer.
| Tool | Permission | Min Tier | Description |
|---|---|---|---|
| read_file | READ | Nano | Read a file with optional line offset and limit. Returns numbered lines. |
| write_file | WRITE | Nano | Write (overwrite) a file. Creates parent directories as needed. |
| edit_file | WRITE | Small | Replace an exact string in a file. Fails on ambiguous matches. |
| list_dir | READ | Nano | List directory contents with type indicators (trailing / for dirs). |
| glob_search | READ | Nano | Find files matching a glob pattern. Returns up to 100 results. |
| grep_search | READ | Small | Search file contents with a regex. Returns up to 50 file:line: matches. |
| bash | EXECUTE | Medium | Run a shell command in the workspace. Injection patterns are blocked. |
| git | WRITE | Medium | Run git subcommands (status, diff, add, commit, etc.). Network ops blocked. |
Slash commands are intercepted in the REPL before input reaches the model.
| Command | Description |
|---|---|
| `/help` | List all available slash commands. |
| `/status` | Show session stats: message count, token estimate, tier, context, mode. |
| `/compact` | Manually compact session history to free context window space. |
| `/clear` | Clear session history and start a fresh conversation. |
| `/cost` | Show token usage for the current session (input, output, total). |
| `/model [name]` | Show current model info, or request a model switch. |
| `/permissions [mode]` | Show or change the permission mode (read-only, standard, full, prompt). |
| `/session` | Show the current session ID and creation timestamp. |
| `/diff` | Run git diff in the workspace root and display the result. |
| `/config [key] [value]` | Show or set a config value (e.g. `/config model.context_length 32768`). |
| `/plan` | Show whether the planner is active for the current model tier. |
| `/tier` | Show the detected model tier and parameter count. |
| Mode | Allows | Use When |
|---|---|---|
| read-only | READ tools only | You only want the model to read and analyze code. |
| standard | READ + WRITE tools | Normal coding sessions. Default mode. |
| full | READ + WRITE + EXECUTE (bash, git) | You trust the model to run shell commands. |
| prompt | READ always; prompts before WRITE/EXEC | Reserved for future interactive approval workflow. |
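One way to picture mode-based gating: each tool carries a permission level, and each mode caps the highest level it will authorize. The names below are illustrative, not the project's actual code (the real policy lives in `src/security/permissions.py`):

```python
from enum import IntEnum

class PermissionLevel(IntEnum):
    READ = 1
    WRITE = 2
    EXECUTE = 3

# Highest level each mode allows outright; "prompt" would ask
# interactively before WRITE/EXECUTE instead of using a fixed ceiling.
MODE_CEILING = {
    "read-only": PermissionLevel.READ,
    "standard": PermissionLevel.WRITE,
    "full": PermissionLevel.EXECUTE,
}

def is_allowed(mode: str, required: PermissionLevel) -> bool:
    """Authorize a tool call: its required level must not exceed the mode's ceiling."""
    return required <= MODE_CEILING.get(mode, PermissionLevel.READ)
```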
Set the mode at launch:
```bash
animus --model ~/models/model.gguf --permission full
```

Or change it mid-session:

```text
> /permissions read-only
Permission mode set to: read-only
```
Animus uses a three-tier YAML config system. Each tier overrides the previous via deep merge:
| Tier | Location | Purpose |
|---|---|---|
| User | `~/.animus/config.yaml` | Your personal defaults across all projects. |
| Project | `.animus/config.yaml` | Project-specific settings. Commit this. |
| Local | `.animus/config.local.yaml` | Machine-local overrides. Git-ignored. |
```yaml
# ~/.animus/config.yaml
model:
  provider: native          # Only "native" (llama-cpp-python) is supported today.
  model_path: ""            # Absolute path to your .gguf file.
  temperature: 0.7          # Sampling temperature (0.0 – 2.0).
  max_tokens: 2048          # Maximum tokens per generation.
  context_length: 4096      # Model context window size (tokens).
  gpu_layers: -1            # GPU layers to offload. -1 = all, 0 = CPU only.
  size_tier: auto           # Override tier detection: auto/nano/small/medium/large/xl/ultra.

agent:
  permission_mode: standard # read-only | standard | full | prompt
  max_turns: 20             # Hard cap on agentic loop iterations per turn.
  system_prompt: "You are Animus, a local AI coding assistant with tool use."

session:
  auto_save: true           # Save session to .animus/sessions/ on exit.
  auto_compact: true        # Automatically compact when nearing context limit.
  compact_threshold: 0.7    # Compact when session fills this fraction of context.
  preserve_recent: 4        # Messages kept verbatim after compaction.
```

Precedence (highest first): CLI `--config` flag > `.animus/config.local.yaml` > `.animus/config.yaml` > `~/.animus/config.yaml`
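The deep-merge behavior can be illustrated with a short sketch (not the actual implementation in `src/core/config.py`): nested mappings merge recursively, and values from the higher-priority tier win on conflicts:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

user = {"model": {"temperature": 0.7, "gpu_layers": -1}}
project = {"model": {"temperature": 0.2}}
print(deep_merge(user, project))
# {'model': {'temperature': 0.2, 'gpu_layers': -1}}
```

Note that the project config only overrides `temperature`; the user-level `gpu_layers` setting survives the merge.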
You can also point to any YAML file as the local (highest-priority) config tier:
```bash
animus --config /path/to/overrides.yaml --model ~/models/model.gguf
```

Sessions are automatically saved to `.animus/sessions/` when you exit the REPL (if `auto_save: true`). Each session is a JSON file named `session-<uuid>.json` and contains the full message history plus token usage counters.
Auto-compaction activates when the conversation grows past compact_threshold of the model's context window. The compactor builds a structured plain-text summary of older messages (tools used, files referenced, user requests, assistant actions) and replaces them with a single system message, preserving the preserve_recent most-recent exchanges verbatim. The session ID and creation timestamp are retained.
Trigger compaction manually with /compact, or clear the session entirely with /clear.
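In outline, compaction might look like the sketch below. This is a simplified illustration; the real compactor in `src/core/compactor.py` builds a richer structured summary (tools used, files referenced, and so on):

```python
def compact(messages: list[dict], preserve_recent: int = 4) -> list[dict]:
    """Replace older messages with one summary, keeping recent ones verbatim."""
    if len(messages) <= preserve_recent:
        return messages
    old, recent = messages[:-preserve_recent], messages[-preserve_recent:]
    # Collapse the older turns into a single system message.
    summary = ("[Compacted %d earlier messages: " % len(old)
               + "; ".join(m["content"][:40] for m in old) + "]")
    return [{"role": "system", "content": summary}] + recent
```

After compaction the conversation fits back under `compact_threshold`, while the last `preserve_recent` exchanges remain untouched for short-term continuity.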
```text
ConversationRuntime     ReAct loop: prompt → generate → tool calls → results → repeat
├─ Provider (protocol)  Abstracts LLM backends. NativeProvider wraps llama-cpp-python.
├─ ToolRunner           Dispatches tool calls to registered ToolSpec handlers.
├─ PermissionPolicy     Authorizes tool calls based on PermissionLevel vs. mode.
├─ Workspace            Enforces file boundary: all paths resolved and checked.
├─ Session              Append-only conversation history with JSON persistence.
└─ Compactor            Summarizes old messages to reclaim context window space.

Tier system             Auto-detects model size → scales planner, grammar, turn budget.
Planner (Nano/Small)    Decomposes tasks into scoped sub-steps before execution.
GBNF grammar            Enforces structured JSON tool-call output for small models.
Deny lists              Hardcoded blocks: injection patterns, destructive shell commands.
```
The core loop lives in `src/core/runtime.py`. The Provider protocol (`src/providers/base.py`) makes it straightforward to add new LLM backends (HTTP APIs, cloud providers, etc.) without touching the runtime. Tool handlers are pure functions — `(args: dict, workspace: Workspace) -> ToolResult` — registered declaratively with a JSON schema, permission level, and minimum tier. Adding a new tool does not require modifying any other module.
Security is defense-in-depth: workspace boundary checks in every tool handler, injection pattern rejection in the shell tool, a deny-list for destructive commands, and permission gating at the runtime level.
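A toy version of the shell deny-list check, to show the shape of the idea. The patterns here are illustrative examples only, not the project's actual list (which lives in `src/security/deny_lists.py`):

```python
import re

# Illustrative deny patterns: destructive commands and common injection vectors.
DENY_PATTERNS = [
    r"\brm\s+-rf\s+/",   # recursive delete from the filesystem root
    r"[;&|]\s*curl\b",   # chained network fetch after another command
    r"\$\(",             # $() command substitution
    r"`",                # backtick command substitution
]

def is_blocked(command: str) -> bool:
    """Reject a shell command if any deny pattern matches."""
    return any(re.search(p, command) for p in DENY_PATTERNS)
```

Because this check runs inside the bash tool handler, it applies even in `full` permission mode; permission gating and the deny list are independent layers.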
```bash
git clone https://github.com/crussella0129/Animus.git
cd Animus
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

Run the test suite:

```bash
pytest
```

All tests mock the LLM provider — no model download required.
1. Write a handler function in the appropriate file under `src/tools/`:

   ```python
   def handle_my_tool(args: dict[str, Any], workspace: Workspace) -> ToolResult:
       value = args.get("my_param", "")
       # ... do work ...
       return ToolResult(output="result")
   ```

2. Register it in `src/main.py` inside the `ToolRunner` setup block:

   ```python
   tool_runner.register(ToolSpec(
       name="my_tool",
       description="What my tool does.",
       input_schema={
           "type": "object",
           "properties": {
               "my_param": {"type": "string", "description": "..."},
           },
           "required": ["my_param"],
       },
       permission=PermissionLevel.READ,
       min_tier=Tier.NANO,
       handler=lambda args: handle_my_tool(args, ws),
   ))
   ```

3. Add tests in `tests/tools/test_my_tool.py`.
Implement the Provider protocol in `src/providers/`:

```python
from src.providers.base import Provider, ProviderResponse, ModelCapabilities

class MyProvider(Provider):
    def generate(self, messages, tools=None, grammar=None, stream=False) -> ProviderResponse:
        ...

    def capabilities(self) -> ModelCapabilities:
        ...
```

Then instantiate it in `main.py` in place of `NativeProvider`.
```text
src/
  main.py              CLI entry point — wires all modules together
  core/
    config.py          Three-tier YAML config (User < Project < Local)
    runtime.py         ConversationRuntime — the ReAct agentic loop
    session.py         Append-only conversation history + JSON persistence
    compactor.py       Session summarization to reclaim context space
    tiers.py           7-tier model system + TierConfig constants
    planner.py         Task decomposer for Nano/Small tiers
  providers/
    base.py            Provider protocol (abstract base class)
    native.py          NativeProvider wrapping llama-cpp-python
  tools/
    registry.py        ToolRunner + ToolSpec declarative tool system
    filesystem.py      read_file, write_file, edit_file, list_dir
    search.py          glob_search, grep_search
    shell.py           bash — with injection blocking and timeout
    git.py             git — with network-op blocking
  security/
    workspace.py       Workspace boundary enforcement
    permissions.py     PermissionLevel + PermissionPolicy
    deny_lists.py      Injection patterns + blocked command list
  grammar/
    gbnf.py            GBNF grammar builder for structured tool-call output
  cli/
    repl.py            Interactive REPL loop
    commands.py        Slash command registry and parser
    render.py          Rich console rendering helpers
```
MIT License. See LICENSE for details.