A Linux ops CLI where LLM-generated commands must pass deterministic policy and human approval before execution.
Full Simplified Chinese manual · Full English manual · why substring matching is not safety · v4.1.0 release notes · share real usage feedback
LinuxAgent is not a free-form shell chatbot and not an autonomous remediator. It lets an LLM propose Linux operations, but execution stays behind deterministic policy checks, Human-in-the-Loop confirmation, SSH safety guards, output redaction, and a hash-chained audit log.
The core project is intentionally narrow: command planning, policy, HITL, audit, SSH guardrails, and a small set of practical providers. Extra providers, runbooks, and integrations are treated as extension points rather than the center of the product.
LLM command agents usually fail at the exact point operators care about: trust.
LinuxAgent's default stance is different:
| Principle | What LinuxAgent does |
|---|---|
| The model is not trusted | First-time LLM-generated commands require confirmation |
| Safety is policy, not substring matching | Commands are tokenized and evaluated by a capability-based policy engine |
| Production output may contain secrets | Tool output is guarded and redacted before LLM-facing analysis |
| SSH must not silently trust hosts | Remote execution uses known-host verification and shell-syntax guards |
| Every approval should be reviewable | HITL decisions are written to a 0o600 hash-chained audit log |
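The "tokenized, not substring matching" principle is easier to see in code. The following is a minimal Python sketch, not LinuxAgent's policy engine. The rule sets and the names substring_check, tokenize, and classify are hypothetical, and LinuxAgent's real policy data lives in YAML. It shows a substring filter flagging a harmless echo, while a token-based check looks at the program actually being run and escalates anything with shell structure it does not analyze.

```python
import shlex
from enum import Enum


class Verdict(Enum):
    SAFE = "SAFE"
    CONFIRM = "CONFIRM"
    BLOCK = "BLOCK"


# Hypothetical rule data for illustration; not LinuxAgent's YAML policy.
READ_ONLY = {"cat", "df", "echo", "ls", "ss", "uname"}
DESTRUCTIVE = {"dd", "mkfs", "rm", "shred"}
SENSITIVE_TARGETS = {"/", "/boot", "/etc"}
SHELL_OPERATORS = set("();<>|&")


def substring_check(command: str) -> Verdict:
    """Naive filter: flags any text that merely contains 'rm -rf'."""
    return Verdict.BLOCK if "rm -rf" in command else Verdict.SAFE


def tokenize(command: str) -> list[str]:
    """Approximate POSIX word splitting; ; | & < > ( ) become separate tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


def classify(command: str) -> Verdict:
    try:
        argv = tokenize(command)
    except ValueError:  # e.g. unbalanced quotes: refuse to guess
        return Verdict.BLOCK
    if not argv:
        return Verdict.BLOCK
    # Pipelines, lists, and redirects need structural analysis; this
    # sketch conservatively escalates them to a human instead.
    if any(tok and set(tok) <= SHELL_OPERATORS for tok in argv):
        return Verdict.CONFIRM
    if any("`" in tok or "$(" in tok for tok in argv):
        return Verdict.CONFIRM
    program, args = argv[0], argv[1:]
    if program in DESTRUCTIVE:
        targets = {arg.rstrip("/") or "/" for arg in args}
        return Verdict.BLOCK if targets & SENSITIVE_TARGETS else Verdict.CONFIRM
    if program in READ_ONLY:
        return Verdict.SAFE
    return Verdict.CONFIRM  # unknown programs default to human review
```

The substring filter blocks echo 'rm -rf /' but would pass a variant spelled rm -r -f /, while the token-based check evaluates the program and its arguments regardless of spacing or quoting.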
LinuxAgent v4.1 turns the command safety boundary into a measurable subsystem, not just a set of claims in the README:
| Layer | What changed |
|---|---|
| Red-team proof | 24 adversarial command-agent cases run in CI with make red-team |
| Shell structure | Pipelines, subshells, command substitution, redirects, and nested shell execution are analyzed before execution |
| LOLBin coverage | Network-to-shell pipelines, find -exec, xargs, awk system(), editor escapes, and interpreter inline execution are classified deterministically |
| Fuzzing and benchmark | Hypothesis parser fuzzing plus P50/P95/P99 policy latency reporting |
| Audit depth | Optional HTTP sink forwards hash-chained entries while local append remains the source of truth |
| Observability | Telemetry can export local JSONL, console JSON, OTLP HTTP JSON, or be disabled explicitly |
| Sandbox roadmap | Landlock design documents capability probes, fallback order, compatibility limits, and implementation slices |
| Agent integration | linuxagent mcp exposes read-only policy classify and audit verify tools over stdio MCP |
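The "network-to-shell pipelines" entry in the LOLBin row can be pictured as a structural check: split the tokenized command into pipeline stages and flag a downloader whose output feeds a shell or interpreter. This is a hypothetical Python sketch; the program sets and the helper names (pipeline_stages, program_name, is_network_to_shell) are assumptions for illustration, not LinuxAgent's rule data.

```python
import shlex

# Hypothetical program classes for illustration only.
DOWNLOADERS = {"curl", "wget", "nc", "ncat"}
SHELLS = {"sh", "bash", "dash", "zsh", "python", "python3", "perl"}


def pipeline_stages(command: str) -> list[list[str]]:
    """Tokenize with shell punctuation and split on unquoted '|' operators."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    stages: list[list[str]] = [[]]
    for token in lexer:
        if token == "|":
            stages.append([])
        else:
            stages[-1].append(token)
    return stages


def program_name(argv: list[str]) -> str:
    """Return the executable basename, skipping 'sudo', 'env', and VAR=value."""
    for word in argv:
        if word in {"sudo", "env"} or "=" in word:
            continue
        return word.rsplit("/", 1)[-1]
    return ""


def is_network_to_shell(command: str) -> bool:
    names = [program_name(stage) for stage in pipeline_stages(command)]
    # Flag any downloader stage that is followed later by a shell/interpreter.
    for i, name in enumerate(names):
        if name in DOWNLOADERS and any(n in SHELLS for n in names[i + 1:]):
            return True
    return False
```

Because quoting is resolved before stages are split, a string that only mentions curl x | sh inside quotes stays a single argument to echo and is not flagged.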
```bash
git clone https://github.com/Eilen6316/LinuxAgent.git
cd LinuxAgent
./scripts/bootstrap.sh
source .venv/bin/activate
```

Then edit ./config.yaml and set a provider. The core provider paths are OpenAI,
DeepSeek, local Ollama or other OpenAI-compatible models, and Anthropic:
```yaml
api:
  provider: deepseek
  api_key: "replace-me"
```

For local Ollama:

```yaml
api:
  provider: ollama
  base_url: http://127.0.0.1:11434/v1
  model: llama3.1
  api_key: ""
  token_parameter: max_tokens
```

Other OpenAI-compatible relays and provider shortcuts are documented in the Provider Compatibility Matrix. They are useful, but they are not the project's core safety story.
Validate and start:

```bash
linuxagent check
linuxagent
```

Try a read-only request:

```text
check the Linux version
```
When the first LLM-generated command appears, choose "Yes" to run it once,
"Yes, don't ask again" to stop asking about matching commands for the rest of
this conversation (including the same thread reopened with /resume), or "No" to
refuse. Prefix input with ! for direct operator-authored command mode, for
example !uname -a.
config.yaml must be owned by the current user and have mode 600 (chmod 600 config.yaml); secrets are not loaded from .env.
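A startup check enforcing that rule could look like the following Python sketch. The function and exception names are hypothetical, and LinuxAgent's actual loader may word its errors differently; the sketch only shows the ownership and mode test the sentence describes.

```python
import os
import stat
from pathlib import Path


class InsecureConfigError(RuntimeError):
    """Raised when the config file could expose API keys to other users."""


def check_config_permissions(path: Path) -> None:
    # lstat() so a symlink is judged on its own, not on its target.
    info = path.lstat()
    if not stat.S_ISREG(info.st_mode):
        raise InsecureConfigError(f"{path} must be a regular file")
    if info.st_uid != os.getuid():
        raise InsecureConfigError(f"{path} must be owned by the current user")
    mode = stat.S_IMODE(info.st_mode)
    if mode != 0o600:
        raise InsecureConfigError(f"{path} has mode {mode:o}; run: chmod 600 {path}")
```

Refusing to start, rather than warning, keeps an API key in a world-readable file from ever being used silently.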
```text
you: find services listening on port 8080
parse_intent -> LLM proposes: ss -tlnp sport = :8080
safety_check -> CONFIRM (LLM_FIRST_RUN)
confirm -> operator approves in terminal
execute -> asyncio subprocess, no shell=True
analyze -> concise operator summary
audit.log -> hash-chained JSONL decision record
```
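The last step, a hash-chained JSONL record, works by making each entry's hash cover the previous entry's hash, so editing or reordering any line breaks every later link. This Python sketch assumes SHA-256 over canonical JSON and hypothetical field names (record, prev_hash, hash); LinuxAgent's on-disk format may differ.

```python
import hashlib
import json
import os
from pathlib import Path

GENESIS = "0" * 64  # hypothetical predecessor hash for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: Path, record: dict) -> str:
    """Append one JSONL entry whose hash also covers the previous entry's hash."""
    prev_hash = GENESIS
    if log.exists():
        lines = log.read_text(encoding="utf-8").splitlines()
        if lines:
            prev_hash = json.loads(lines[-1])["hash"]
    entry_hash = _digest(prev_hash, record)
    entry = {"record": record, "prev_hash": prev_hash, "hash": entry_hash}
    # Created with 0o600 so other local users cannot read decisions.
    fd = os.open(log, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry_hash


def verify(log: Path) -> bool:
    """Return False if an entry was edited, reordered, or removed mid-chain."""
    prev_hash = GENESIS
    for line in log.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        if entry["prev_hash"] != prev_hash:
            return False
        if _digest(prev_hash, entry["record"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

A local chain alone cannot detect entries truncated from the tail, which is one reason to forward entries to an external sink such as the optional HTTP sink.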
For ordinary conversation, LinuxAgent first asks an LLM-owned intent router to
label the turn DIRECT_ANSWER, COMMAND_PLAN, or CLARIFY. Direct answers do not create a
command plan and therefore do not show the confirmation panel. Operational
methods are not hard-coded in Python; successful command patterns are learned in
the local learner memory after sensitive values are redacted. Deterministic
safety policy data lives in YAML, while Python code only loads, validates, and
applies those policies.
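Because routing is owned by the model, the Python side only has to validate the label it gets back. A minimal sketch, assuming the router replies with a JSON object such as {"intent": "COMMAND_PLAN"}; the reply shape and the parse_intent and next_step names are hypothetical.

```python
import json
from enum import Enum


class Intent(Enum):
    DIRECT_ANSWER = "DIRECT_ANSWER"
    COMMAND_PLAN = "COMMAND_PLAN"
    CLARIFY = "CLARIFY"


def parse_intent(router_reply: str) -> Intent:
    """Validate the router's label; never infer intent from user keywords."""
    try:
        return Intent(json.loads(router_reply)["intent"])
    except (KeyError, TypeError, ValueError):  # JSONDecodeError is a ValueError
        # Unparseable or unknown output: ask the user rather than guess.
        return Intent.CLARIFY


def next_step(intent: Intent) -> str:
    # Only COMMAND_PLAN reaches the planner, policy check, and HITL panel.
    return {
        Intent.DIRECT_ANSWER: "answer",
        Intent.COMMAND_PLAN: "plan_command",
        Intent.CLARIFY: "ask_question",
    }[intent]
```

Falling back to CLARIFY on malformed output means a confused router produces a question, not an unexpected command plan.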
Each CLI launch starts with an empty conversation context. Saved sessions are
available only when the operator asks for them with /resume; then enter the
shown number or use the interactive picker to resume that saved session. If the
selected session stopped at a HITL confirmation, LinuxAgent reloads the local
checkpoint and reopens the confirmation flow. Use /new to reset context inside
a running CLI session and /tools to see available slash/tool entry points.
Typing / opens the slash-command completion menu. Command confirmations use
an arrow-key menu with "Yes", "Yes, don't ask again", and "No" when conversation
permissions are allowed. "Yes, don't ask again" is scoped to the current
conversation thread and the same thread when resumed with /resume. New
conversations do not inherit it, and destructive or never_whitelist policy
matches still ask every time.
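The scoping rules above can be modeled as approvals keyed by conversation thread. This is a hypothetical Python sketch: how LinuxAgent derives a "matching command" pattern is not specified here, so pattern is an assumed opaque string, and resuming a thread is modeled as reusing the same thread_id.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationApprovals:
    """Hypothetical store for "Yes, don't ask again" decisions."""

    # thread_id -> approved command patterns; a new conversation has no entry.
    approved: dict[str, set[str]] = field(default_factory=dict)

    def remember(self, thread_id: str, pattern: str, *,
                 destructive: bool, never_whitelist: bool) -> bool:
        # Destructive or never_whitelist matches are never remembered.
        if destructive or never_whitelist:
            return False
        self.approved.setdefault(thread_id, set()).add(pattern)
        return True

    def needs_confirmation(self, thread_id: str, pattern: str, *,
                           destructive: bool, never_whitelist: bool) -> bool:
        if destructive or never_whitelist:
            return True  # ask every time, whatever was approved before
        return pattern not in self.approved.get(thread_id, set())
```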
Input beginning with ! is direct command mode: LinuxAgent executes the
operator-authored command, streams stdout/stderr live, and records both
!<command> and the system result into the active conversation context. It does
not ask the LLM to explain or generate a reply for that turn.
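The execute step in the transcript earlier uses an asyncio subprocess without shell=True. The sketch below applies that pattern to a ! line: tokenize, spawn without a shell, stream merged stdout/stderr live, and return the captured lines for the caller to add to the conversation context. Whether LinuxAgent's direct mode tokenizes input this way is an assumption, and run_direct is a hypothetical name.

```python
import asyncio
import shlex


async def run_direct(line: str) -> tuple[int, list[str]]:
    """Run '!<command>' without a shell, streaming merged output live."""
    if not line.startswith("!"):
        raise ValueError("direct mode input must start with '!'")
    argv = shlex.split(line[1:])
    if not argv:
        raise ValueError("empty command")
    # create_subprocess_exec never invokes /bin/sh (the no shell=True path).
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.DEVNULL,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    if proc.stdout is None:
        raise RuntimeError("stdout pipe was not created")
    captured: list[str] = []
    async for raw in proc.stdout:  # StreamReader yields one line at a time
        text = raw.decode("utf-8", errors="replace").rstrip("\n")
        print(text)  # live stream to the operator's terminal
        captured.append(text)
    return await proc.wait(), captured
```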
| Capability | Why it matters |
|---|---|
| Capability-based policy engine | Produces SAFE / CONFIRM / BLOCK, risk scores, capabilities, and matched rules |
| YAML policy defaults | Command policy data is loaded from configs/policy.default.yaml, not Python rule tables |
| Structured CommandPlan | LLM output must validate as JSON before any policy or execution path |
| Structured file patches | Script/code/config edits use transactional FilePatchPlan apply, unified-diff validation, path policy, and HITL review |
| Read-only workspace tools | Planner can inspect allowed files with read_file, list_dir, and search_files before proposing patches |
| AI-owned intent routing | Conversation vs operation vs clarification is decided by prompts/intent_router.md, not Python keyword rules |
| Explicit resume control | New sessions do not inherit previous chats unless /resume is used; pending HITL checkpoints resume there too |
| Direct ! command mode | Runs operator-authored commands without an AI reply and adds command/output to current context |
| Sandbox metadata boundary | Commands carry a selected sandbox profile into audit and telemetry; default noop records metadata only |
| YAML runbooks | Common ops procedures are injected as planner guidance, not pre-LLM hard routes |
| Learner memory | Successful command patterns are persisted locally after secret redaction |
| LangGraph HITL | Confirmation uses interrupt() and checkpointing rather than inline input() |
| SSH cluster guard | Batch confirmation, remote shell metacharacter blocking, remote profile audit |
| Output protection | Command results are redacted and bounded before model-facing analysis |
| Hash-chained audit | linuxagent audit verify detects local audit-log tampering |
| Reproducible release | constraints.txt, wheel verification, and packaged config/prompt/runbook checks |
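The Structured CommandPlan row means free-form model text never reaches policy or execution. A minimal Python sketch of strict validation follows; the two fields (command, purpose) and all names are hypothetical, and LinuxAgent's real CommandPlan schema is likely richer.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandPlan:
    """Hypothetical shape; LinuxAgent's real CommandPlan fields may differ."""

    command: str
    purpose: str


class PlanValidationError(ValueError):
    pass


def parse_command_plan(llm_output: str) -> CommandPlan:
    """Reject anything that is not exactly the expected JSON object."""
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise PlanValidationError(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise PlanValidationError("plan must be a JSON object")
    expected = {"command", "purpose"}
    if set(data) != expected:  # unknown keys are rejected, not ignored
        raise PlanValidationError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    if not all(isinstance(data[key], str) and data[key].strip() for key in expected):
        raise PlanValidationError("command and purpose must be non-empty strings")
    return CommandPlan(command=data["command"], purpose=data["purpose"])
```

A chatty reply such as "Sure! Run ss -tlnp" fails at json.loads, so nothing extracted from prose is ever classified or executed.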
Requests such as "create a shell script", "update this Python file", or "edit
this config" do not bypass the safety model. LinuxAgent asks the planner for a
structured FilePatchPlan, then validates and previews the unified diff before
writing anything. The plan carries a structured request_intent field
(create, update, or unknown) instead of relying on Python keyword
matching.
The planner can first inspect the environment with read-only tools:
- read_file(path, offset, limit) reads a bounded window from an allowed file.
- list_dir(path) lists an allowed directory.
- search_files(pattern, root) searches literal text under an allowed root; regex metacharacters are treated as ordinary text.
- get_system_info, search_logs, and safety-gated execute_command provide system context when needed.
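The bounded read and the literal search in that list can be sketched in Python as follows. This assumes offset means "number of lines to skip" and adds a hypothetical max_hits cap; both are illustrative choices, and the allowed-root check is deliberately left out here.

```python
from pathlib import Path


def read_file(path: Path, offset: int = 0, limit: int = 50) -> list[str]:
    """Return at most `limit` numbered lines after skipping `offset` lines."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit > 0")
    window: list[str] = []
    with path.open(encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if number <= offset:
                continue
            window.append(f"{number}: {line.rstrip()}")
            if len(window) >= limit:
                break  # never read past the requested window
    return window


def search_files(pattern: str, root: Path, max_hits: int = 20) -> list[str]:
    """Literal substring search: '.*' or '[' are matched as plain text."""
    hits: list[str] = []
    for file in sorted(p for p in root.rglob("*") if p.is_file()):
        with file.open(encoding="utf-8", errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if pattern in line:  # `in`, not re.search: no regex semantics
                    hits.append(f"{file}:{number}: {line.rstrip()}")
                    if len(hits) >= max_hits:
                        return hits
    return hits
```

Treating the pattern literally avoids both surprising matches and catastrophic-backtracking regexes supplied by the model.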
Tool calls run through the tool sandbox runtime before output reaches the model:
each tool carries explicit permissions (read_files, write_files,
execute_commands, system_inspect, network_access, and hitl mode),
workspace/log roots are checked, per-tool timeouts and output limits are
applied, oversized output is marked as truncated, and tool errors are returned
as structured model-visible events while telemetry records allowed, denied,
timeout, or truncated.
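One way to picture that runtime is a wrapper that checks permissions, enforces a timeout, truncates oversized output, and converts failures into structured events. This Python sketch is hypothetical (ToolEvent, run_tool, and the permission strings passed in are illustrative); it adds an "error" status for tool exceptions alongside the four telemetry outcomes named above.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class ToolEvent:
    """Hypothetical model-visible result of one tool call."""

    tool: str
    status: str  # "allowed", "denied", "timeout", "truncated", or "error"
    output: str


async def run_tool(
    name: str,
    call: Callable[[], Awaitable[str]],
    *,
    granted: set[str],
    required: set[str],
    timeout_s: float = 5.0,
    max_chars: int = 4000,
) -> ToolEvent:
    # Permission check happens before the tool body runs at all.
    missing = required - granted
    if missing:
        return ToolEvent(name, "denied", f"missing permissions: {sorted(missing)}")
    try:
        output = await asyncio.wait_for(call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return ToolEvent(name, "timeout", f"no result within {timeout_s}s")
    except Exception as exc:  # surface tool failures to the model, don't crash
        return ToolEvent(name, "error", f"{type(exc).__name__}: {exc}")
    if len(output) > max_chars:
        # Mark truncation explicitly so the model knows output is partial.
        return ToolEvent(name, "truncated", output[:max_chars] + "\n[truncated]")
    return ToolEvent(name, "allowed", output)
```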
The terminal shows observable tool activity such as "LinuxAgent is reading ..." and "LinuxAgent is listing ...", then prints concise evidence from completed
workspace tool calls, such as the first line-numbered read_file snippets or
search_files matches. If the planner concludes that no file change is needed,
the final answer includes the cited evidence so the operator can see which file
lines or search results supported the judgment. Patch confirmation shows
per-file stats, compact + / - diff snippets, high-risk path warnings,
permission changes, large-diff pagination, and per-file acceptance for
multi-file patches. Full diffs are not shown twice; extra review prompts appear
only when hidden pages exist.
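Per-file stats and pagination for the patch panel can be derived from a standard unified diff. This Python sketch uses difflib from the standard library; diff_stats, paginate, and the 40-line page size are hypothetical choices, not LinuxAgent's renderer.

```python
import difflib


def diff_stats(path: str, old: str, new: str) -> tuple[int, int, list[str]]:
    """Return (added, removed, unified diff lines) for one file in a patch."""
    diff = list(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
    body = diff[2:]  # drop the '---' and '+++' file headers
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return added, removed, diff


def paginate(lines: list[str], page_size: int = 40) -> list[list[str]]:
    """Split a large diff into pages so the panel can ask before showing more."""
    return [lines[i : i + page_size] for i in range(0, len(lines), page_size)]
```

Counting only the diff body avoids misreading a removed line that itself starts with "--" as a file header.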
Command confirmation also shows planned sandbox context: requested profile, runner, enforcement state, cwd, allowed roots, network policy, and fallback reason when the configured runner cannot enforce isolation.
After approval, patch application runs as a transaction. LinuxAgent validates
target paths before reading file content, rejects symlink path components,
hardlinks, directories, device files, FIFOs, sockets, oversized targets, and
non-UTF-8 text, then writes through a temporary file and atomic replace. Existing
targets are backed up under a local .linuxagent-patch-* sandbox directory and
rolled back automatically if a later file or permission change fails. Audit
metadata records changed files, permission changes, backup path hashes, rollback
outcome, and the sandbox root.
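The transaction described above follows a common pattern: validate every target first, back up existing files, write each through a temporary file plus os.replace, and restore everything if any step fails. The Python sketch below is hypothetical (check_target, apply_patch, and the backup naming are assumptions); it omits the size, UTF-8, permission-change, and audit steps the text lists.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path
from typing import Optional


def check_target(path: Path) -> None:
    """Reject symlinked path components and non-regular or hardlinked targets."""
    path = path.absolute()  # absolute() does not resolve symlinks, on purpose
    current = Path(path.anchor)
    for part in path.parts[1:]:
        current = current / part
        if current.is_symlink():
            raise PermissionError(f"symlink in path: {current}")
    if path.exists():
        info = path.lstat()
        if not stat.S_ISREG(info.st_mode) or info.st_nlink > 1:
            raise PermissionError(f"not a plain, single-link file: {path}")


def apply_patch(changes: dict[Path, str], backup_dir: Path) -> None:
    """Write every file via temp file + atomic replace, or roll all back."""
    for path in changes:
        check_target(path)  # validate all targets before touching any content
    backups: dict[Path, Optional[Path]] = {}
    try:
        for index, (path, text) in enumerate(changes.items()):
            if path.exists():
                backup = backup_dir / f"{index}.bak"
                shutil.copy2(path, backup)  # keeps mode and timestamps
                backups[path] = backup
            else:
                backups[path] = None  # new file: rollback deletes it
            fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".patch-")
            try:
                with os.fdopen(fd, "w", encoding="utf-8") as fh:
                    fh.write(text)
                if backups[path] is not None:
                    shutil.copymode(path, tmp)  # keep the original permissions
                os.replace(tmp, path)  # atomic rename within one filesystem
            finally:
                if os.path.exists(tmp):
                    os.unlink(tmp)  # only left behind if the write failed
    except Exception:
        for path, backup in backups.items():
            if backup is None:
                path.unlink(missing_ok=True)
            else:
                shutil.copy2(backup, path)
        raise
```

Writing into the target's own directory keeps os.replace on one filesystem, so readers see either the old file or the new one, never a partial write.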
By default, file patch reads and writes are limited to the current workspace and
/tmp through file_patch.allow_roots. Sensitive roots such as /etc and SSH
key directories are highlighted as high risk, and permission changes such as
0755 for generated scripts appear explicitly in the confirmation panel.
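An allow_roots check is usually done on fully resolved paths so that ".." segments and symlinks cannot escape the workspace. A hypothetical Python sketch: the default roots mirror the text (current workspace and /tmp), while the high-risk list and the check_patch_path name are illustrative.

```python
from pathlib import Path

# Hypothetical defaults mirroring the text: workspace and /tmp are allowed.
ALLOW_ROOTS = (Path.cwd(), Path("/tmp"))
HIGH_RISK_ROOTS = (Path("/etc"), Path.home() / ".ssh")


def check_patch_path(target: str, allow_roots: tuple[Path, ...] = ALLOW_ROOTS) -> bool:
    """Raise if target is outside allow_roots; return True if it is high risk."""
    # resolve() collapses '..' and follows symlinks before the root comparison.
    resolved = Path(target).resolve()
    if not any(resolved.is_relative_to(root.resolve()) for root in allow_roots):
        raise PermissionError(f"{resolved} is outside file_patch.allow_roots")
    # High-risk roots only matter if an operator has added them to allow_roots.
    return any(resolved.is_relative_to(root.resolve()) for root in HIGH_RISK_ROOTS)
```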
Automatic patch repair defaults to two rounds and can be tuned with
file_patch.max_repair_attempts (0 disables automatic patch repair).
Failed command-plan repair is separately capped by
command_plan.max_repair_attempts (0 disables failed-command replanning).
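Both settings describe a bounded retry loop. This Python sketch assumes "two rounds" means up to two repair attempts after the initial try, and that each failure message is fed back to the planner; apply_with_repair and the callback shape are hypothetical.

```python
from typing import Callable, Optional


def apply_with_repair(
    attempt: Callable[[Optional[str]], Optional[str]],
    max_repair_attempts: int = 2,
) -> tuple[bool, int]:
    """Run attempt(feedback); it returns None on success or an error message.

    Each failure is fed back for another try, up to max_repair_attempts extra
    rounds. Returns (succeeded, repairs_used); 0 disables repair entirely.
    """
    error = attempt(None)
    repairs = 0
    while error is not None and repairs < max_repair_attempts:
        repairs += 1
        error = attempt(error)  # e.g. ask the planner to fix the rejected diff
    return error is None, repairs
```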
Local command execution in LinuxAgent now goes through a sandbox runner boundary.
The default remains sandbox.enabled: false with runner: noop, which preserves
compatibility while recording sandbox metadata only. runner: local applies
process lifecycle controls such as clean environment, closed stdin, timeout,
process-group cleanup, resource limits, output limits, and configured cwd roots,
but it does not claim filesystem or network isolation for safe profiles.
runner: bubblewrap is optional and capability-probed; if bwrap is missing or
cannot enforce the requested profile or network policy, safe profiles fail
closed while explicit passthrough profiles remain auditable passthrough.
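The process lifecycle controls listed for runner: local can be approximated with the standard library, as in this Python sketch. The environment contents, limit values, and function names are hypothetical. The bwrap check only tests that the binary is present, which is weaker than the capability probing described above.

```python
import os
import resource
import shutil
import signal
import subprocess

CLEAN_ENV = {"PATH": "/usr/bin:/bin", "LANG": "C.UTF-8"}  # hypothetical minimal env


def _limit_resources() -> None:
    # Runs in the child before exec: cap CPU seconds and address space.
    resource.setrlimit(resource.RLIMIT_CPU, (10, 10))
    resource.setrlimit(resource.RLIMIT_AS, (1 << 30, 1 << 30))


def run_local(argv: list[str], cwd: str, timeout_s: float = 30.0,
              max_bytes: int = 65536) -> tuple[int, bytes]:
    proc = subprocess.Popen(
        argv,
        cwd=cwd,
        env=CLEAN_ENV,  # nothing inherited from the operator's shell
        stdin=subprocess.DEVNULL,  # closed stdin: no interactive prompts
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # own process group for cleanup
        preexec_fn=_limit_resources,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # kill the whole group, not just the leader
        out, _ = proc.communicate()
    return proc.returncode, out[:max_bytes]  # output limit applied after capture


def select_runner(runner: str, profile_is_safe: bool) -> str:
    """Fail closed when bubblewrap is unavailable for a safe profile."""
    if runner == "bubblewrap" and shutil.which("bwrap") is None:
        if profile_is_safe:
            raise RuntimeError("bwrap unavailable: refusing to run safe profile unsandboxed")
        return "passthrough"  # explicit passthrough stays auditable
    return runner
```

As the text says, none of this isolates the filesystem or network; it only bounds how long and how greedily a process can run.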
The planned Landlock backend is documented in
docs/design/sandbox-landlock.md, including
capability probing, fallback order, and the compatibility test matrix.
| Operation | Default behavior |
|---|---|
| User-authored read-only command | May run when policy returns SAFE |
| First LLM-generated command | CONFIRM |
| Conversation-approved LLM command | May skip repeat confirmation only in the same conversation thread, including /resume of that thread |
| Destructive command | CONFIRM every time; never conversation-whitelisted |
| Command targeting root or sensitive paths | BLOCK when matched by policy |
| SSH batch across two or more hosts | Explicit batch confirmation with target hosts and remote profiles |
| Non-TTY confirmation request | Auto-deny |
| Unknown SSH host | Reject by default |
| Default sandbox runner | Records profile metadata only; no process isolation |
| Enabled safe sandbox profile unavailable | Fail closed before spawning |
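The "Non-TTY confirmation request: Auto-deny" row reduces to checking whether an interactive terminal is attached before asking. A hypothetical Python sketch; LinuxAgent's actual confirmation goes through LangGraph interrupts and an arrow-key menu, not input().

```python
import sys
from typing import Callable, TextIO


def confirm(prompt: str, *, stream: TextIO = sys.stdin,
            ask: Callable[[str], str] = input) -> bool:
    """Ask the operator; auto-deny when no interactive terminal is attached."""
    if not stream.isatty():
        return False  # CI, cron, pipes: never treat silence as approval
    return ask(f"{prompt} [y/N] ").strip().lower() in {"y", "yes"}
```

Piping "y" into the process does not count as approval, because the check looks at the stream type rather than its contents.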
linuxagent mcp starts a local stdio MCP server with read-only tools for
policy classification and audit hash-chain verification. It intentionally does
not expose command execution, file patch application, SSH fan-out, or secrets.
The threat model and future execution boundary are documented in
docs/design/mcp-server.md.
LinuxAgent is not an autonomous remediator, and the current default noop
sandbox runner does not isolate commands. LinuxAgent is intended for controlled,
operator-in-the-loop use. See Production Readiness and Threat Model.
SSH execution is not protected by local OS sandboxing. Configure cluster hosts
with least-privilege users, pre-registered known_hosts, a remote working
directory, and explicit sudo allowlists when sudo is required.
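Remote shell metacharacter blocking and a sudo allowlist can be combined when building the command string sent over SSH. This Python sketch is hypothetical: the rejected character set and the build_remote_command name are assumptions, and quoting with shlex.join is added as defense in depth.

```python
import shlex

# Hypothetical set: each of these could make the remote shell do extra work.
REMOTE_METACHARACTERS = set(";&|`$<>()\n\\*?{}[]~")


def build_remote_command(argv: list[str],
                         sudo_allowlist: frozenset[str] = frozenset()) -> str:
    """Quote argv for a remote shell, rejecting metacharacters outright."""
    for word in argv:
        bad = REMOTE_METACHARACTERS.intersection(word)
        if bad:
            raise ValueError(f"remote shell metacharacters not allowed: {sorted(bad)}")
    if argv and argv[0] == "sudo":
        # Only explicitly allowlisted programs may run under sudo.
        if len(argv) < 2 or argv[1] not in sudo_allowlist:
            raise PermissionError("sudo target is not in the explicit allowlist")
    # shlex.join quotes each word so the remote shell sees exactly these argv.
    return shlex.join(argv)
```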
LinuxAgent v4 ships with eleven YAML runbooks for common diagnostics:
| Runbook area | Examples |
|---|---|
| Disk and filesystem | df, top directories, journal usage |
| Ports and networking | listeners, port ownership, connectivity checks |
| Services and logs | systemd status, recent unit logs, error search |
| System health, OS, load, and memory | overall host health, OS release, CPU pressure, memory pressure, OOM clues |
| Containers, packages, and certificates | container status, installed packages, certificate expiry |
Runbooks no longer perform natural-language hard matching before LLM planning. They are loaded, policy-validated, and supplied to the planner as advisory examples. The planner may use, adapt, or ignore that guidance based on the actual request. If it produces a multi-step plan inspired by a runbook, every step still goes through normal policy, HITL, audit, and analysis flow.
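The "loaded, policy-validated, and supplied as advisory examples" flow might look like the following Python sketch. The RunbookStep fields, the classify callback returning "SAFE"/"CONFIRM"/"BLOCK", and rejecting a runbook that contains a blocked step are all assumptions; LinuxAgent may handle a failing runbook differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RunbookStep:
    command: str
    purpose: str


def runbook_guidance(title: str, steps: list[RunbookStep],
                     classify: Callable[[str], str]) -> str:
    """Render policy-checked runbook steps as advisory text for the planner."""
    lines = [f"Advisory runbook: {title} (adapt or ignore as the request requires)"]
    for step in steps:
        verdict = classify(step.command)
        if verdict == "BLOCK":
            raise ValueError(f"runbook {title!r} contains a blocked step: {step.command}")
        lines.append(f"- {step.command}  # {step.purpose} [policy: {verdict}]")
    return "\n".join(lines)
```

The rendered text is guidance only: any command the planner later proposes from it is classified again and goes through HITL like any other step.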
Current documented baseline from make test on 2026-05-11:
| Gate | Status |
|---|---|
| Unit tests | 677 passing |
| Optional provider compatibility | covered by make optional-anthropic when the extra is installed |
| Sandbox boundary suite | covered by make sandbox |
| Red-team policy suite | adversarial command corpus |
| Policy benchmark | P50/P95/P99 policy latency |
| Harness scenarios | scenario-driven HITL / runbook / cluster / sandbox coverage |
| Integration smoke tests | 10 passing |
| Coverage | 86.73% (--cov-fail-under=80) |
| Static checks | ruff, mypy, bandit, project code-rule checks |
| Build verification | wheel + sdist + packaged data install check |
Useful commands:
```bash
make test
make sandbox
make lint
make type
make security
make red-team
make benchmark
make harness
make verify-build
```

| Path | Use when |
|---|---|
| ./scripts/bootstrap.sh | You are working from a source checkout |
| pip install -c constraints.txt https://github.com/Eilen6316/LinuxAgent/releases/download/v4.1.0/linuxagent-4.1.0-py3-none-any.whl | You want the published GitHub Release wheel |
| pip install linuxagent | You want the PyPI package after the release is published |
| pip install -e ".[dev]" | You are developing or running the full local gate |
| pip install -e ".[anthropic]" | You need the optional Anthropic provider |
| Document | Purpose |
|---|---|
| Documentation index | All long-form docs in one place |
| docs/zh/README.md | Full Chinese manual |
| docs/en/README.md | Full English manual |
| Quick Start | Installation and first run |
| Provider Matrix | Provider setup paths and compatibility status |
| Operator Safety Model | Plain-language safety boundaries for users |
| Runbook Authoring | How to contribute safe YAML runbooks |
| Roadmap | Maintainer priorities and good first issue areas |
| Migration Guide | v3 to v4 breaking changes |
| Threat Model | Assets, trust boundaries, and mitigations |
| Production Readiness | Where LinuxAgent is and is not appropriate |
| Security Policy | Vulnerability reporting and supported versions |
| Contributing | Contribution workflow and review expectations |
| Changelog | Release history |
| Why substring matching is not safety | Technical argument behind LinuxAgent's command safety model |
| Real user feedback | Tell us what happened on a real server, VM, container, or homelab machine |
| Link | Notes |
|---|---|
| GitHub | Primary repository |
| GitCode | Mirror |
| Gitee | Mirror |
| QQ Group 281392454 | Community |
| CSDN intro | Project article |
MIT