LinuxAgent

A Linux ops CLI where LLM-generated commands must pass deterministic policy and human approval before execution.

Full Simplified Chinese manual · Full English manual · why substring matching is not safety · v4.1.0 release notes · share real usage feedback

LinuxAgent is not a free-form shell chatbot and not an autonomous remediator. It lets an LLM propose Linux operations, but execution stays behind deterministic policy checks, Human-in-the-Loop confirmation, SSH safety guards, output redaction, and a hash-chained audit log.

The core project is intentionally narrow: command planning, policy, HITL, audit, SSH guardrails, and a small set of practical providers. Extra providers, runbooks, and integrations are treated as extension points rather than the center of the product.

Why It Exists

LLM command agents usually fail at the exact point operators care about: trust.

LinuxAgent's default stance is different:

Principle	What LinuxAgent does
The model is not trusted	First-time LLM-generated commands require confirmation
Safety is policy, not substring matching	Commands are tokenized and evaluated by a capability-based policy engine
Production output may contain secrets	Tool output is guarded and redacted before LLM-facing analysis
SSH must not silently trust hosts	Remote execution uses known-host verification and shell-syntax guards
Every approval should be reviewable	HITL decisions are written to a `0o600` hash-chained audit log
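
To illustrate why the table contrasts policy with substring matching, here is a minimal sketch comparing a naive substring check with a token-based check. The function names and the tiny `DESTRUCTIVE_PROGRAMS` set are hypothetical and are not LinuxAgent's policy engine, whose rules live in YAML.

```python
import shlex

# Hypothetical rule data for illustration only.
DESTRUCTIVE_PROGRAMS = {"rm", "mkfs", "dd", "shred"}


def substring_is_dangerous(command: str) -> bool:
    """Naive check: flags any command whose raw text contains 'rm '."""
    return "rm " in command


def token_is_dangerous(command: str) -> bool:
    """Token check: inspects the program name after shell-style splitting."""
    argv = shlex.split(command)
    if not argv:
        return False
    program = argv[0].rsplit("/", 1)[-1]  # /bin/rm -> rm
    return program in DESTRUCTIVE_PROGRAMS
```

The substring check flags a harmless `echo 'confirm rm later'` and misses `/bin/rm` followed by a tab, while the token check gets both right. A real policy engine needs far more than the program name, but the failure mode of raw text matching is already visible here.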

v4.1 Security Depth

LinuxAgent v4.1 turns the command safety boundary into a measurable subsystem, not just a set of claims in the README:

Layer	What changed
Red-team proof	24 adversarial command-agent cases run in CI with `make red-team`
Shell structure	Pipelines, subshells, command substitution, redirects, and nested shell execution are analyzed before execution
LOLBin coverage	Network-to-shell pipelines, `find -exec`, `xargs`, `awk system()`, editor escapes, and interpreter inline execution are classified deterministically
Fuzzing and benchmark	Hypothesis parser fuzzing plus P50/P95/P99 policy latency reporting
Audit depth	Optional HTTP sink forwards hash-chained entries while local append remains the source of truth
Observability	Telemetry can export local JSONL, console JSON, OTLP HTTP JSON, or be disabled explicitly
Sandbox roadmap	Landlock design documents capability probes, fallback order, compatibility limits, and implementation slices
Agent integration	`linuxagent mcp` exposes read-only policy classify and audit verify tools over stdio MCP
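
As a rough illustration of the shell-structure and network-to-shell rows above, the following sketch splits a command into pipeline stages with the standard library and flags a fetcher piped into a shell. The `FETCHERS` and `SHELLS` sets and both function names are assumptions made for this example; the actual analyzer covers many more constructs (subshells, substitution, redirects) than this does.

```python
import shlex

# Hypothetical classification data for illustration only.
FETCHERS = {"curl", "wget"}
SHELLS = {"sh", "bash", "zsh", "dash"}


def split_pipeline(command: str) -> list[list[str]]:
    """Split a command into pipeline stages on unquoted '|' tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    stages: list[list[str]] = [[]]
    for token in lexer:
        if token == "|":
            stages.append([])
        else:
            stages[-1].append(token)
    return stages


def is_network_to_shell(command: str) -> bool:
    """Return True when a fetcher stage feeds directly into a shell stage."""
    stages = split_pipeline(command)
    for left, right in zip(stages, stages[1:]):
        if left and right and left[0] in FETCHERS and right[0] in SHELLS:
            return True
    return False
```

Because the lexer respects quoting, a pipe character inside a quoted string is not treated as a pipeline boundary.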

One-Minute Start

git clone https://github.com/Eilen6316/LinuxAgent.git
cd LinuxAgent
./scripts/bootstrap.sh
source .venv/bin/activate

Then edit ./config.yaml and set a provider. The core paths are OpenAI, DeepSeek, local Ollama/OpenAI-compatible models, and Anthropic:

api:
  provider: deepseek
  api_key: "replace-me"

For local Ollama:

api:
  provider: ollama
  base_url: http://127.0.0.1:11434/v1
  model: llama3.1
  api_key: ""
  token_parameter: max_tokens

Other OpenAI-compatible relays and provider shortcuts are documented in the Provider Compatibility Matrix. They are useful, but they are not the project's core safety story.

Validate and start:

linuxagent check
linuxagent

Try a read-only request:

check the Linux version

When the first LLM-generated command appears, choose "Yes" to approve a single execution, "Yes, don't ask again" to approve matching commands for this conversation and for the same thread when it is resumed with /resume, or "No" to refuse. Prefix input with ! (for example, !uname -a) to run an operator-authored command directly.

config.yaml must be owned by the current user and chmod 600; secrets are not loaded from .env.
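
The ownership and mode requirement can be checked before launch with a few lines of standard-library Python. This is a sketch of the kind of check described above, not LinuxAgent's own loader; the function name is hypothetical.

```python
import os
import stat
from pathlib import Path


def config_permission_problems(path: Path) -> list[str]:
    """Return reasons the file fails an owner-only (0o600) check; empty means OK."""
    info = path.stat()
    problems = []
    if info.st_uid != os.getuid():
        problems.append("not owned by the current user")
    mode = stat.S_IMODE(info.st_mode)  # permission bits only
    if mode != 0o600:
        problems.append(f"mode is {mode:o}, expected 600")
    return problems
```

If the list is non-empty, `chmod 600 config.yaml` resolves the mode problem.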

What a Turn Looks Like

you: find services listening on port 8080

parse_intent  -> LLM proposes: ss -tlnp sport = :8080
safety_check  -> CONFIRM (LLM_FIRST_RUN)
confirm       -> operator approves in terminal
execute       -> asyncio subprocess, no shell=True
analyze       -> concise operator summary
audit.log     -> hash-chained JSONL decision record
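
The execute step above runs an argument vector rather than a shell string. A minimal sketch of that idea with `asyncio`, assuming a hypothetical `run_argv` helper that is not part of LinuxAgent's API:

```python
import asyncio
import shlex


async def run_argv(command: str) -> tuple[int, str]:
    """Run a command as an argv list; no shell ever interprets the string."""
    argv = shlex.split(command)
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.DEVNULL,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    out, _ = await proc.communicate()
    return proc.returncode, out.decode(errors="replace")
```

With this approach, `asyncio.run(run_argv("echo hello; id"))` passes `hello;` and `id` to `echo` as plain arguments, so `id` is never executed as a second command.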

For ordinary conversation, LinuxAgent first asks an LLM-owned intent router to classify the input as DIRECT_ANSWER, COMMAND_PLAN, or CLARIFY. Direct answers do not create a command plan and therefore do not show the confirmation panel. Operational methods are not hard-coded in Python; successful command patterns are stored in the local learner memory after sensitive values are redacted. Deterministic safety policy data lives in YAML, while Python code only loads, validates, and applies those policies.

Each CLI launch starts with an empty conversation context. Saved sessions are available only when the operator asks for them with /resume; enter the shown number or use the interactive picker to resume a saved session. If the selected session stopped at a HITL confirmation, LinuxAgent reloads the local checkpoint and reopens the confirmation flow.

Use /new to reset context inside a running CLI session and /tools to see available slash/tool entry points. Typing / opens the slash-command completion menu.

Command confirmations use an arrow-key menu with Yes, Yes, don't ask again, and No when conversation permissions are allowed. Yes, don't ask again is scoped to the current conversation thread and to the same thread when resumed with /resume. New conversations do not inherit it, and destructive or never_whitelist policy matches still ask every time.

Input beginning with ! is direct command mode: LinuxAgent executes the operator-authored command, streams stdout/stderr live, and records both !<command> and the system result in the active conversation context. It does not ask the LLM to explain or generate a reply for that turn.

Core Capabilities

Capability	Why it matters
Capability-based policy engine	Produces `SAFE` / `CONFIRM` / `BLOCK`, risk scores, capabilities, and matched rules
YAML policy defaults	Command policy data is loaded from `configs/policy.default.yaml`, not Python rule tables
Structured `CommandPlan`	LLM output must validate as JSON before any policy or execution path
Structured file patches	Script/code/config edits use transactional `FilePatchPlan` apply, unified-diff validation, path policy, and HITL review
Read-only workspace tools	Planner can inspect allowed files with `read_file`, `list_dir`, and `search_files` before proposing patches
AI-owned intent routing	Conversation vs operation vs clarification is decided by `prompts/intent_router.md`, not Python keyword rules
Explicit resume control	New sessions do not inherit previous chats unless `/resume` is used; pending HITL checkpoints resume there too
Direct `!` command mode	Runs operator-authored commands without an AI reply and adds command/output to current context
Sandbox metadata boundary	Commands carry a selected sandbox profile into audit and telemetry; default `noop` records metadata only
YAML runbooks	Common ops procedures are injected as planner guidance, not pre-LLM hard routes
Learner memory	Successful command patterns are persisted locally after secret redaction
LangGraph HITL	Confirmation uses `interrupt()` and checkpointing rather than inline `input()`
SSH cluster guard	Batch confirmation, remote shell metacharacter blocking, remote profile audit
Output protection	Command results are redacted and bounded before model-facing analysis
Hash-chained audit	`linuxagent audit verify` detects local audit-log tampering
Reproducible release	`constraints.txt`, wheel verification, and packaged config/prompt/runbook checks
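
The hash-chained audit row can be illustrated with a small sketch in which each entry's hash covers its event and the previous entry's hash, so editing any earlier record breaks every later link. The entry layout, the `GENESIS` value, and the function names are assumptions for this example and do not describe LinuxAgent's actual audit-log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(event: dict, prev: str) -> str:
    body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> dict:
    """Append an entry whose hash covers the event and the previous hash."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev": prev, "hash": _digest(event, prev)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry fails verification."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or _digest(entry["event"], prev) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

A local chain like this detects tampering only if the latest hash is also known somewhere the attacker cannot rewrite, which is one reason to forward entries to a second sink.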

File Changes

Requests such as "create a shell script", "update this Python file", or "edit this config" do not bypass the safety model. LinuxAgent asks the planner for a structured FilePatchPlan, then validates and previews the unified diff before writing anything. The plan carries a structured request_intent field (create, update, or unknown) instead of relying on Python keyword matching.
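
A sketch of what validating such a plan before any write might look like follows. Only the `request_intent` field and its three values come from the description above; the `unified_diff` field name and the `parse_patch_plan` function are hypothetical.

```python
import json

ALLOWED_INTENTS = {"create", "update", "unknown"}


def parse_patch_plan(raw: str) -> dict:
    """Parse planner output and reject anything that is not a well-formed plan."""
    plan = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    if plan.get("request_intent") not in ALLOWED_INTENTS:
        raise ValueError("request_intent must be create, update, or unknown")
    diff = plan.get("unified_diff")  # hypothetical field name
    if not isinstance(diff, str) or not diff.startswith("--- "):
        raise ValueError("plan must carry a unified diff")
    return plan
```

Free-form model output such as a bare shell command fails at the first step, before any path policy or preview logic runs.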

The planner can first inspect the environment with read-only tools:

read_file(path, offset, limit) reads a bounded window from an allowed file.
list_dir(path) lists an allowed directory.
search_files(pattern, root) searches literal text under an allowed root; regex metacharacters are treated as ordinary text.
get_system_info, search_logs, and safety-gated execute_command provide system context when needed.
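
The bounded read and literal search behavior described above can be sketched as follows. These helpers are illustrative stand-ins, not the real tools: the 0-based `offset` convention and the return shapes are assumptions, and allowed-root checks are omitted.

```python
from pathlib import Path


def read_window(path: Path, offset: int, limit: int) -> list[tuple[int, str]]:
    """Return up to `limit` (line_number, text) pairs from 0-based line `offset`."""
    lines = path.read_text(encoding="utf-8").splitlines()
    start = max(offset, 0)
    stop = min(start + max(limit, 0), len(lines))
    return [(i + 1, lines[i]) for i in range(start, stop)]


def search_literal(pattern: str, root: Path) -> list[tuple[Path, int, str]]:
    """Find lines containing `pattern` as plain text; no regex interpretation."""
    hits = []
    for file in sorted(p for p in root.rglob("*") if p.is_file()):
        try:
            text = file.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern in line:
                hits.append((file, number, line))
    return hits
```

Searching for `a.c` with this helper matches only the literal text `a.c`, whereas a regex search would also match `abc`.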

Tool calls run through the tool sandbox runtime before output reaches the model. Each tool carries explicit permissions (read_files, write_files, execute_commands, system_inspect, network_access, and hitl mode). Workspace and log roots are checked, per-tool timeouts and output limits are applied, and oversized output is marked as truncated. Tool errors are returned as structured model-visible events, while telemetry records each call as allowed, denied, timeout, or truncated.

The terminal shows observable tool activity such as LinuxAgent is reading ... / LinuxAgent is listing ..., then prints concise evidence from completed workspace tool calls, such as the first line-numbered read_file snippets or search_files matches. If the planner concludes that no file change is needed, the final answer includes the cited evidence so the operator can see which file lines or search results supported the judgment. Patch confirmation shows per-file stats, compact + / - diff snippets, high-risk path warnings, permission changes, large-diff pagination, and per-file acceptance for multi-file patches. Full diffs are not shown twice; extra review prompts appear only when hidden pages exist.

Command confirmation also shows planned sandbox context: requested profile, runner, enforcement state, cwd, allowed roots, network policy, and fallback reason when the configured runner cannot enforce isolation.

After approval, patch application runs as a transaction. LinuxAgent validates target paths before reading file content, rejects symlink path components, hardlinks, directories, device files, FIFOs, sockets, oversized targets, and non-UTF-8 text, then writes through a temporary file and atomic replace. Existing targets are backed up under a local .linuxagent-patch-* sandbox directory and rolled back automatically if a later file or permission change fails. Audit metadata records changed files, permission changes, backup path hashes, rollback outcome, and the sandbox root.
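
The temporary-file-plus-atomic-replace step can be sketched with the standard library. This is a simplified illustration under stated assumptions: `atomic_write_text` is a hypothetical name, only the final path component is checked for symlinks, and the backup, rollback, hardlink, and size checks described above are omitted.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, content: str) -> None:
    """Write via a sibling temp file, then atomically replace the target."""
    if target.is_symlink():
        raise ValueError(f"refusing to write through symlink: {target}")
    if target.exists() and not target.is_file():
        raise ValueError(f"refusing non-regular target: {target}")
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-patch-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Readers of the target see either the old content or the new content, never a partially written file.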

By default, file patch reads and writes are limited to the current workspace and /tmp through file_patch.allow_roots. Sensitive roots such as /etc and SSH key directories are highlighted as high risk, and permission changes such as 0755 for generated scripts appear explicitly in the confirmation panel. Automatic patch repair defaults to two rounds and can be tuned with file_patch.max_repair_attempts (0 disables automatic patch repair). Failed command-plan repair is separately capped by command_plan.max_repair_attempts (0 disables failed-command replanning).
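
A root-containment check of the kind implied by file_patch.allow_roots can be sketched like this. The function name is hypothetical, and the sketch assumes that resolving paths first (so `..` segments and symlinks are collapsed) is acceptable; it requires Python 3.9 or later for `Path.is_relative_to`.

```python
from pathlib import Path


def is_under_allowed_roots(candidate: str, allow_roots: list[str]) -> bool:
    """Return True when the resolved path sits inside one of the allowed roots."""
    resolved = Path(candidate).resolve()
    return any(resolved.is_relative_to(Path(root).resolve()) for root in allow_roots)
```

Comparing path components rather than string prefixes matters: `/tmpfoo/x` starts with the text `/tmp` but is not inside `/tmp`.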

Sandbox Status

LinuxAgent local command execution now goes through a sandbox runner boundary. The default remains sandbox.enabled: false with runner: noop, which preserves compatibility while recording sandbox metadata only. runner: local applies process lifecycle controls such as clean environment, closed stdin, timeout, process-group cleanup, resource limits, output limits, and configured cwd roots, but it does not claim filesystem or network isolation for safe profiles. runner: bubblewrap is optional and capability-probed; if bwrap is missing or cannot enforce the requested profile or network policy, safe profiles fail closed while explicit passthrough profiles remain auditable passthrough. The planned Landlock backend is documented in docs/design/sandbox-landlock.md, including capability probing, fallback order, and the compatibility test matrix.
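
The process lifecycle controls listed for runner: local (clean environment, closed stdin, timeout, process-group cleanup, output limits) map onto standard `subprocess` options. The following is a sketch under assumptions, not the runner's implementation: the function name, the minimal `PATH`, and the default output cap are made up for illustration, and resource limits are left out. As the paragraph above notes, none of this provides filesystem or network isolation.

```python
import os
import signal
import subprocess


def run_with_lifecycle_controls(argv, timeout, cwd=None, max_output=65536):
    """Run argv with a clean env, closed stdin, a timeout, and group cleanup."""
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,           # closed stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        env={"PATH": "/usr/bin:/bin"},      # assumed minimal environment
        cwd=cwd,
        start_new_session=True,             # child leads its own process group
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # kill the whole group, not just the child
        proc.communicate()                   # reap the process
        raise
    return proc.returncode, out[:max_output]
```

Killing the process group rather than the single child prevents grandchildren from outliving a timed-out command.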

Safety Model

Operation	Default behavior
User-authored read-only command	May run when policy returns `SAFE`
First LLM-generated command	`CONFIRM`
Conversation-approved LLM command	May skip repeat confirmation only in the same conversation thread, including `/resume` of that thread
Destructive command	`CONFIRM` every time; never conversation-whitelisted
Command targeting root or sensitive paths	`BLOCK` when matched by policy
SSH batch across two or more hosts	Explicit batch confirmation with target hosts and remote profiles
Non-TTY confirmation request	Auto-deny
Unknown SSH host	Reject by default
Default sandbox runner	Records profile metadata only; no process isolation
Enabled safe sandbox profile unavailable	Fail closed before spawning
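
Several rows of the table above can be read as an ordered decision: a policy block wins, destructive commands always confirm, and LLM-generated commands confirm unless already approved in the same thread. The sketch below encodes only that ordering plus the non-TTY auto-deny rule; the function names and boolean inputs are hypothetical simplifications of what the policy engine actually evaluates.

```python
from enum import Enum


class Verdict(Enum):
    SAFE = "SAFE"
    CONFIRM = "CONFIRM"
    BLOCK = "BLOCK"


def default_verdict(*, blocked_by_policy: bool, destructive: bool,
                    llm_generated: bool, approved_in_thread: bool) -> Verdict:
    """Apply the default behaviors in priority order."""
    if blocked_by_policy:
        return Verdict.BLOCK
    if destructive:
        return Verdict.CONFIRM  # never conversation-whitelisted
    if llm_generated and not approved_in_thread:
        return Verdict.CONFIRM
    return Verdict.SAFE


def resolve_confirmation(stdin_is_tty: bool, operator_said_yes: bool) -> bool:
    """A confirmation without an interactive terminal is denied automatically."""
    return stdin_is_tty and operator_said_yes
```

Placing the destructive check before the thread-approval check is what keeps an earlier "don't ask again" from ever covering a destructive command.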

MCP Server Prototype

linuxagent mcp starts a local stdio MCP server with read-only tools for policy classification and audit hash-chain verification. It intentionally does not expose command execution, file patch application, SSH fan-out, or secrets. The threat model and future execution boundary are documented in docs/design/mcp-server.md.

LinuxAgent is not an autonomous remediator. The current default noop sandbox runner is also not a command sandbox; it is intended for controlled operator-in-the-loop use. See Production Readiness and Threat Model.

SSH execution is not protected by local OS sandboxing. Configure cluster hosts with least-privilege users, pre-registered known_hosts, a remote working directory, and explicit sudo allowlists when sudo is required.
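
Because a remote command is ultimately interpreted by a shell on the target host, one simple guard is to refuse commands that contain shell metacharacters before they are sent. The character set and function names below are assumptions for illustration; LinuxAgent's actual remote guard may use a different set or a different mechanism.

```python
# Assumed set of characters that give a remote shell control-flow or expansion power.
REMOTE_SHELL_METACHARACTERS = frozenset(";|&$`<>(){}\n\\")


def remote_metacharacters(command: str) -> list[str]:
    """Return the distinct metacharacters found in the command, sorted."""
    return sorted({ch for ch in command if ch in REMOTE_SHELL_METACHARACTERS})


def is_remote_command_allowed(command: str) -> bool:
    """Allow only commands with no shell metacharacters at all."""
    return not remote_metacharacters(command)
```

A guard this strict rejects some legitimate commands, which is a deliberate trade-off when the same command fans out to several hosts.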

Built-In Runbooks

LinuxAgent v4 ships with eleven YAML runbooks for common diagnostics:

Runbook area	Examples
Disk and filesystem	`df`, top directories, journal usage
Ports and networking	listeners, port ownership, connectivity checks
Services and logs	systemd status, recent unit logs, error search
System health, OS, load, and memory	overall host health, OS release, CPU pressure, memory pressure, OOM clues
Containers, packages, and certificates	container status, installed packages, certificate expiry

Runbooks no longer perform natural-language hard matching before LLM planning. They are loaded, policy-validated, and supplied to the planner as advisory examples. The planner may use, adapt, or ignore that guidance based on the actual request. If it produces a multi-step plan inspired by a runbook, every step still goes through normal policy, HITL, audit, and analysis flow.

Quality Gate

Current documented baseline from make test on 2026-05-11:

Gate	Status
Unit tests	677 passing
Optional provider compatibility	covered by `make optional-anthropic` when the extra is installed
Sandbox boundary suite	covered by `make sandbox`
Red-team policy suite	adversarial command corpus
Policy benchmark	P50/P95/P99 policy latency
Harness scenarios	scenario-driven HITL / runbook / cluster / sandbox coverage
Integration smoke tests	10 passing
Coverage	86.73% (`--cov-fail-under=80`)
Static checks	`ruff`, `mypy`, `bandit`, project code-rule checks
Build verification	wheel + sdist + packaged data install check

Useful commands:

make test
make sandbox
make lint
make type
make security
make red-team
make benchmark
make harness
make verify-build

Install Paths

Path	Use when
`./scripts/bootstrap.sh`	You are working from a source checkout
`pip install -c constraints.txt https://github.com/Eilen6316/LinuxAgent/releases/download/v4.1.0/linuxagent-4.1.0-py3-none-any.whl`	You want the published GitHub Release wheel
`pip install linuxagent`	You want the PyPI package after the release is published
`pip install -e ".[dev]"`	You are developing or running the full local gate
`pip install -e ".[anthropic]"`	You need the optional Anthropic provider

Documentation

Document	Purpose
Documentation index	All long-form docs in one place
docs/zh/README.md	Full Chinese manual
docs/en/README.md	Full English manual
Quick Start	Installation and first run
Provider Matrix	Provider setup paths and compatibility status
Operator Safety Model	Plain-language safety boundaries for users
Runbook Authoring	How to contribute safe YAML runbooks
Roadmap	Maintainer priorities and good first issue areas
Migration Guide	v3 to v4 breaking changes
Threat Model	Assets, trust boundaries, and mitigations
Production Readiness	Where LinuxAgent is and is not appropriate
Security Policy	Vulnerability reporting and supported versions
Contributing	Contribution workflow and review expectations
Changelog	Release history
Why substring matching is not safety	Technical argument behind LinuxAgent's command safety model
Real user feedback	Tell us what happened on a real server, VM, container, or homelab machine

Mirrors and Community

Link	Notes
GitHub	Primary repository
GitCode	Mirror
Gitee	Mirror
QQ Group 281392454	Community
CSDN intro	Project article

License

MIT
