* fix(kernel): glob-match declared tools and auto-promote shell_exec exec_policy
Two bugs prevented MCP wildcard tools and shell_exec from working in
user-defined agents:
1. available_tools() used exact string equality (d == &t.name) when
filtering builtins, skill tools, and MCP tools against the agent's
declared capabilities.tools list. Patterns like 'mcp_filesystem_*'
or 'file_*' were never matched, so those tools were silently dropped
from the list sent to the LLM — making them unreachable even though
the MCP server was connected and reporting 14 tools.
Fix: replace == with glob_matches() at all three filter sites.
2. When an agent declared shell_exec in capabilities.tools but had no
explicit exec_policy, it inherited the global ExecPolicy whose
default mode is Deny. This caused shell_exec to be stripped from
available_tools() (step 6), so execute_tool() returned 'Permission
denied: agent does not have capability' before ever reaching the
exec-policy enforcement layer.
Fix: at spawn and restore time, if the agent declares shell_exec (or
'*') in tools and has no explicit exec_policy, promote exec_policy
mode to Full instead of blindly inheriting the global Deny default.
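The matching change in item 1 can be sketched as follows. This is a minimal illustration that assumes `*` is the only wildcard; the body of the real glob_matches() is not shown in this description, so both functions below are hypothetical stand-ins.

```rust
/// Minimal glob matcher supporting only `*` (matches any run of characters).
/// Hypothetical stand-in for the kernel's glob_matches(); the real helper may differ.
fn glob_matches(pattern: &str, name: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == name; // no wildcard: fall back to exact equality
    }
    let first = parts[0];
    let last = parts[parts.len() - 1];
    if name.len() < first.len() + last.len()
        || !name.starts_with(first)
        || !name.ends_with(last)
    {
        return false;
    }
    // Middle segments must appear in order between the prefix and the suffix.
    let mut rest = &name[first.len()..name.len() - last.len()];
    for &seg in &parts[1..parts.len() - 1] {
        match rest.find(seg) {
            Some(i) => rest = &rest[i + seg.len()..],
            None => return false,
        }
    }
    true
}

/// Keep only the tools whose name matches at least one declared pattern.
fn filter_tools<'a>(declared: &[&str], all: &[&'a str]) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|t| declared.iter().any(|d| glob_matches(d, t)))
        .collect()
}
```

With exact equality, a declared `file_*` would match nothing; with the matcher above it selects every tool whose name starts with `file_`.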
* test(kernel): add regression tests for glob tool matching and shell_exec exec_policy
- test_available_tools_glob_pattern_matches_mcp_tools: verifies that
patterns like 'file_*' in capabilities.tools correctly match builtin
tools (file_read, file_write, file_list) and exclude non-matching ones
- test_shell_exec_available_when_declared_in_tools_without_explicit_exec_policy:
verifies that an agent declaring shell_exec in tools with no explicit
exec_policy gets auto-promoted to Full mode and has shell_exec present
in available_tools()
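The behaviour the second test checks can be sketched as a small decision function. The enum and function names are hypothetical, and only the two modes named in this description (Deny and Full) are modelled; the real ExecPolicy may have more.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecMode {
    Deny,
    Full,
}

/// Decide the effective exec mode at spawn/restore time (illustrative only).
/// `explicit` is the agent's own exec_policy mode, if it declared one.
fn effective_exec_mode(
    declared_tools: &[&str],
    explicit: Option<ExecMode>,
    global: ExecMode,
) -> ExecMode {
    if let Some(mode) = explicit {
        return mode; // an explicit policy is never overridden
    }
    let wants_shell = declared_tools
        .iter()
        .any(|t| *t == "shell_exec" || *t == "*");
    if wants_shell {
        ExecMode::Full // promote instead of inheriting the global default
    } else {
        global
    }
}
```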
* feat(plugin): expose full context engine hooks to plugins
Previously plugins could only hook into `ingest` and `after_turn`,
leaving `assemble`, `compact`, `bootstrap`, and subagent lifecycle
inaccessible — plugins had no control over what the LLM actually sees.
Now all 6 ContextEngine trait hooks are scriptable:
- bootstrap: run custom init logic (connect vector DBs, warm caches)
- assemble: fully control context window assembly before each LLM call
- compact: custom compression strategy under context pressure
- prepare_subagent / merge_subagent: manage child agent memory scopes
Protocol: same JSON stdin/stdout as existing hooks, with graceful
fallback to DefaultContextEngine on script error or empty output.
ContextEngineHooks and PluginManifest gain 5 new optional fields;
existing plugin.toml files require no changes.
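The fallback rule described above (script error or empty output falls back to the default engine) can be sketched like this. The function name and the use of plain strings for hook output are assumptions made for illustration; the real engine exchanges JSON.

```rust
/// Use the hook's output when it is usable, otherwise fall back to the
/// default engine. `default_engine` only runs when the fallback is needed.
fn hook_or_default<F>(hook_result: Result<String, String>, default_engine: F) -> String
where
    F: FnOnce() -> String,
{
    match hook_result {
        // Non-empty stdout from the script wins.
        Ok(out) if !out.trim().is_empty() => out,
        // Script error or empty output: graceful fallback.
        _ => default_engine(),
    }
}
```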
* feat(plugin): full message fidelity, scaffold templates, and docs
- assemble/compact hooks now receive full message structure (tool_use,
tool_result, image, thinking blocks) instead of text_content() strips
- deserialize script response via serde Message — accepts both plain
text and structured block content back from scripts
- scaffold plugin.toml now hand-written with comments showing all 7
hooks; PluginRuntime gains script_extension() helper
- Python scaffold adds assemble.py and compact.py example templates
with working token-budget logic and pinned-message awareness
- docs: zh/agent/plugins updated — hook lifecycle table, full protocol
specs for all 7 hooks including assemble request/response examples
* feat(plugin): scaffold assemble/compact templates for all runtimes, sync en docs
- Node.js: full assemble.js + compact.js with token-budget logic
- Deno/TypeScript: full assemble.ts + compact.ts implementations
- Go: full assemble.go + compact.go with pinned-message awareness
- V, Ruby, Bash, Bun, PHP, Lua, Native: minimal no-op stubs that
return the input unchanged so the default engine fallback kicks in
- HookFiles struct replaces the 4-tuple return from hook_templates()
for clearer field access; all 4 hook files get executable bit set
- docs/en: agent/plugins page synced with zh — hook lifecycle table,
protocol specs for all 7 hooks, manifest field table, assemble
request/response examples with tool_use/tool_result blocks
* feat(plugin): configurable hook timeout, bootstrap scaffold, exec-bit validation
- plugin.toml: new top-level hook_timeout_secs field (default 30s); the
bootstrap hook always gets 2× this value since it runs once and may
need extra time to connect to external services
- context_engine: run_hook() accepts explicit timeout; ScriptableContextEngine
stores hook_timeout_secs from config; bootstrap uses saturating_mul(2);
after_turn captures timeout before tokio::spawn
- context_engine: bootstrap validation now checks executable bit on Unix
and warns with "chmod +x" hint before the hook runs (not at runtime)
- plugin_manager: scaffold now writes 7 hook files per plugin instead of 4
— bootstrap.py / prepare_subagent.py / merge_subagent.py added for all
11 runtimes; Python gets full annotated templates, Node/Deno/Go get real
bootstrap implementations, other runtimes get minimal lifecycle stubs
- HookFiles struct gains bootstrap / prepare_subagent / merge_subagent fields
- plugin.toml template updated: hook_timeout_secs comment, all 7 hooks listed
as commented-out entries pointing to the scaffolded files
- docs (en + zh): directory listing shows all 7 scaffolded files; manifest
field table adds hook_timeout_secs; plugin.toml example updated
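The timeout rule from this commit can be sketched as below. The function name and the string-based hook identifier are hypothetical; the 30s default and the saturating 2× for bootstrap come from the description above.

```rust
use std::time::Duration;

const DEFAULT_HOOK_TIMEOUT_SECS: u64 = 30;

/// Compute the timeout for one hook invocation (illustrative sketch).
fn hook_timeout(hook: &str, configured_secs: Option<u64>) -> Duration {
    let base = configured_secs.unwrap_or(DEFAULT_HOOK_TIMEOUT_SECS);
    let secs = if hook == "bootstrap" {
        // saturating_mul avoids overflow if the configured value is huge.
        base.saturating_mul(2)
    } else {
        base
    };
    Duration::from_secs(secs)
}
```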
* feat(plugin): agent_id in assemble, full after_turn messages, compact summary, env config, hook metrics, hot-reload, plugin stacking
- ContextEngine::assemble now receives `agent_id: AgentId` so hook
scripts can scope per-agent recall; both DefaultContextEngine and
ScriptableContextEngine updated; both agent_loop.rs call sites pass
session.agent_id
- compact hook reads optional `summary` field from script JSON output
instead of hardcoding the string "plugin compaction"
- after_turn now sends the full serialised Message structs to hook
scripts (previously truncated to 500-char text summary)
- [env] section in plugin.toml: PluginManifest gains
`env: HashMap<String,String>`; values starting with `${VAR}` are
expanded from the daemon's environment at invocation time; the env
pairs are forwarded into every hook subprocess via HookConfig
- Hook invocation metrics: HookStats / HookMetrics structs added;
ScriptableContextEngine holds Arc<Mutex<HookMetrics>>; run_hook()
returns (output, elapsed_ms); all 7 hook call sites record calls /
successes / failures / total_ms; metrics() accessor exposes a snapshot
- POST /api/plugins/{name}/reload hot-reload endpoint: re-reads
plugin.toml from disk via reload_plugin() in plugin_manager; script
changes take effect immediately, manifest changes noted in response
- Plugin stacking via StackedContextEngine: ContextEngineTomlConfig
gains `plugin_stack: Option<Vec<String>>`; when ≥2 plugins are
listed build_context_engine chains them with merge semantics:
ingest merges all memories, assemble first-non-empty wins, compact
first-success wins, after_turn runs all sequentially, lifecycle
hooks run sequentially with early-exit on error
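The stacking merge rules listed above can be sketched with three small helpers. The names and the `Vec<String>` payloads are hypothetical simplifications, and this version takes already-computed results, whereas a real engine would likely stop calling later plugins once a winner is found.

```rust
/// ingest: concatenate the memories recalled by every engine, in stack order.
fn merge_ingest(results: Vec<Vec<String>>) -> Vec<String> {
    results.into_iter().flatten().collect()
}

/// assemble: the first engine that returns a non-empty context wins.
fn first_non_empty(results: Vec<Vec<String>>) -> Option<Vec<String>> {
    results.into_iter().find(|r| !r.is_empty())
}

/// compact: the first engine that succeeds wins; errors are skipped.
fn first_success<T, E>(results: Vec<Result<T, E>>) -> Option<T> {
    results.into_iter().find_map(Result::ok)
}
```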
* feat(plugin): failure policy, retry, ingest filter, version compat, load-time validation, metrics/status endpoints, scaffold update
Hook behavior config (ContextEngineHooks):
- on_hook_failure: "warn" | "abort" | "skip" — replaces hardcoded warn+fallthrough
- max_retries / retry_delay_ms — retry failing hooks before applying policy
- ingest_filter — skip ingest hook when message doesn't contain substring
ScriptableContextEngine:
- apply_failure_policy() helper respects on_hook_failure across all 6 hooks
(after_turn stays fire-and-forget, bootstrap always non-fatal by design)
- run_hook() now loops up to max_retries times with retry_delay_ms sleep
- ScriptableContextEngine::new() warns at construction time for any declared
hook script that cannot be found on disk (load-time validation, item 6)
- hook_metrics() added to ContextEngine trait (default None); overridden in
ScriptableContextEngine and StackedContextEngine (aggregates all engines)
Plugin manifest (PluginManifest):
- librefang_min_version: Option<String> — refuses to load incompatible plugins
- load_plugin_manifest checks version via version_satisfies() helper
API endpoints:
- GET /api/plugins/{name}/status — manifest info + whether plugin is
currently active (single or stack) in the running context engine
- GET /api/context-engine/metrics — hook invocation snapshot (204 when
no scriptable engine is active)
Scaffold template (item 9):
- plugin.toml template now shows librefang_min_version, max_retries,
retry_delay_ms, on_hook_failure, ingest_filter as commented examples
- [env] section shown as a commented block with ${} expansion example
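The retry and failure-policy behaviour can be sketched as follows. Several details are assumptions: one initial attempt plus up to max_retries retries, a blocking sleep in place of the async runtime, and the meaning given to each policy (warn logs and falls back, skip drops the hook result silently, abort surfaces the error). All type and function names are hypothetical.

```rust
use std::{thread, time::Duration};

#[derive(Debug, Clone, Copy, PartialEq)]
enum OnHookFailure {
    Warn,
    Abort,
    Skip,
}

#[derive(Debug, PartialEq)]
enum HookOutcome {
    Output(String),
    Fallback,        // warn: log, then use the default engine
    Skipped,         // skip: ignore the hook for this call
    Aborted(String), // abort: propagate the error to the caller
}

fn run_with_retry<F>(
    mut call: F,
    max_retries: u32,
    retry_delay_ms: u64,
    policy: OnHookFailure,
) -> HookOutcome
where
    F: FnMut() -> Result<String, String>,
{
    let mut last_err = String::new();
    for attempt in 0..=max_retries {
        match call() {
            Ok(out) => return HookOutcome::Output(out),
            Err(e) => last_err = e,
        }
        if attempt < max_retries {
            thread::sleep(Duration::from_millis(retry_delay_ms));
        }
    }
    // All attempts failed: apply the configured policy.
    match policy {
        OnHookFailure::Warn => {
            eprintln!("hook failed after retries: {last_err}");
            HookOutcome::Fallback
        }
        OnHookFailure::Skip => HookOutcome::Skipped,
        OnHookFailure::Abort => HookOutcome::Aborted(last_err),
    }
}
```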
* docs(plugin): document env, hook_timeout_secs, plugin_stack, hot-reload, and metrics
* feat(plugin): schema validation, integrity checks, enable/disable, upgrade, lint, traces, test-hook
Round 8 plugin context engine improvements:
- Hook JSON Schema input/output validation (hook_schemas config)
- hook_protocol_version compatibility check with warning on mismatch
- Resource limits: max_memory_mb advisory via env var, allow_network soft isolation
- Plugin integrity: SHA-256 hash verification at load time (pure-Rust impl)
- Plugin dependency declaration: plugin_depends checked at load time
- Enable/disable toggle: .disabled marker file, preserved across upgrades
- Upgrade endpoint: remove + reinstall preserving enabled state
- Per-agent hook filter: only_for_agent_ids substring match on ingest/assemble/after_turn
- Hook invocation traces: ring buffer (cap 100), HookTrace records per call
- GET /api/context-engine/traces endpoint
- POST /api/plugins/{name}/test-hook: dry-run a specific hook with test input
- GET /api/plugins/{name}/lint: static manifest + script validation report
- enabled field on PluginInfo, exposed in list/get/status endpoints
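The trace ring buffer from this round can be sketched as a bounded queue that drops the oldest record on overflow. The HookTrace fields shown here are hypothetical; only the capacity of 100 and the per-call record come from the list above.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq)]
struct HookTrace {
    hook: String,
    elapsed_ms: u64,
    success: bool,
}

/// Fixed-capacity buffer: once full, each push evicts the oldest trace.
struct TraceRing {
    cap: usize,
    buf: VecDeque<HookTrace>,
}

impl TraceRing {
    fn new(cap: usize) -> Self {
        Self { cap, buf: VecDeque::with_capacity(cap) }
    }

    fn push(&mut self, trace: HookTrace) {
        if self.cap == 0 {
            return;
        }
        if self.buf.len() == self.cap {
            self.buf.pop_front();
        }
        self.buf.push_back(trace);
    }

    /// Oldest-first copy, e.g. for a traces endpoint response.
    fn snapshot(&self) -> Vec<HookTrace> {
        self.buf.iter().cloned().collect()
    }
}
```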
* feat(plugin): sign endpoint, health/chain introspection, ingest cache, list filtering
Round 9 plugin context engine improvements:
- POST /api/plugins/{name}/sign: compute & write SHA-256 integrity hashes into plugin.toml
- GET /api/context-engine/health: lint-based smoke test of active plugin(s), returns 503 when degraded
- GET /api/context-engine/chain: active engine topology (default/single/stacked) with per-plugin hook coverage
- hook_cache_ttl_secs config: TTL-based in-memory cache for ingest hook (skip subprocess on identical input)
- GET /api/plugins now accepts ?enabled=true/false and ?has_errors=true/false filter params
- sha256_hex made pub for reuse across crates
- strip_toml_section helper for targeted plugin.toml rewrites
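The ingest cache controlled by hook_cache_ttl_secs can be sketched as a map keyed by hook input that stores the insertion time. The struct and method names are hypothetical, and passing `now` explicitly is a choice made here to keep the sketch testable.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-memory TTL cache: identical input within the TTL skips the subprocess.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>, // input -> (inserted_at, output)
}

impl TtlCache {
    fn new(ttl_secs: u64) -> Self {
        Self { ttl: Duration::from_secs(ttl_secs), entries: HashMap::new() }
    }

    /// Returns the cached output if the entry is younger than the TTL.
    fn get(&self, input: &str, now: Instant) -> Option<&str> {
        self.entries.get(input).and_then(|(inserted_at, out)| {
            if now.duration_since(*inserted_at) < self.ttl {
                Some(out.as_str())
            } else {
                None
            }
        })
    }

    fn put(&mut self, input: &str, output: String, now: Instant) {
        self.entries.insert(input.to_string(), (now, output));
    }
}
```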
* feat(plugin): Round 10 — persistent pool, parallel stacked exec, caches, sandbox, shared state, metrics
- HookProcessPool: persistent subprocess pool eliminating interpreter startup per call
- StackedContextEngine: parallel ingest/after_turn via join_all with priority ordering
- Assemble + compact TTL caches (assemble_cache_ttl_secs, compact_cache_ttl_secs)
- ingest_regex: compiled regex_lite filter on ingest hook (skips non-matching messages)
- enable_shared_state: injects LIBREFANG_STATE_FILE env var into all hook subprocesses
- env_schema: load-time validation of required env vars (! prefix = required)
- Linux network sandbox: unshare --net wrapper for allow_network=false hooks
- GET /api/context-engine/metrics/prometheus: Prometheus text format scrape endpoint
- POST /api/plugins/batch: bulk enable/disable/lint/sign operations
- GET /api/plugins/{name}/export: tar.gz download of plugin directory
- GET /api/plugins/{name}/update-check: registry version comparison via GitHub
- POST /api/plugins/{name}/benchmark: p50/p95/p99 latency measurement
- GET/DELETE /api/plugins/{name}/state: shared state KV read/reset
* feat(plugin): Round 11 — circuit breaker, trace persistence, health heartbeat, sandboxing, pre-warm, per-agent metrics
- CircuitBreakerConfig: auto-suspend hooks after N consecutive failures (configurable cooldown)
- HookProcessPool::health_check(): detect dead persistent subprocesses via child.try_wait()
- HookProcessPool::prewarm(): pre-warm subprocesses at daemon init (prewarm_subprocesses = true)
- after_turn bounded queue: Semaphore(after_turn_queue_depth) limits concurrent background tasks
- allow_filesystem = false: restricts HOME/TMPDIR and injects LIBREFANG_READONLY_FS=1
- Capability manifest: plugin.toml needs = [...] array, lint validates known capabilities
- Hook output schema validation: call_hook_dispatch checks schema.output after every invocation
- Per-agent metrics: ScriptableContextEngine tracks HookStats per agent_id
- SQLite trace store: trace_store.rs with 10,000-row rolling window, query/filter API
- install_plugin_deps(): runtime-aware dep installer (pip/npm/bun/go/bundle/composer)
- open_trace_store(): opens ~/.librefang/hook_traces.db
- GET /api/plugins/registry/search: search GitHub plugin registry by keyword
- POST /api/plugins/:name/install-deps: install runtime dependencies
- GET /api/context-engine/metrics/per-agent: per-agent call breakdown
- GET /api/context-engine/traces/history: SQLite trace query with filters
- GET /api/context-engine/metrics/summary: aggregated hook performance summary
- GET /api/plugins/:name/advanced-config: expose all Round 11 hook config fields
- GET /api/plugins/:name/env: plugin env vars + env_schema live presence check
- GET /api/context-engine/config: running engine configuration snapshot
- POST /api/plugins/:name/prewarm: trigger pre-warm check via API
- GET /api/context-engine/sandbox-policy: per-plugin sandbox policy summary
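The circuit breaker introduced in this round can be sketched as a three-state machine. This is a simplified illustration with hypothetical names and fields: it opens after `max_failures` consecutive failures, lets calls through again once the cooldown has elapsed (half-open), and re-opens on the next failure.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed,
    Open { since: Instant },
    HalfOpen,
}

struct CircuitBreaker {
    max_failures: u32,
    cooldown: Duration,
    failures: u32,
    state: State,
}

impl CircuitBreaker {
    fn new(max_failures: u32, cooldown: Duration) -> Self {
        Self { max_failures, cooldown, failures: 0, state: State::Closed }
    }

    /// Returns true if a hook call may proceed at `now`.
    fn allow(&mut self, now: Instant) -> bool {
        if let State::Open { since } = self.state {
            if now.duration_since(since) >= self.cooldown {
                self.state = State::HalfOpen; // cooldown over: allow a probe
            } else {
                return false; // still suspended
            }
        }
        true
    }

    /// Record the result of a call that was allowed through.
    fn record(&mut self, success: bool, now: Instant) {
        if success {
            self.failures = 0;
            self.state = State::Closed;
        } else {
            self.failures = self.failures.saturating_add(1);
            if self.state == State::HalfOpen || self.failures >= self.max_failures {
                self.state = State::Open { since: now };
            }
        }
    }
}
```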
* fix(plugin): address 10 code-review issues from PR audit
Circuit breaker, SQLite traces, dead-process eviction, TTL cache naming,
prewarm runtime, registry validation, tar path traversal, semaphore docs.
- Fix 1 (already applied): half-open circuit breaker state machine
- Fix 2: parameterize count() in TraceStore — eliminate SQL injection via string interpolation
- Fix 3: evict dead HookProcessPool slot to None before spawn attempt on crash-restart
- Fix 4: rename TTL cache binding `exp` → `inserted_at` for clarity
- Fix 6: wire TraceStore::insert() into push_trace() — SQLite now populated on every hook call
- Add `plugin_name` + `trace_store` fields to ScriptableContextEngine
- Populate both fields in with_plugin_name(); open_trace_store() failure is non-fatal
- Thread trace_store + plugin_name through run_hook() and after_turn background task
- Fix pre-existing broken test (run_hook arg-count mismatch vs. its own signature)
- Fix 7: prewarm() was hardcoded to PluginRuntime::Python — use self.runtime instead
- Fix 8: validate `registry` query/body param (owner/repo format) in search and upgrade endpoints
- Fix 9: document sem.acquire().await.ok() — explains intentional AcquireError swallow on shutdown
- Fix 10: export_plugin tar prefix uses validated route param `name` not info.manifest.name
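The owner/repo check from Fix 8 can be sketched as below. The function name and the accepted character set (ASCII alphanumerics plus `-`, `_`, `.`) are assumptions for illustration; the actual validation rules are not spelled out in this description.

```rust
/// Accept only "<owner>/<repo>" with exactly one slash and safe characters.
fn is_valid_registry(value: &str) -> bool {
    let mut parts = value.split('/');
    let (Some(owner), Some(repo), None) = (parts.next(), parts.next(), parts.next()) else {
        return false; // missing slash or too many segments
    };
    let ok = |p: &str| {
        !p.is_empty()
            && p != "."
            && p != ".." // reject path-traversal style segments
            && p.chars()
                .all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.'))
    };
    ok(owner) && ok(repo)
}
```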
* feat(plugin): Round 12 — persistent trace, schema validation, process eviction, Prometheus labels
- Persistent subprocess path now emits HookTrace (both call_hook_dispatch_raw and after_turn
background task were missing push_trace — traces only landed in non-persistent mode before)
- validate_schema extended: type, enum, minimum/maximum, minLength/maxLength, properties recursion
- HookProcessPool.evict(path) + evict_all() added; ScriptableContextEngine.evict_hook_processes()
evicts all declared hook script slots on hot-reload
- Prometheus metrics now include plugin="<name>" label on all hook metric lines
- reload_plugin API response documents process eviction semantics for persistent subprocess users
* feat(plugin): Round 13 — graceful shutdown, Landlock sandbox, stacked engine isolation
after_turn graceful shutdown:
- JoinSet<()> field tracks all spawned after_turn tasks (replaces fire-and-forget)
- try_join_next() reaps completed tasks on each spawn to prevent unbounded growth
- wait_for_after_turn_tasks(timeout_secs) polls until empty or deadline for clean shutdown
Landlock filesystem sandbox (Linux 5.13+, opt-in via landlock-sandbox feature):
- landlock = "0.4" added as optional dependency
- try_apply_landlock_readonly() applies unprivileged read-only LSM restriction in child
- pre_exec hook injects Landlock before exec() when allow_filesystem=false
- Falls back to unshare --mount on older kernels; advisory env vars on non-Linux
- No-op stub for non-Linux/feature-disabled builds
StackedContextEngine isolation:
- ingest: per-engine 30s timeout + error isolation; merges successful recalls only
- after_turn: per-engine 30s timeout + warn on individual engine failure/timeout
- per_agent_metrics() now aggregated across all child engines (calls/failures/total_ms summed)
* feat(plugin): Round 14 — file locking, output cap, dep resolution, after_turn inspection
Shared state file locking:
- once_cell STATE_FILE_LOCKS registry: per-path tokio::sync::Mutex prevents concurrent
read-modify-write races on the shared state JSON file between ingest + after_turn
Hook output size cap:
- do_call (persistent) + run_hook_json (non-persistent): reject outputs > 4 MiB with
PluginRuntimeError::InvalidOutput before JSON parsing to prevent OOM from bad scripts
Plugin dependency resolution:
- check_plugin_needs(): validates all `needs = [...]` deps are installed, warns on miss
- resolve_install_order(): DFS topological sort over registry index, detects cycles
- install_plugin_with_deps(): fetches index, resolves order, installs deps-first
- install_plugin() now warns on unmet deps after install (non-fatal)
- POST /api/plugins/install-with-deps endpoint
after_turn result inspection:
- process_after_turn_output(): handles "log" (info log), "annotations" (debug log),
"memories" (future memory injection via substrate)
- HookTrace gains optional `annotations` field (serialized from hook output)
- after_turn background task now captures and processes hook JSON output
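The dependency ordering step can be sketched as a depth-first topological sort with cycle detection. The function name follows resolve_install_order() from the list above, but the signature and the shape of the registry index (plugin name mapped to the names it needs) are assumptions.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Mark {
    Visiting, // on the current DFS path
    Done,     // fully processed
}

/// Returns plugins in install order (dependencies first),
/// or an error naming a plugin that sits on a dependency cycle.
fn resolve_install_order(
    root: &str,
    index: &HashMap<String, Vec<String>>,
) -> Result<Vec<String>, String> {
    fn visit(
        name: &str,
        index: &HashMap<String, Vec<String>>,
        marks: &mut HashMap<String, Mark>,
        order: &mut Vec<String>,
    ) -> Result<(), String> {
        match marks.get(name) {
            Some(Mark::Done) => return Ok(()),
            Some(Mark::Visiting) => return Err(format!("dependency cycle at '{name}'")),
            None => {}
        }
        marks.insert(name.to_string(), Mark::Visiting);
        for dep in index.get(name).map(|v| v.as_slice()).unwrap_or(&[]) {
            visit(dep, index, marks, order)?;
        }
        marks.insert(name.to_string(), Mark::Done);
        order.push(name.to_string()); // pushed only after all of its deps
        Ok(())
    }

    let mut marks = HashMap::new();
    let mut order = Vec::new();
    visit(root, index, &mut marks, &mut order)?;
    Ok(order)
}
```

A plugin missing from the index is treated here as having no dependencies; a real installer might instead report it as unknown.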
* feat(plugin): Round 15 — state lock wiring, memory injection, per-agent CB, Prometheus summary, checksum verify
State file locking (wired):
- lock_state_file() now called in HookProcessPool::call() and run_hook_json()
- Lock held for full subprocess lifetime, serializes ingest+after_turn across different scripts
Memory substrate wiring:
- DefaultContextEngine::memory_substrate() accessor added
- ScriptableContextEngine captures Arc<MemorySubstrate> at construction
- after_turn background task passes substrate to process_after_turn_output()
- "memories" array from hook output now actually stored in the memory substrate
Per-agent circuit breaker isolation:
- circuit_is_open/circuit_record keyed by "{agent_id}:{hook}" when agent in context
- One misbehaving agent no longer trips the breaker for all other agents
- ingest/assemble/compact pass Some(&agent_id); bootstrap/subagent pass None
Prometheus summary metrics:
- librefang_hook_duration_ms{quantile="avg"} + _sum + _count per hook
- Allows Prometheus to compute rate() and average latency per hook
Plugin checksum verification:
- fetch_checksum() tries {url}.sha256 and checksums.txt alongside archive
- verify_checksum() SHA-256 compares downloaded bytes, fails install on mismatch
- Warns (non-fatal) when no checksum file found; info! on verified install
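The per-agent breaker isolation can be illustrated with the key scheme alone. The `"{agent_id}:{hook}"` form comes from the description above; using the bare hook name when no agent is in context, and the FailureCounts type itself, are assumptions.

```rust
use std::collections::HashMap;

/// Build the breaker key: scoped per agent when an agent is in context.
fn circuit_key(agent_id: Option<&str>, hook: &str) -> String {
    match agent_id {
        Some(id) => format!("{id}:{hook}"),
        None => hook.to_string(), // e.g. bootstrap / subagent hooks
    }
}

/// Failure counters keyed by circuit_key(), so agents do not share state.
#[derive(Default)]
struct FailureCounts {
    by_key: HashMap<String, u32>,
}

impl FailureCounts {
    fn record_failure(&mut self, agent_id: Option<&str>, hook: &str) -> u32 {
        let n = self.by_key.entry(circuit_key(agent_id, hook)).or_insert(0);
        *n += 1;
        *n
    }

    fn failures(&self, agent_id: Option<&str>, hook: &str) -> u32 {
        self.by_key
            .get(&circuit_key(agent_id, hook))
            .copied()
            .unwrap_or(0)
    }
}
```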
* feat(plugin): Round 16 — trace_id correlation, seccomp sandbox, Ed25519 registry signing
* feat(plugin): Round 17 — per-hook timeout config, batch prewarm API, plugin health endpoint
* feat(plugin): Round 18 — circuit breaker persistence, output schema strict mode
* feat(plugin): Round 19 — exponential backoff retry, semver dep constraints, stack health summary
* feat(plugin): Round 20 — manifest compat check, exit code classification, bootstrap config overrides
* feat(plugin): Round 21 — correlation_id per turn, registry cache, RSS memory logging
* fix(plugin): suppress dead_code lint on non-Linux sandbox stubs
* feat(plugin): Round 22 — hook rate limiting, zero-downtime hot-reload swap_prewarm
* fix(plugin): rustfmt — expand single-line fns and method chains
* feat(plugin): Round 23 — per-agent state isolation, plugin event bus
* feat(plugin): Round 24 — archive Ed25519 signing, Wasm runtime variant
* feat(plugin): Round 25 — compact token pressure, weighted stacked ingest
* fix(plugin): code review — CB restore, schema trace, async task drain, RL scope
* fix(plugin): persistent-path schema validation — correct CB + trace ordering
* fix(plugin): parse_version strips pre-release suffixes before u32 parse
* fix(plugin): wire plugin_stack_weights and on_event bus subscription
* fix(plugin): compact token_pressure uses actual message token estimate
* fix(plugin): compact hook compacted_count reports removed messages not total
* fix(plugin): after_turn hook now checks and updates circuit breaker
* fix(plugin): read hook stdout and stderr concurrently to prevent pipe deadlock
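The parse_version fix listed above can be sketched as follows. The function names match parse_version and version_satisfies() from this PR, but the bodies are hypothetical: this sketch assumes a three-component version, tolerates a leading `v`, and treats unparseable versions as not satisfying the minimum.

```rust
/// Parse "major.minor.patch", ignoring any pre-release suffix on a component.
fn parse_version(v: &str) -> Option<(u32, u32, u32)> {
    let v = v.trim().trim_start_matches('v');
    let mut parts = v.splitn(3, '.');
    let mut next = || -> Option<u32> {
        let part = parts.next()?;
        // Keep only the leading digits, so "6-beta15" parses as 6.
        let digits: String = part.chars().take_while(|c| c.is_ascii_digit()).collect();
        digits.parse().ok()
    };
    Some((next()?, next()?, next()?))
}

/// True when `current` is at least `min`; false if either fails to parse.
fn version_satisfies(current: &str, min: &str) -> bool {
    match (parse_version(current), parse_version(min)) {
        (Some(c), Some(m)) => c >= m, // tuples compare lexicographically
        _ => false,
    }
}
```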
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…g#2093)
* fix(runtime): detect "[no reply needed]" as silent response
The LLM sometimes responds with "[no reply needed]" as literal text
instead of the NO_REPLY token. This was not detected by is_no_reply()
and leaked to channel users as a visible message.
Add "[no reply needed]" pattern matching alongside existing NO_REPLY
detection.
* fix(runtime): deduplicate is_no_reply arms, add unbracketed variant and tests
- Remove copy-paste duplicate match arms (③④ identical to ①②)
- Replace with unbracketed "no reply needed" variant the model sometimes emits
- Update doc comment explaining why [no reply needed] appears
(self-reinforcing loop: runtime writes placeholder → LLM mimics it on later turns)
- Add unit test covering all positive/negative cases
---------
Co-authored-by: Federico Liva <federico.liva@espero.it>
Co-authored-by: Evan <suzukaze.haduki@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
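The silent-response detection described in the runtime fix above can be sketched as below. Matching the whole trimmed message (rather than a substring) and ignoring case for the placeholder variants are assumptions; the real is_no_reply() may be stricter or looser.

```rust
/// Returns true when the model's text means "send nothing to the channel".
fn is_no_reply(text: &str) -> bool {
    let t = text.trim();
    // The dedicated token, plus the two literal placeholders the model emits.
    t == "NO_REPLY"
        || matches!(
            t.to_ascii_lowercase().as_str(),
            "[no reply needed]" | "no reply needed"
        )
}
```

Whole-message matching keeps ordinary sentences that merely contain the phrase from being swallowed.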
…librefang#2097)
* feat(kernel): per-channel session isolation via deterministic UUID v5
Each communication channel (Telegram, WhatsApp, cron, API) now gets its
own isolated agent session. SessionId::for_channel() derives a
deterministic UUID v5 from (agent_id, channel_type), used in both
execute_llm_agent() and the streaming path.
Cron jobs get a synthetic SenderContext with channel="cron". API calls
without SenderContext fall back to the default session_id for backward
compatibility.
Additional safety:
- Channel name normalized to lowercase (prevents "Telegram" vs "telegram" split)
- reset_session() and reboot_session() clear ALL channel sessions
- No DB migration needed: new sessions are just new rows with UUID v5 IDs
* refactor(docs): align URL hierarchy with sidebar nav groups (librefang#2119)
* refactor(docs): align URL hierarchy with sidebar nav groups
Every non-config doc page sat at the URL root (/skills, /api, /faq...)
but the sidebar grouped them under headings like 'Agent', 'Integrations',
'Operations'. Nav hierarchy and URL hierarchy disagreed, so /skills
appearing under 'Agent' in the sidebar felt wrong.
This restructure makes every URL reflect its sidebar group:
/librefang -> /getting-started
/roadmap -> /getting-started/roadmap
/examples -> /getting-started/examples
/glossary -> /getting-started/glossary
/comparison -> /getting-started/comparison
/providers/* -> /configuration/providers/*
/security -> /architecture/security
/agents -> /agent/templates (avoid awkward /agent/agents)
/hands -> /agent/hands
/memory -> /agent/memory
/skills -> /agent/skills
/plugins -> /agent/plugins
/workflows -> /agent/workflows
/channels/* -> /integrations/channels/*
/api/* -> /integrations/api/*
/sdk -> /integrations/sdk
/cli/* -> /integrations/cli/*
/android-termux -> /integrations/android-termux
/mcp-a2a -> /integrations/mcp-a2a
/migration -> /integrations/migration
/desktop -> /integrations/desktop
/development -> /integrations/development
/troubleshooting -> /operations/troubleshooting
/production -> /operations/production
/faq -> /operations/faq
prompt-intelligence stays at the root (not in any nav group).
Internal cross-links were rewritten only inside markdown link syntax
(](...)) and href attributes. API endpoint paths documented in headings
like 'GET /api/agents/{id}/skills' are NOT doc navigation and were left
intact. Navigation.tsx updated for both en and zh.
A public/_redirects file ships 52 backward-compat redirects (27 en +
25 zh) so every old URL still resolves on Cloudflare Pages: no broken
bookmarks, no dead SEO.
Pre-existing bug fixed in passing: zh/page.mdx's 'Getting Started' link
pointed at the en page (/librefang) instead of the zh one. Locales stay
fully symmetric.
* refactor(web): update marketing site doc link to new /agent/hands path
Part of the docs URL restructure (PR librefang#2119). The hands section
link in the marketing site pointed at the old /hands URL. The 301
redirect would still catch it, but direct links are a cleaner hop.
* refactor(docs): move prompt-intelligence under Agent nav group
Was the lone orphan page sitting at the URL root with no sidebar entry.
It's an agent feature (prompt version management + A/B experiments for
agents) so it belongs next to skills/plugins/memory under /agent/.
- Moved /prompt-intelligence -> /agent/prompt-intelligence (en + zh)
- Added nav entry in Navigation.tsx (both locales), between Plugins and Workflows
- Updated 4 internal cross-links in configuration docs
- Added redirect stubs for /prompt-intelligence and /zh/prompt-intelligence
Every doc page now has a sidebar entry.
* fix(kernel): unify agent manifest path on workspaces/agents/ (librefang#2102) (librefang#2118)
Runtime read/write paths for per-agent `agent.toml` were split between
`<home>/agents/<name>/` and `<home>/workspaces/agents/<name>/`, so
`reload_agent_from_disk`, `persist_agent_enabled`, boot-time enabled
checks, and the channel bridge's `spawn_agent_by_name` silently failed
on the default install layout (the file lives under `workspaces/`).
Unify every runtime reader/writer on `workspaces/agents/<name>/` via
`KernelConfig::effective_agent_workspaces_dir()`, add a one-shot
`migrate_legacy_agent_dirs` pass at boot that relocates any stray
`<home>/agents/<name>/` directories, and expose it as
`Kernel::relocate_legacy_agent_dirs()` so `POST /api/migrate` can
relocate its own output without a daemon restart.
Self-heal stale `source_toml_path` values in the agent registry: when
the stored path no longer exists, fall back to the canonical workspaces
location and persist the repoint back to SQLite.
Also fix `librefang doctor --repair`, correct the desktop import command
docstring + `~/.librefang/agents/` references across docs (EN + ZH),
the OpenAPI JSON, and the TypeScript bindings.
* fix(ci): drop Ubuntu RUST_TEST_THREADS to 1 (librefang#2117)
* ci(todo-to-issue): ignore plugin scaffold templates (librefang#2120)
The TODO-to-issue scanner was filing issues for TODO markers that live
inside raw-string plugin code templates in plugin_manager.rs. Those
markers are placeholders rendered into user-generated plugin projects,
not tasks for the LibreFang team. Adding the file to IGNORE stops the
bogus issue flood (librefang#2104-librefang#2115 were all false positives).
* chore: bump version to v2026.4.6 (librefang#2122)
* Revert "chore: bump version to v2026.4.6 (librefang#2122)" (librefang#2126)
This reverts commit cd7b068.
* chore: bump version to v2026.4.6-beta15 (librefang#2127)
* fix: address review feedback on channel session isolation
- Replace RFC 4122 NAMESPACE_DNS with a dedicated random UUID to avoid
collisions with other UUID v5 consumers
- Save session summaries for ALL channel sessions (not just default)
before reset_session() deletes them; this prevents silent history loss
on /reset
- Add get_agent_session_ids() to session store for iterating agent sessions
- Fix unused variable warning in reboot_session
---------
Co-authored-by: Federico Liva <federico.liva@espero.it>
Co-authored-by: Evan <suzukaze.haduki@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
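The channel normalization and API fallback from the session-isolation commit above can be sketched without the UUID step. The function name and the `agent:channel` string format are hypothetical; in the real code the (agent_id, channel_type) pair is hashed into a UUID v5 under a dedicated namespace, which this sketch does not attempt to reproduce.

```rust
/// Build the name that identifies a per-channel session (illustrative only).
/// Returns None when there is no sender context, in which case the caller
/// would keep using the agent's default session_id.
fn channel_session_name(agent_id: &str, channel: Option<&str>) -> Option<String> {
    channel.map(|c| {
        // Lowercasing keeps "Telegram" and "telegram" in the same session.
        format!("{agent_id}:{}", c.trim().to_lowercase())
    })
}
```

Because the name is a pure function of its inputs, any deterministic hash of it (such as UUID v5) yields the same session ID on every call.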
…ibrefang#2098)
* feat(runtime): save channel images as files instead of inline base64
Channel images are now saved to disk and referenced by path in sessions,
instead of embedding megabytes of base64 data that bloats session
storage and causes context overflow.
- Add ContentBlock::ImageFile variant with media_type and path fields
- Channel bridge saves to /tmp/librefang_uploads/ with 1024px downscale
- Claude Code driver passes file path directly for ImageFile
- API drivers lazy-load base64 from file at call time
- Background cleanup removes temp files older than 24h hourly
- Graceful fallback to base64 on disk write failure
* fix: address review feedback on image file references
- Implement lazy base64 loading in anthropic, openai, and gemini drivers
so ImageFile blocks are read from disk at call time instead of silently
dropped. Fixes silent regression for non-Claude-Code providers.
- Add hourly cleanup task in kernel.rs that removes files in the upload
directory older than 24 hours (previously missing).
- Replace hardcoded /tmp/librefang_uploads with std::env::temp_dir() for
cross-platform compatibility.
- Add comment in compactor noting the ImageFile token estimate imprecision.
---------
Co-authored-by: Federico Liva <federico.liva@espero.it>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Evan <suzukaze.haduki@gmail.com>
…fang#2131) * feat(approval): add TOTP second-factor verification for critical tool approvals Closes librefang#2128 When `second_factor = "totp"` is set in the approval policy, approving dangerous tool executions requires a 6-digit TOTP code from an authenticator app (Google Authenticator, 1Password, etc.), preventing same-channel self-approval attacks. Core: - SecondFactor enum (none/totp) + ApprovalPolicy fields - RFC 6238 TOTP verification via totp-rs (SHA-1, 6 digits, 30s, ±1 window) - QR code generation (base64 PNG) for authenticator enrollment - 8 one-time recovery codes stored in AES-256-GCM vault - Per-tool TOTP control via totp_tools glob patterns - Grace period (totp_grace_period_secs) to skip re-verification - Rate limiting: 5 consecutive failures → 5 minute lockout - Audit log tracks second_factor_used per decision - Boot-time config consistency check (warn if TOTP enabled but not enrolled) API: - POST /api/approvals/{id}/approve accepts optional totp_code - POST /api/approvals/totp/setup — generate secret + QR + recovery codes - POST /api/approvals/totp/confirm — verify enrollment - GET /api/approvals/totp/status — enrollment/enforcement status - DELETE /api/approvals/totp — revoke with verification Dashboard: - Settings > Security: TOTP setup flow with QR code display, recovery codes, reset/revoke - Approvals page: inline TOTP input on approve, batch approve disabled when enforced - NotificationCenter: redirects to Approvals page when TOTP required - Full i18n (EN + ZH) Channel: - /approve <id> <totp-code> command syntax - Interactive notification hides Approve button when TOTP enabled - Recovery code support (xxxx-xxxx format auto-detected) Docs: - Configuration reference (EN + ZH): all fields, setup guide, rate limiting - API reference (EN + ZH): all endpoints with examples - Security architecture (EN + ZH): threat model, flow diagrams, properties * fix(approval): fix TOTP PR compilation errors and review issues - Fix 8 pre-existing test calls to 
resolve() missing new totp_verified and user_id arguments after signature change - Fix unused import: rand::Rng → rand::RngExt (random_range is on RngExt) - Fix moved value in test_totp_grace_zero_means_always_require by cloning policy before passing to ApprovalManager::new - Fix ApprovalsPage TOTP input to accept recovery codes (xxxx-xxxx format) in addition to 6-digit TOTP codes - Fix totp_revoke to consume recovery code on use (persist updated list) - Fix approval_types tests for new SecondFactor and second_factor_used fields - Fix routes/system.rs approve handler to forward body parameter - Fix doc section numbering: subsection 18.1 → 20.1 under section 20 * fix(approval): harden TOTP security — recovery code validation, lockout timing, batch endpoint - Replace loose `contains('-')` recovery code detection with strict format check (`DDDD-DDDD`) to prevent TOTP rate-limit bypass via the recovery code path - Fix `totp_status` reporting `enrolled: true` after revocation by treating empty vault secret as absent - Block batch_resolve endpoint when TOTP is enforced — each approval must be individually verified - Use per-sender identity for TOTP failure tracking in channel bridge instead of shared "channel_user" counter - Track lockout start time from when failure threshold is reached, not from first failure * fix(approval): allow batch reject when TOTP enforced, deduplicate recovery code check * fix(approval): add lockout protection to setup/confirm/revoke endpoints and recovery code paths * fix(approval): three TOTP review fixes - Bug 1: change DELETE /api/approvals/totp to POST /api/approvals/totp/revoke; DELETE with a JSON body is stripped by many HTTP clients and proxies, making revocation impossible without direct curl invocation. 
- Bug 2: persist TOTP lockout state to SQLite (migration v18 adds totp_lockout table); load it back on ApprovalManager::new_with_db so a daemon restart no longer resets the 5-consecutive-failure lockout, closing the trivial bypass of killing and restarting the daemon. - Bug 3: plumb the configured totp_issuer through all verify_totp_code call sites (new verify_totp_code_with_issuer helper) so the issuer label in the TOTP struct is consistent between enrollment and verification. * fix(approval): fix missed verify_totp_code call in channel_bridge channel_bridge.rs had one remaining call to verify_totp_code (without issuer) after the system.rs fixes. Update to verify_totp_code_with_issuer to pass the configured totp_issuer consistently. * fix(runtime): rustfmt formatting in trace_store.rs * fix(fmt): rustfmt formatting in plugins.rs, channel_bridge.rs, system.rs, approval.rs - plugins.rs: fix all remaining > 100 char lines: split match arms, expand tuple literals in hook_data, expand make_stat() calls, collapse single-line format! calls (index_url/api_url), collapse single-expression filter closure, split json! 
macro contents that exceed max_width, fix plugin_env chain indent - channel_bridge.rs: collapse let totp_issuer to single line, split overlong verify_recovery_code/verify_totp_code_with_issuer call args to separate lines, split self.kernel.vault_set chain - system.rs: split generate_totp_secret and verify_totp_code_with_issuer args - approval.rs: split load_totp_lockout function signature * fix(fmt): run cargo fmt --all to fix all rustfmt violations * fix(runtime): fix uuid::Uuid::as_str() and missing Wasm match arm from main merge * fix(approval): three TOTP logic bugs - Bug 1 (channel_bridge): lockout check and TOTP gate now use policy.tool_requires_totp() instead of requires_totp(), so tools not in totp_tools are never blocked by a lockout triggered on other tools - Bug 2 (channel_bridge): /list help text now checks per-tool TOTP requirement; only shows '<totp-code>' hint when a pending tool actually requires it - Bug 4 (approval): resolve() now reads the policy RwLock once per logical phase (gate check, then grace recording) instead of holding a stale snapshot across both, eliminating the hot-reload race * fix(kernel): add missing ContextEngineConfig fields from main merge * fix(approval): apply per-tool TOTP check in API path (system.rs) - Single approve: use tool_requires_totp() instead of requires_totp() so tools not in totp_tools skip TOTP gate and lockout check - Batch approve: only block the batch when a tool in the batch actually requires TOTP, not when second_factor = totp is globally enabled --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
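The lockout rule described in the TOTP commits above (5 consecutive failures trigger a 5-minute lockout, timed from the moment the threshold is reached rather than from the first failure) can be sketched roughly as follows. This is an in-memory illustration only: `TotpLockout` and its methods are hypothetical names, not the types in approval.rs, and the SQLite persistence added in migration v18 is left out.

```rust
use std::time::{Duration, Instant};

// Values taken from the commit message: 5 failures, 5 minute lockout.
const MAX_FAILURES: u32 = 5;
const LOCKOUT: Duration = Duration::from_secs(5 * 60);

/// Hypothetical per-sender lockout tracker (not the real ApprovalManager state).
#[derive(Default)]
struct TotpLockout {
    consecutive_failures: u32,
    /// Set when the failure threshold is reached, not on the first failure.
    locked_at: Option<Instant>,
}

impl TotpLockout {
    /// Returns true while the lockout window is still open.
    /// An expired lockout clears the state so counting starts over.
    fn is_locked(&mut self, now: Instant) -> bool {
        let locked_at = self.locked_at;
        match locked_at {
            Some(start) if now.duration_since(start) < LOCKOUT => true,
            Some(_) => {
                *self = Self::default();
                false
            }
            None => false,
        }
    }

    /// Record a failed code; start the lockout clock once the threshold is hit.
    fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= MAX_FAILURES && self.locked_at.is_none() {
            self.locked_at = Some(now);
        }
    }

    /// A successful verification resets the consecutive-failure streak.
    fn record_success(&mut self) {
        *self = Self::default();
    }
}
```

Keeping one tracker per sender identity, as the channel bridge fix describes, would mean storing these in a map keyed by sender rather than sharing a single counter.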
) * feat(hands): proper resource composition — base templates, capability merge, workflow tool, per-agent skills Hand agents can now properly compose registry resources: ## Agent template reuse (base field) - HandAgentManifest supports base = "coder" to inherit from agents/{name}/agent.toml - Deep-merge: hand-level fields override base, unset fields fall through - Resolved at parse time via parse_hand_toml_with_agents_dir() ## Capability merge (was: overwrite) - capabilities.tools: hand-level overrides (unchanged) - capabilities.{network,shell,memory_*,agent_message}: preserved from agent (was lost) - skills/mcp_servers/allowed_plugins: intersect with hand allowlist - tags: append (was: replace) - exec_policy: respect agent-level (was: hardcode Full) ## workflow_run built-in tool - Agents can invoke registered workflows (bug-triage, code-review, etc.) - KernelHandle::run_workflow() with UUID or name lookup ## Per-agent SKILL.md - SKILL-{role}.md files discovered and injected per-agent - Falls back to shared SKILL.md ## Other - Hand-level allowed_plugins field - autonomous.schedule respects agent config (was: hardcoded 60s) - Activation failure rolls back already-spawned agents * fix(kernel): override api_key_env when provider is default When provider=default, api_key_env inherited from a base template would point to the wrong API key (e.g. GEMINI_API_KEY when user configured OpenAI). Now unconditionally uses kernel default. 
* test(hands): add deep merge and name override tests - deep_merge_preserves_base_fields_and_overrides_hand_fields: verifies table merge (base preserved) and scalar override (hand wins) - base_template_name_override: verifies hand can override name/description/prompt from base template while inheriting module and other unspecified fields * fix(runtime): dereference kernel in workflow_run dispatch after ToolExecContext refactor * fix(hands): path traversal, missing agents_dir, and duplicated skill scan - Validate `base` template name in parse_multi_agent_entry to reject path traversal attempts (e.g. `../../etc`) - Add `home_dir` param to install_from_path so it derives agents_dir and resolves `base` templates correctly - Thread agents_dir through wrapped format fallback by extracting [hand] sub-table and passing it to parse_hand_definition - Extract scan_agent_skill_files helper to deduplicate SKILL-{role}.md scanning in scan_hands_dir and install_from_path - Log warning on rollback deactivate failure instead of silently swallowing the error * docs(hands): document install_from_content limitation on base template resolution * docs(kernel): clarify skills merge semantics for hand activation * fix(hands): normalize flat base templates before merge, reject base in [agent] format * fix(hands): case-insensitive skill lookup, reject base in content install - Lowercase role name when looking up per-agent skill content to match the lowercased keys from SKILL-{role}.md scan - Reject `base` template references in install_from_content() instead of silently ignoring them - Extract resolve_agents_dir() helper to deduplicate path resolution * refactor(hands): deduplicate HandDefinition build logic, harden workflow_run input - Extract build_hand_from_raw() to eliminate duplicated agent construction between Deserialize impl and parse_hand_definition - Validate workflow_run input is object/null, reject other types - Add install_from_content tests for base template rejection 
--------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
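The base-template behaviour described in the hands commits above (tables merge key by key, scalars from the hand win, and `base` names that look like path traversal are rejected) could look roughly like this. The `Value` enum is a simplified stand-in for a TOML value, and `deep_merge`, `table`, and `is_safe_base_name` are hypothetical names; the exact character set accepted for `base` is an assumption.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a TOML value (assumption, not the real type).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Int(i64),
    Table(BTreeMap<String, Value>),
}

/// Merge `overlay` (hand-level fields) onto `base` (template fields).
/// Tables merge recursively; anything else is replaced by the overlay.
fn deep_merge(base: &Value, overlay: &Value) -> Value {
    match (base, overlay) {
        (Value::Table(b), Value::Table(o)) => {
            let mut merged = b.clone();
            for (key, value) in o {
                let next = match merged.get(key) {
                    Some(existing) => deep_merge(existing, value),
                    None => value.clone(),
                };
                merged.insert(key.clone(), next);
            }
            Value::Table(merged)
        }
        _ => overlay.clone(),
    }
}

/// Convenience constructor for tables in examples.
fn table(pairs: Vec<(&str, Value)>) -> Value {
    Value::Table(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Assumed rule: a base name is a single path-free identifier, which
/// rules out inputs such as `../../etc`.
fn is_safe_base_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```

With this shape, a hand that sets only `model.max_tokens` keeps every other `model` key from the base template, which is the behaviour the deep_merge_preserves_base_fields_and_overrides_hand_fields test is described as checking.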
…g#2135) * refactor(runtime): extract agent loop helpers * fix(runtime): harden agent tool interrupts * fix(runtime): preserve trimmed turn boundaries * fix(runtime): satisfy agent loop clippy checks * refactor(runtime): simplify agent loop helper contexts * fix(runtime): interpolate warning message in recall_or_default --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(api,dashboard): pass timezone through schedule create/update flow The schedule modal now auto-detects the browser timezone and lets users pick from common IANA timezones. The selected timezone is sent as `tz` in the create/update API payload and stored in CronSchedule::Cron.tz (which already existed but was always set to None). Backend: create_schedule and update_schedule read tz from JSON payload, cron_job_to_schedule_json includes tz in the response. Frontend: ScheduleModal has timezone picker with auto-detection via Intl.DateTimeFormat, SchedulerPage shows timezone label next to cron expression, all callers (WorkflowsPage, CanvasPage) pass tz through. The existing next_run display already uses toLocaleString() which converts UTC to browser-local time automatically. Fixes librefang#2137 * fix(dashboard): fix timezone modal bugs found during review - Restore i18n destructuring in ScheduleModal (isZh was undefined, breaking Chinese locale detection for describeCron and labels) - Pass initialTz prop to ScheduleModal in SchedulerPage so re-opening the picker preserves the previously selected timezone - Add tz field to ScheduleItem TypeScript interface, remove unsafe (s as any).tz cast - Dynamically include browser-detected timezone in picker options when it's not in the COMMON_TIMEZONES list * fix(dashboard): add tz to updateSchedule payload and WorkflowsPage initialTz - updateSchedule TypeScript function now accepts optional tz parameter, allowing timezone changes when editing existing schedules - WorkflowsPage passes initialTz from existing schedule to ScheduleModal so the picker shows the saved timezone instead of resetting to browser default * fix(api): preserve timezone when updating cron expression The update_schedule handler replaced the entire CronSchedule object when the cron expression changed. If the caller didn't also send tz in the same request, the timezone was silently reset to null. 
Now: - When cron is updated without tz: preserve the existing tz from the job - When only tz is sent (no cron change): update timezone in-place by reading the current cron expression from the job - When both cron and tz are sent: use both as provided * fix(dashboard): show selected timezone in cron preview button The create-schedule form showed "09:00" without indicating which timezone it referred to. Now shows "09:00 (Rome)" and "0 9 * * * · Europe/Rome" in the cron picker button so the user knows the timezone before submitting. * fix(api): validate timezone string on create/update schedule Return 400 Bad Request with an actionable message when the caller sends an invalid IANA timezone string, instead of silently storing it and having compute_next_run fall back to UTC at runtime. Validates using chrono_tz::Tz::from_str in both create_schedule and update_schedule. "UTC" is accepted without parsing (it's the default). --------- Co-authored-by: Federico Liva <federico.liva@espero.it> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
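The three update cases listed above for update_schedule (cron only, tz only, both) reduce to "take each field from the request if present, otherwise keep the stored one". A minimal sketch, assuming a flattened `CronSchedule` struct and a hypothetical `SchedulePatch` request type; the real code stores tz on the `CronSchedule::Cron` variant and validates it with chrono_tz, which is omitted here:

```rust
/// Simplified stored schedule (assumption: the real type is an enum variant).
#[derive(Debug, Clone, PartialEq)]
struct CronSchedule {
    expr: String,
    tz: Option<String>,
}

/// Hypothetical shape of the fields an update request may carry.
struct SchedulePatch {
    cron: Option<String>,
    tz: Option<String>,
}

/// Apply an update without silently dropping the stored timezone.
fn apply_patch(current: &CronSchedule, patch: SchedulePatch) -> CronSchedule {
    CronSchedule {
        // Cron omitted: reuse the current expression (the tz-only update case).
        expr: patch.cron.unwrap_or_else(|| current.expr.clone()),
        // tz omitted: preserve the existing timezone (the cron-only update case).
        tz: patch.tz.or_else(|| current.tz.clone()),
    }
}
```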
…ec_policy (librefang#2148) * fix(kernel): glob-match declared tools and auto-promote shell_exec exec_policy Two bugs prevented MCP wildcard tools and shell_exec from working in user-defined agents: 1. available_tools() used exact string equality (d == &t.name) when filtering builtins, skill tools, and MCP tools against the agent's declared capabilities.tools list. Patterns like 'mcp_filesystem_*' or 'file_*' were never matched, so those tools were silently dropped from the list sent to the LLM — making them unreachable even though the MCP server was connected and reporting 14 tools. Fix: replace == with glob_matches() at all three filter sites. 2. When an agent declared shell_exec in capabilities.tools but had no explicit exec_policy, it inherited the global ExecPolicy whose default mode is Deny. This caused shell_exec to be stripped from available_tools() (step 6), so execute_tool() returned 'Permission denied: agent does not have capability' before ever reaching the exec-policy enforcement layer. Fix: at spawn and restore time, if the agent declares shell_exec (or '*') in tools and has no explicit exec_policy, promote exec_policy mode to Full instead of blindly inheriting the global Deny default. * test(kernel): add regression tests for glob tool matching and shell_exec exec_policy - test_available_tools_glob_pattern_matches_mcp_tools: verifies that patterns like 'file_*' in capabilities.tools correctly match builtin tools (file_read, file_write, file_list) and exclude non-matching ones - test_shell_exec_available_when_declared_in_tools_without_explicit_exec_policy: verifies that an agent declaring shell_exec in tools with no explicit exec_policy gets auto-promoted to Full mode and has shell_exec present in available_tools() * style: apply cargo fmt --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
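To illustrate the first fix in this PR (matching declared tool patterns with a glob instead of `==`), here is a small self-contained sketch. It supports only the `*` wildcard; the `glob_matches` body and the `filter_declared` helper are illustrative assumptions, not the kernel's actual implementation.

```rust
/// Match `name` against a pattern where `*` stands for any run of
/// characters (including none). Stand-in for the kernel's `glob_matches()`.
fn glob_matches(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    let (mut pi, mut ni) = (0usize, 0usize);
    // Pattern index of the last `*` seen, and the name index it was tried at.
    let mut star: Option<(usize, usize)> = None;
    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && p[pi] == n[ni] {
            pi += 1;
            ni += 1;
        } else if let Some((sp, sn)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ni = sn + 1;
            star = Some((sp, sn + 1));
        } else {
            return false;
        }
    }
    // Only trailing `*`s may remain unmatched in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}

/// Keep only the tools that at least one declared pattern matches
/// (hypothetical helper mirroring the filter step in available_tools()).
fn filter_declared<'a>(declared: &[&str], all_tools: &[&'a str]) -> Vec<&'a str> {
    all_tools
        .iter()
        .copied()
        .filter(|tool| declared.iter().any(|pattern| glob_matches(pattern, tool)))
        .collect()
}
```

With exact equality, a declaration such as 'file_*' matches nothing; with the glob, it selects file_read, file_write, and file_list while leaving shell_exec out.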
…shells (librefang#2166) The previous detection logic failed in two common scenarios: 1. Daemon started outside a login shell (e.g. as a service or via an IDE) where /opt/homebrew/bin is absent from PATH. `claude --version` would fail silently and detect() would return None. 2. Claude Code installed with newer versions that use the system keychain for authentication instead of writing credentials to disk. Neither ~/.claude/.credentials.json nor ~/.claude/credentials.json exist in this case, so the fallback claude_credentials_exist() check also returned false. Fix detect() to try well-known absolute install paths (/opt/homebrew/bin, /usr/local/bin, /usr/bin) when `claude` is not found on PATH. Fix claude_credentials_exist() to also check for ~/.claude/settings.json, which is written by all Claude Code versions regardless of auth mechanism. Adds unit tests for both changes. Co-authored-by: Matt <git@mattevans.cloud> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
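The PATH fallback described above can be sketched as "search PATH first, then the well-known install directories". This is a Unix-oriented illustration: `find_binary` and the injected `exists` predicate are hypothetical, and the real detect() additionally runs `claude --version`, which is not modelled here.

```rust
use std::path::{Path, PathBuf};

/// Install locations named in the commit message.
const FALLBACK_DIRS: [&str; 3] = ["/opt/homebrew/bin", "/usr/local/bin", "/usr/bin"];

/// Look for `name` in each PATH entry, then in the fallback directories.
/// `exists` is injected so the lookup can be exercised without touching disk;
/// production code would pass something like `|p| p.is_file()`.
fn find_binary(name: &str, path_var: &str, exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    std::env::split_paths(path_var)
        .chain(FALLBACK_DIRS.iter().copied().map(PathBuf::from))
        .map(|dir| dir.join(name))
        .find(|candidate| exists(candidate.as_path()))
}
```

A daemon started from a service manager with a minimal PATH would then still resolve a Homebrew-installed binary through the second half of the chain.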
…card (librefang#2170) The Active Agents stats card was displaying the total agent/template count as the primary big number and the running count as a small sub-value. Swap them so active count is prominent and total is shown as context. Add overview.total translation key to en.json and zh.json.
…rs (librefang#2171) * fix(skills): handle SkillHub search response format with proper headers Override SkillhubClient::search() to send Accept: application/json header (preventing the API from returning HTML), and add fallback parsing for SkillHub-native response format (snake_case, 'skills' key) in addition to the ClawHub-compatible format (camelCase, 'results' key). * fix(skills): use configured base URL and fix parse order in SkillHub search - Store base_url in SkillhubClient instead of hardcoding DEFAULT_SKILLHUB_URL - Try SkillHub-native format first (snake_case) to avoid ClawHubSearchResponse silently accepting any JSON as empty results due to serde defaults - Fix percent_encode doc comment (x-www-form-urlencoded, not RFC 3986)
…#2180) * style: fix rustfmt violations in system.rs, claude_code.rs, skillhub.rs Three files introduced formatting violations in recent commits that rustfmt checks on CI. No logic changes. - system.rs:1668 — expand matches! macro across lines (line too long) - claude_code.rs:1147 — collapse assert! to single line - skillhub.rs:499 — expand assert_eq! across lines (line too long) * fix: resolve clippy warnings and test compile errors from df9072d/a42b92e4 Pre-existing CI failures introduced by recent main commits, not by any open PR. Fixes Quality (clippy) and Test (compile) checks. Clippy: - context_engine.rs:718 map_or(false, ..) → is_some_and(..) - context_engine.rs:1452 for+if-let → .iter().copied().flatten() - plugin_manager.rs:2897 len() >= 1 → !is_empty() - plugin_runtime.rs:1290 remove useless ExitStatus::into() Test compile errors: - context_engine.rs test: run_hook now returns (Value, u64); add correlation_id and output_schema_strict args; destructure return - plugin_manager.rs test: ContextEngineHooks and PluginManifest have new required fields; add ..Default::default() - claude_code.rs test: prefix unused parent/dir_name with _
…#2176) Add CREATE_NO_WINDOW (0x0800_0000) to all child-process spawns so Windows does not briefly pop up a console window for each subprocess.

Sites fixed:
- librefang-cli: daemon launch (add to existing DETACHED_PROCESS flags)
- librefang-api/routes/config: tasklist memory probe
- librefang-channels/sidecar: sidecar adapter spawn
- librefang-runtime/catalog_sync: git pull/clone at startup
- librefang-runtime/host_functions: shell_exec tool
- librefang-runtime/process_manager: managed child-process spawn
- librefang-runtime/plugin_runtime: probe_launcher_version (python/node/...)
- librefang-runtime/python_runtime: find_python_interpreter probes

Also fix three compile errors introduced in df9072d (full context engine hooks):
- context_engine: AgentId.0 is Uuid, so use to_string() not as_str()
- plugin_manager: hook_templates missing Wasm arm
- kernel: ContextEngineConfig missing output_schema_strict/max_hook_calls_per_minute
…g#2177) When a HAND activates sub-agents, source_toml_path is set to the hand.toml file (HandDefinition format), not to an individual agent.toml (AgentManifest format). Two code paths failed to handle this: 1. Startup scan (boot_restore): tried to parse hand.toml as AgentManifest, emitted spurious "Invalid agent TOML" warnings, and skipped sync. 2. reload_agent_from_disk: returned an error making hand agent reload impossible via the API/dashboard. Fix: add extract_manifest_from_hand_toml() helper that parses the file as HandDefinition and finds the matching agent manifest by name or role (also trying the "{hand_id}-{role}" convention). Both code paths now fall back to this helper before reporting an error. Also fix three compile errors introduced in df9072d: - context_engine: AgentId.0 is Uuid, use to_string() not as_str() - plugin_manager: hook_templates match missing Wasm arm - kernel: ContextEngineConfig missing new fields, use ..Default::default()
…ort (librefang#2144) (librefang#2178) When a tool call fails because the LLM sent empty or missing parameters (e.g. file_list with {} instead of {"path": "."}), the agent loop now: 1. Detects the error as a parameter error via is_parameter_error_content() matching patterns like "Missing 'X' parameter", "required parameter", etc. 2. Injects a retry instruction: "[System: N tool call(s) failed due to missing or invalid parameters. Read the error message, correct your tool call arguments, and retry immediately. Do NOT ask the user for help — fix the parameters yourself.]" instead of the generic "report error to user" message. 3. Treats parameter errors as soft errors so they do NOT count toward consecutive_all_failed, preventing premature loop termination. Also improves the file_list error message to include an explicit retry hint: "Missing 'path' parameter — retry with {"path": "."} to list workspace root" Both the non-streaming (run_agent_loop) and streaming (run_agent_loop_streaming) paths are updated. Also fixes three compile errors from df9072d (same as PRs librefang#2176, librefang#2177).
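The soft-error handling described above might be sketched like this. The function name `is_parameter_error_content` comes from the commit message, but its body here is a guess based on the two quoted patterns, and `counts_toward_all_failed` is a hypothetical helper standing in for the loop's bookkeeping around consecutive_all_failed.

```rust
/// Heuristic check for "the LLM sent bad arguments" errors.
/// Only the two patterns quoted in the commit message are covered.
fn is_parameter_error_content(content: &str) -> bool {
    let lower = content.to_lowercase();
    // Matches messages shaped like: Missing 'path' parameter
    let missing_named = lower.contains("missing '") && lower.contains("' parameter");
    missing_named || lower.contains("required parameter")
}

/// Decide whether one turn of tool calls should count as "all failed".
/// Each entry is Ok(output) or Err(error text). A turn counts only when
/// every call failed with a non-parameter (hard) error.
fn counts_toward_all_failed(results: &[Result<&str, &str>]) -> bool {
    !results.is_empty()
        && results.iter().all(|result| match result {
            Err(message) => !is_parameter_error_content(message),
            Ok(_) => false,
        })
}
```

Under this reading, a turn containing any parameter error never advances the failure streak, so the model gets a chance to retry with corrected arguments before the loop gives up.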
…#2181) Co-authored-by: zhongxiong <zhongxiong@neuramatrix.net>
…llowed_users (librefang#2183) * feat: add extra_params support for openai compatible model Co-authored-by: zhongxiong <zhongxiong@neuramatrix.net> * fix: multi-bot Telegram routing uses account_id, not first-match on allowed_users resolve_with_context only looked up the generic "Telegram" key in channel_defaults, but keys are stored as "Telegram:<account_id>" when multiple bots are configured. A user present in both bots' allowed_users always matched the first-registered bot regardless of which token received the message. Fix: before falling back to the generic channel key, probe the account-specific key "Telegram:<account_id>" when ctx.account_id is present. Adds a regression test covering the exact scenario from librefang#2140. Fixes librefang#2140 --------- Co-authored-by: zhongxiong <zhongxiong@neuramatrix.net>
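The routing fix above amounts to probing the account-specific key before the generic channel key. A minimal sketch, assuming channel_defaults is a plain string map from key to agent name; `resolve_default_agent` is a hypothetical stand-in for the relevant part of resolve_with_context:

```rust
use std::collections::HashMap;

/// Look up the default agent for a channel, preferring the
/// "<channel>:<account_id>" key when an account id is known.
fn resolve_default_agent<'a>(
    channel_defaults: &'a HashMap<String, String>,
    channel: &str,
    account_id: Option<&str>,
) -> Option<&'a str> {
    if let Some(id) = account_id {
        // Account-specific key, e.g. "Telegram:<account_id>".
        if let Some(agent) = channel_defaults.get(&format!("{channel}:{id}")) {
            return Some(agent.as_str());
        }
    }
    // Fall back to the generic channel key, e.g. "Telegram".
    channel_defaults.get(channel).map(String::as_str)
}
```

With two bots registered, a message arriving on the second bot's token now resolves to that bot's agent even if the sender is in both bots' allowed_users.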
- Fix Uuid::as_str() → to_string() in context_engine rate limiter
- Add missing PluginRuntime::Wasm arm in hook_templates match
- Add missing ContextEngineConfig fields (output_schema_strict, max_hook_calls_per_minute)
- Add missing PluginManifest fields in test structs (env, integrity, etc.)
- Fix run_hook test to match new 17-arg signature and tuple return type
- Fix clippy: map_or → is_some_and, len >= 1 → !is_empty, remove useless .into()

Co-authored-by: fnavarrp <fnavarrp@emeal.nttdata.com>
Sorry @leszek3737, your pull request is larger than the review limit of 500000 diff characters
Code Review
This pull request introduces significant security and functionality enhancements, including TOTP-based second-factor authentication for tool approvals, filesystem and network sandboxing for hook subprocesses using Landlock and seccomp, and support for plugin stacking. I have reviewed the implementation details, particularly the use of unsafe blocks for security sandboxing and the new registry validation logic. The feedback provided addresses potential safety concerns regarding async-signal-safety in pre_exec blocks, the integration of registry validation, and cross-platform compatibility for resource monitoring.
    // Apply Landlock filesystem restriction in the child process before exec.
    // This is done via unsafe pre_exec which runs in the forked child after fork()
    // but before exec(), so it restricts only the child's filesystem access.
    #[cfg(all(target_os = "linux", feature = "landlock-sandbox"))]
    if !config.allow_filesystem {
        use std::os::unix::process::CommandExt;
        let write_dir = _hook_tmpdir.clone();
        // SAFETY: pre_exec runs after fork() in the child. We only call
        // try_apply_landlock_readonly which uses only async-signal-safe-equivalent
        // operations (syscalls via the landlock crate).
        unsafe {
            cmd.pre_exec(move || {
                try_apply_landlock_readonly(write_dir.as_deref());
                Ok(())
            });
        }
    }
The use of unsafe in pre_exec to apply Landlock restrictions is correct for post-fork, but ensure that try_apply_landlock_readonly does not perform any non-async-signal-safe operations, such as memory allocation or mutex locking, which could lead to deadlocks or undefined behavior in the child process.
    /// Validate a GitHub registry identifier supplied by a caller.
    ///
    /// Accepts the form `owner/repo` where each component is alphanumeric plus
    /// hyphens, underscores, and dots (matching GitHub's naming rules). Rejects
    /// anything that could be used to manipulate the URL constructed later:
    /// empty strings, extra slashes, `..`, or non-ASCII characters.
    fn validate_registry_param(registry: &str) -> Result<(), String> {
        // Must be exactly `owner/repo` — one slash, no leading/trailing slash.
        let parts: Vec<&str> = registry.splitn(2, '/').collect();
        if parts.len() != 2 {
            return Err(format!(
                "Invalid registry '{registry}': expected 'owner/repo' format"
            ));
        }
        let is_safe_component = |s: &str| -> bool {
            !s.is_empty()
                && s.len() <= 100
                && s.chars()
                    .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_' || c == '.')
        };
        if !is_safe_component(parts[0]) || !is_safe_component(parts[1]) {
            return Err(format!(
                "Invalid registry '{registry}': components must be non-empty ASCII \
                 alphanumeric/hyphen/underscore/dot, max 100 chars each"
            ));
        }
        Ok(())
    }
Type
Summary
Changes
Attribution
Co-authored-by, commit preservation, or explicit credit in the PR body)
Testing
cargo clippy --workspace --all-targets -- -D warnings passes
cargo test --workspace passes
Security