raiden-staging
Contributor

[ @juecd @rgarcia ]

Kernel Computer Operator API (v1) [WIP]

  • new Operator API with extended OS-level controls & tools | see /operator-api
  • node-based, bundled into an executable via bun, easy to extend
  • implements the current routes (/server/openapi.yaml) so it can replace the existing Go server
  • extensive test suites to streamline adding features

✅ Build process integrated & api test in kernel-images docker container


Tools

  • core filesystem & execution

    • /fs – read/write/list/move/delete files and directories; create/delete dirs; permissions and info; directory watch with event stream; upload/download; tail stream.
    • /process – run commands sync/async; spawn and status; stdout/stderr sse; stdin write; send signals.
    • /bus – publish messages and subscribe via sse channels.
    • /pipe – send json objects and receive streams on named channels.
    • /logs – stream kernel, syslog, application, and path-based logs via sse.
  • capture & streaming

    • /recording – start/stop/download recordings, list recorders, delete recordings.
    • /screenshot – capture still images of the desktop or a region, retrieve by id.
    • /stream – start/stop rtmps desktop streaming, live metrics via sse.
  • interactive control

    • /clipboard – get/set clipboard; change stream via sse.
    • /input – mouse/keyboard control; window activate/focus/move/resize/snap/center/minimize/map/unmap/kill; desktop/display geometry and switching; combo actions; system exec/sleep; mouse location. xdotool integration.
    • /os – get/set locale, keyboard layout, timezone.
    • /computer – mouse controls (click, move) [for legacy support].
  • automation & monitoring

    • /metrics – one-shot snapshot and continuous sse metrics.
    • /macros – create/run/list/delete macros of input/system steps.
    • /scripts – upload/run/list/delete scripts; async run logs stream.
  • networking & browser

    • /browser – start/stream/stop har capture from the default browser profile. [wip]
    • /network – socks5 proxy control; port forwarding; request/response interception; har entry stream. [wip]

@mesa-dot-dev
mesa-dot-dev bot commented Aug 11, 2025

Mesa Description

TL;DR

This PR introduces a new Node.js-based kernel-operator-api to replace the existing Go server, providing extensive OS-level controls and tools within the Chromium headful environment, complete with a comprehensive test suite.

Why we made these changes

The motivation is to replace the existing Go server with a Node.js-based API that offers extended OS-level controls and tools, is easily extendable, and streamlines feature addition through extensive test suites. This aims to improve the overall developer experience and system control.

What changed?

  • New operator-api module: Introduced a complete new Node.js/Bun based API server, replacing the existing Go server.
    • Core API infrastructure: index.js (server entry), app.js (route aggregation), package.json (dependencies), bun.lock, openapi.yaml (API specification), and environment files (.kernel-operator.env, .wipdev.example.env).
    • Extensive OS-level capabilities: Added dedicated API routes and services for:
      • Filesystem & Process: fs.js (read/write/list/move/delete, watch, tail), process.js (run, spawn, status, stdin/out streams).
      • Capture & Streaming: recording.js (start/stop/download), screenshot.js (capture images), stream.js (RTMP/RTMPS streaming).
      • Interactive Control: input.js (mouse/keyboard/window control via xdotool), clipboard.js (get/set/stream), os.js (locale/keyboard/timezone settings).
      • Automation & Monitoring: metrics.js (snapshot/stream), macros.js (create/run xdotool sequences), scripts.js (upload/run/list/delete).
      • Networking & Browser: browser.js (HAR recording), network.js (interception, forwarding, SOCKS5 proxy), bus.js (publish/subscribe event bus), pipe.js (JSON object channels).
    • Shared utilities: New modules for base64 encoding/decoding, environment path management, command execution, ID generation, and Server-Sent Events (SSE).
    • Comprehensive test suite: An extensive set of Vitest-based integration tests covering all major API functionalities, ensuring robustness and consistency.
  • Chromium Headful Docker environment updates:
    • Dockerfile refactor: Significantly improved readability and maintainability using multi-stage builds, enhancing internationalization, user privilege management (including Linux capabilities for new kernel-operator binaries), and updated audio/D-Bus configurations.
    • Build process integration: build-docker.sh and shared/build-operator-api.sh updated to build and include kernel-operator-api binaries within the Docker image.
    • Runtime configuration: New daemon.conf, dbus-mpris.conf, dbus-pulseaudio.conf, default.pa, and xorg.conf files for optimized PulseAudio, D-Bus, and Xorg settings to support the new API and improve audio/display stability.
    • Wrapper script (wrapper.sh) enhancements: Extensively refactored for better setup, stability, and debuggability, including explicit environment variables, service initialization (Xorg, D-Bus, PulseAudio), and starting the kernel-operator-api.
    • Run scripts (run-docker.sh, run-unikernel.sh): Updated to conditionally enable port forwarding and environment variables for the new WITH_KERNEL_OPERATOR_API setup.

@mesa-dot-dev mesa-dot-dev bot left a comment

Performed full review of fe02e69...cf89744

75 files reviewed | 11 comments

# Grant kernel user + kernel operator api a lot of freedom
###############################################################################
# Passwordless sudo for "kernel" to execute arbitrary root commands when needed
RUN echo 'kernel ALL=(ALL) NOPASSWD:ALL' >/etc/sudoers.d/010-kernel-nopw && chmod 0440 /etc/sudoers.d/010-kernel-nopw

Security consideration: Granting passwordless sudo to the kernel user provides essentially unrestricted root access. While this may be intentional for the operator API's functionality, consider if this broad privilege escalation is necessary for all use cases, or if it could be scoped down to specific commands that require root access.

Type: Security | Severity: Medium

# This preserves the "kernel" user identity but lifts FS/NET/NS limits typical for root tasks
# To be adjusted for required capabilities range.
RUN setcap 'cap_chown,cap_fowner,cap_fsetid,cap_dac_override,cap_dac_read_search,cap_mknod,cap_sys_admin,cap_sys_resource,cap_sys_ptrace,cap_sys_time,cap_sys_tty_config,cap_net_admin,cap_net_raw,cap_setuid,cap_setgid=ep' /usr/local/bin/kernel-operator-api || true && \
setcap 'cap_chown,cap_fowner,cap_fsetid,cap_dac_override,cap_dac_read_search,cap_mknod,cap_sys_admin,cap_sys_resource,cap_sys_ptrace,cap_net_admin,cap_net_raw=ep' /usr/local/bin/kernel-operator-test || true

Security consideration: The extensive capabilities granted here (cap_sys_admin, cap_setuid, cap_setgid, cap_net_admin, etc.) provide near-root privileges to the operator binaries. cap_sys_admin in particular is extremely powerful. While this aligns with the PR's goal of "extensive OS-level controls", consider if this capability set could be reduced to only what's actually required by the operator API's functionality to follow the principle of least privilege.

Type: Security | Severity: Medium

docker rm -f "$NAME" 2>/dev/null || true
docker run -it "${RUN_ARGS[@]}" "$IMAGE"
if [[ "${DEBUG_BASH:-false}" == "true" ]]; then
# if DEBUG_BASH set to true, enters container bash

Missing container cleanup before creation. The DEBUG_BASH case should include docker rm -f "$NAME" 2>/dev/null || true before docker run -dit to avoid conflicts if a container with the same name already exists, similar to the other cases below.

Type: Logic | Severity: Medium

controller.enqueue(enc.encode(sseFormat({ ts: new Date().toISOString(), ...snapshot() })))
}, 1000)
},
cancel() {}

Memory leak: The setInterval created on line 27 is never cleared when the stream is cancelled. This will cause the interval to continue running indefinitely even after the client disconnects.

Suggested change
- cancel() {}
+ cancel() {
+   if (iv) clearInterval(iv)
+ }

You'll also need to declare iv outside the start function so it's accessible in cancel.

Type: Performance | Severity: Medium
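
A minimal sketch of the fix described above, assuming the route builds a web-standard ReadableStream as in the quoted snippet. `metricsStream`, `snapshot`, and `sseFormat` are illustrative stand-ins here, not the PR's actual helpers.

```javascript
// Hypothetical stand-ins for the PR's helpers (bodies assumed for illustration).
const snapshot = () => ({ mem_free: 0 })
const sseFormat = (data) => `data: ${JSON.stringify(data)}\n\n`

// Build an SSE body whose interval is released when the consumer cancels.
function metricsStream(intervalMs = 1000) {
  const enc = new TextEncoder()
  let iv = null // declared outside start() so cancel() can reach it
  return new ReadableStream({
    start(controller) {
      iv = setInterval(() => {
        controller.enqueue(enc.encode(sseFormat({ ts: new Date().toISOString(), ...snapshot() })))
      }, intervalMs)
    },
    cancel() {
      // Called when the stream is cancelled, e.g. after the client disconnects.
      if (iv) clearInterval(iv)
      iv = null
    }
  })
}
```

With the handle stored in the enclosing scope, cancelling the stream stops the timer, so no work continues on behalf of a client that has gone away.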

@@ -0,0 +1,34 @@
import { Hono } from 'hono'
import os from 'node:os'
import pidusage from 'pidusage'

The pidusage import is unused. If you're planning to use it for actual CPU metrics (instead of the hardcoded cpu_pct: 0 on line 11), consider implementing it or removing this unused import.

Type: Style | Severity: Low
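
If the intent is only to report CPU use of the API process itself, one dependency-free option is Node's built-in `process.cpuUsage()`. This is a swapped-in technique, not the `pidusage` approach the import suggests, and `sampleCpuPct` is a hypothetical name. It measures the current process only, not the whole machine.

```javascript
// Sample this process's CPU usage over a short window using Node built-ins.
// Assumption: per-process CPU is what the metrics snapshot wants to report.
async function sampleCpuPct(windowMs = 100) {
  const startUsage = process.cpuUsage()
  const startTime = process.hrtime.bigint()
  await new Promise((resolve) => setTimeout(resolve, windowMs))
  const delta = process.cpuUsage(startUsage) // user/system microseconds since startUsage
  const elapsedUs = Number(process.hrtime.bigint() - startTime) / 1000 // ns -> us
  const pct = ((delta.user + delta.system) / elapsedUs) * 100
  return Math.round(pct * 100) / 100 // may exceed 100 on multi-core hosts
}
```

A snapshot handler could await `sampleCpuPct()` and place the result in `cpu_pct` instead of the hardcoded 0.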

@@ -0,0 +1,3 @@
import { customAlphabet } from 'nanoid'
export const rid = customAlphabet('0123456789abcdefghijklmnopqrstuvwxyz', 10)
export const uid = () => rid()

The uid function creates an unnecessary wrapper around rid(). Since you want uid to be an alias for rid (as mentioned in the file summary), you can simplify this to avoid the extra function call overhead:

Suggested change
- export const uid = () => rid()
+ export const uid = rid

Type: Performance | Severity: Low


const RUN_ALL = flags.has('--all')
const BASE_URL = 'http://127.0.0.1:10001'
const ALWAYS_DEBUG = true

The hardcoded ALWAYS_DEBUG = true will cause verbose output for all test runs, which could be noisy. Consider making this configurable via an environment variable or CLI flag:

Suggested change
- const ALWAYS_DEBUG = true
+ const ALWAYS_DEBUG = flags.has('--debug') || process.env.DEBUG_TESTS === 'true'

Type: Style | Severity: Low
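
A small sketch of that suggestion in isolation. The `--debug` flag and `DEBUG_TESTS` variable come from the comment above; `resolveDebug` is a hypothetical helper name, and taking `argv` and `env` as parameters is just a choice that makes the logic easy to check.

```javascript
// Decide whether verbose test output is enabled.
// Defaults read the real process state; both can be overridden by the caller.
function resolveDebug(argv = process.argv.slice(2), env = process.env) {
  const flags = new Set(argv.filter((arg) => arg.startsWith('--')))
  return flags.has('--debug') || env.DEBUG_TESTS === 'true'
}

const DEBUG = resolveDebug()
```

Only the exact string `'true'` enables debug output through the environment, which matches the comparison in the suggested change.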

import { dirname, join } from 'node:path';
import { existsSync } from 'node:fs';
import { readdir } from 'node:fs/promises';
import chalk from 'chalk'

Missing semicolon after the chalk import statement. This could cause issues with automatic semicolon insertion in some edge cases.

Type: Style | Severity: Low

console.log(chalk.gray('═'.repeat(70)) + '\n');
}

printAvailableTests()

Calling printAvailableTests() without await could cause a race condition. Since this is an async function that performs I/O operations, the test list output might appear after the tests have already started running or even completed. Consider using await printAvailableTests() at the top level or restructuring to ensure proper execution order.

Type: Logic | Severity: Medium
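
To illustrate the ordering concern, here is a sketch with simulated I/O. The function bodies are placeholders, not the PR's test runner; only the name `printAvailableTests` is taken from the quoted code.

```javascript
// Records the order in which steps finish.
const events = []

// Stand-in for the PR's printAvailableTests: pretends to read the test directory.
async function printAvailableTests() {
  await new Promise((resolve) => setTimeout(resolve, 20))
  events.push('listed tests')
}

// Stand-in for the actual test run.
async function runTests() {
  events.push('ran tests')
}

// Awaiting inside an async entry point keeps the listing ahead of the run.
async function main() {
  await printAvailableTests()
  await runTests()
  return events
}
```

Without the first `await`, `runTests` would record its event before the listing finishes. In an ES module the same effect is available with a top-level `await printAvailableTests()`.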

"${BUN_IMAGE}" \
bash -lc 'bun i && bun run build'

for f in kernel-operator-api kernel-operator-test .env; do

Bug: The artifact check is looking for .env but the script actually copies .kernel-operator.env (line 44). This mismatch will cause the build to fail when verifying artifacts exist. The loop should check for .kernel-operator.env instead of .env.

Type: Logic | Severity: Medium

…xt-remote-install error (chromium --pack-extension=/tmp/extwork../unpacked/my-chrome-ext-main --pack-extension-key=/var/lib/chrome-ext-keys/gh_e7...a.pem exited 1 [ERROR:content/browser/zygote_host/zygote_host...] Running as root without --no-sandbox is not supported. see https://crbug.com/638180) ]
… to browser extension modules ; added audio capture to recording]
…ployment with Unikraft ✅ | Added installing Chromium extensions remotely [operator api: /browser/extension/add/unpacked] 📡 | Kernel-themed loading animation on web client ✿
@rgarcia
Contributor

rgarcia commented Aug 11, 2025

wow this is epic! Thanks for this @raiden-staging. General thoughts:

  • Extremely aligned on the direction of adding more computer control. The APIs outlined here are extremely compelling and I want to add them
  • Team is trying to stick with Go for backend stuff--it's what we're most comfortable with and also there might be a future world where we integrate some of Neko into this server

If you're down to try doing these endpoints in Go, let me know and I'll give you more feedback on the APIs. I think breaking this down into separate PRs would make it merge-able more quickly e.g.:

  • native screenshot API. This alone is something we get asked for a lot
  • rtmps streaming--are you thinking this could be a better way to do read-only live view? Or would it be about the same as Neko but w/ the user being able to provide an rtmps url?
  • expanding the privileges of the kernel user and processes we run to basically as close to root user as possible without being named root and triggering chromium's --no-sandbox warning
  • Kernel logo and colors in the live view 😍
  • logging cleanup -- love the logs endpoint but would like to pair this with cleaning up the current setup of multiprocess-via-bash-script into systemd services or supervisord. Would need to maintain the current behavior of the entrypoint echo'ing out all logs and never exiting for the run-docker/run-unikraft setup to still work
  • input APIs for mouse movement etc.

Thanks again for pushing the ball forward with this!

@raiden-staging
Contributor Author

distributed PRs + Go port incoming @rgarcia 👍

@mesa-dot-dev mesa-dot-dev bot left a comment

Performed full review of fe02e69...b92ca5e

85 files reviewed | 5 comments

_bak/
_wip/
_dev/
bun.lock
\ No newline at end of file

Ignoring bun.lock may cause build reproducibility issues. Lock files should typically be committed to ensure consistent dependency versions across different environments. From the PR summary, it appears bun.lock was updated, suggesting it's currently tracked. Consider removing this line unless there's a specific reason to ignore the lock file.

Type: Logic | Severity: Medium

$ref: "#/components/responses/BadRequestError"
"500":
$ref: "#/components/responses/InternalError"
/fs/delete_file:

The HTTP method should be delete instead of put for delete operations. Using PUT for /fs/delete_file is semantically incorrect according to REST conventions. DELETE is the appropriate method for resource deletion.

Suggested change
- /fs/delete_file:
+ /fs/delete_file:
+   delete:
+     summary: Delete a file
+     operationId: deleteFile

Type: Style | Severity: Low
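
On the server side, the method change could look roughly like the sketch below, written against the web-standard Request and Response types. The `path` query parameter and the in-memory `files` map are assumptions for illustration; the PR's actual request shape and filesystem calls are not shown in this thread.

```javascript
// In-memory stand-in for the filesystem (illustration only).
const files = new Map([['/tmp/example.txt', 'hello']])

// Minimal fetch-style handler that accepts only DELETE on /fs/delete_file.
async function handle(request) {
  const url = new URL(request.url)
  if (url.pathname !== '/fs/delete_file') {
    return new Response('not found', { status: 404 })
  }
  if (request.method !== 'DELETE') {
    // Tell clients which method the route accepts.
    return new Response('method not allowed', { status: 405, headers: { Allow: 'DELETE' } })
  }
  const target = url.searchParams.get('path') // assumed parameter name
  if (!target) return new Response('missing path', { status: 400 })
  if (!files.delete(target)) return new Response('no such file', { status: 404 })
  return new Response(null, { status: 204 })
}
```

Carrying the target in the query string rather than a request body is a design choice in this sketch, since bodies on DELETE requests are not handled consistently by clients and intermediaries.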

type: string
format: binary
description: ZIP archive containing the extension (manifest at root or in first-level dir)
oneOf:

The oneOf constraint here may be confusing for API consumers. Consider using a more explicit approach with clear documentation about the mutually exclusive nature of github_url and archive_file, or restructure to use separate endpoints for each input method.

Type: Style | Severity: Low
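
Whichever way the schema is written, the handler can make the mutual exclusivity explicit with a clear error. A sketch, using the two field names from the comment above; `pickExtensionSource` and the returned shape are hypothetical.

```javascript
// Accept exactly one of `github_url` or `archive_file`.
function pickExtensionSource(body) {
  const hasUrl = typeof body.github_url === 'string' && body.github_url.length > 0
  const hasArchive = body.archive_file != null
  if (hasUrl === hasArchive) {
    // Either both were supplied or neither was.
    return { ok: false, error: 'provide exactly one of github_url or archive_file' }
  }
  return hasUrl
    ? { ok: true, kind: 'github_url', value: body.github_url }
    : { ok: true, kind: 'archive_file', value: body.archive_file }
}
```

Returning a single tagged result lets the rest of the handler branch on `kind` without re-checking both fields.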

return
}
if (a.type === 'delay' && a.delay_ms) {
const hold = setTimeout(() => {}, a.delay_ms)

This timeout implementation doesn't actually delay request processing. The setTimeout with an empty function executes asynchronously but doesn't block the subsequent proxy logic. To properly delay the request, you should use:

Suggested change
- const hold = setTimeout(() => {}, a.delay_ms)
+ if (a.type === 'delay' && a.delay_ms) {
+   await new Promise(resolve => setTimeout(resolve, a.delay_ms))
+ }

Alternatively, you could wrap the proxy logic in the timeout callback, but the async/await approach would be cleaner. The current code also creates potential memory leaks as the timeout references are never cleared.

Type: Logic | Severity: Medium
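
A self-contained sketch of the awaited delay. `action` mirrors the `a` object in the quoted snippet; `applyAction` and the `forward` callback are hypothetical stand-ins for the PR's proxy logic.

```javascript
// Promise-based sleep: awaiting it pauses this async function without blocking the event loop.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Apply an optional delay action, then continue with the proxy step.
async function applyAction(action, forward) {
  if (action.type === 'delay' && action.delay_ms) {
    await sleep(action.delay_ms)
  }
  return forward()
}
```

Because the timer resolves the promise and is then done, there is no handle left to clear, which also addresses the leak concern raised above.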

"${BUN_IMAGE}" \
bash -lc 'bun i && bun run build'

for f in kernel-operator-api kernel-operator-test .env; do

Bug: This loop checks for .env but line 44 copies .kernel-operator.env. The artifact names don't match, which will cause the script to fail when it can't find the .env file. Based on the comment on line 17 and the copy command on line 44, this should be .kernel-operator.env:

Suggested change
- for f in kernel-operator-api kernel-operator-test .env; do
+ for f in kernel-operator-api kernel-operator-test .kernel-operator.env; do

Type: Logic | Severity: Medium

@mesa-dot-dev mesa-dot-dev bot left a comment

Performed full review of fe02e69...b92ca5e

Analysis

  1. Severe Security Vulnerabilities - The PR allows arbitrary command execution through /process/exec and /process/spawn endpoints without validation or sandboxing. Path traversal attacks are possible due to insufficient path validation.

  2. Excessive Privilege Elevation - The implementation grants broad Linux capabilities (cap_chown, cap_fowner, cap_sys_admin) and passwordless sudo to the kernel user, creating an unnecessarily permissive security model with a large attack surface.

  3. Inadequate Input Validation - Many endpoints lack proper input validation for user-provided paths, commands, and system parameters, which could lead to injection attacks.

  4. Missing Security Controls - The PR lacks critical security features such as command whitelisting, path canonicalization, proper sandbox restrictions, rate limiting, authentication mechanisms, and audit logging for security-sensitive operations.
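
As one example of the controls listed above, command allowlisting for an exec-style endpoint might be sketched as follows. The request shape `{ command, args }`, the allowlist entries, and `validateExec` are all assumptions for illustration, not the PR's API or policy.

```javascript
// Allowlist of executables the API may run; entries are placeholders.
const ALLOWED_COMMANDS = new Set(['ls', 'cat', 'xdotool'])

// Validate a request of the assumed shape { command: string, args: string[] }.
function validateExec(req) {
  if (typeof req.command !== 'string' || !ALLOWED_COMMANDS.has(req.command)) {
    return { ok: false, error: `command not allowed: ${String(req.command)}` }
  }
  if (!Array.isArray(req.args) || !req.args.every((arg) => typeof arg === 'string')) {
    return { ok: false, error: 'args must be an array of strings' }
  }
  // Keeping args as an array (no shell string) avoids shell interpolation of user input.
  return { ok: true, command: req.command, args: req.args }
}
```

An exact-match allowlist rejects strings such as `ls; rm -rf /` outright, because the whole value must equal a permitted executable name.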

85 files reviewed | 0 comments