@raiden-staging raiden-staging commented Aug 16, 2025

  • Extending server with /input routes
# Input features ---
# Mouse

# | POST /input/mouse/move — Move mouse to absolute coordinates
curl -X POST -H "Content-Type: application/json" \
  --data '{"x": 500, "y": 500}' \
  http://localhost:10001/input/mouse/move
# Response: {"ok":true}

# | POST /input/mouse/move_relative — Move mouse relative to current position
curl -X POST -H "Content-Type: application/json" \
  --data '{"dx": 50, "dy": -25}' \
  http://localhost:10001/input/mouse/move_relative
# Response: {"ok":true}

# | POST /input/mouse/click — Click mouse button
curl -X POST -H "Content-Type: application/json" \
  --data '{"button":"left","count":2}' \
  http://localhost:10001/input/mouse/click
# Response: {"ok":true}

# | POST /input/mouse/down — Press mouse button down
curl -X POST -H "Content-Type: application/json" \
  --data '{"button":"left"}' \
  http://localhost:10001/input/mouse/down
# Response: {"ok":true}

# | POST /input/mouse/up — Release mouse button
curl -X POST -H "Content-Type: application/json" \
  --data '{"button":"left"}' \
  http://localhost:10001/input/mouse/up
# Response: {"ok":true}

# | POST /input/mouse/scroll — Scroll mouse wheel
curl -X POST -H "Content-Type: application/json" \
  --data '{"dx":0,"dy":-120}' \
  http://localhost:10001/input/mouse/scroll
# Response: {"ok":true}

# | GET /input/mouse/location — Get current mouse location
curl http://localhost:10001/input/mouse/location
# Response: {"x":500,"y":500,"screen":0,"window":"60817493"}


# Keyboard

# | POST /input/keyboard/type — Type text
curl -X POST -H "Content-Type: application/json" \
  --data '{"text":"Hello, World!","wpm":300,"enter":true}' \
  http://localhost:10001/input/keyboard/type
# Response: {"ok":true}

# | POST /input/keyboard/key — Send key sequence
curl -X POST -H "Content-Type: application/json" \
  --data '{"keys":["ctrl","a"]}' \
  http://localhost:10001/input/keyboard/key
# Response: {"ok":true}

# | POST /input/keyboard/key_down — Press and hold a key
curl -X POST -H "Content-Type: application/json" \
  --data '{"key":"ctrl"}' \
  http://localhost:10001/input/keyboard/key_down
# Response: {"ok":true}

# | POST /input/keyboard/key_up — Release a key
curl -X POST -H "Content-Type: application/json" \
  --data '{"key":"ctrl"}' \
  http://localhost:10001/input/keyboard/key_up
# Response: {"ok":true}


# Window

# | POST /input/window/activate — Activate a window by match
curl -X POST -H "Content-Type: application/json" \
  --data '{"match":{"title_contains":"New Tab","only_visible":true}}' \
  http://localhost:10001/input/window/activate
# Response: {"activated":true,"wid":"60817493"}

# | POST /input/window/focus — Focus a window by match
curl -X POST -H "Content-Type: application/json" \
  --data '{"match":{"class":"Google-chrome"}}' \
  http://localhost:10001/input/window/focus
# Response: {"focused":true,"wid":"60817493"}

# | POST /input/window/move_resize — Move/resize a window by match
curl -X POST -H "Content-Type: application/json" \
  --data '{"match":{"title_contains":"Chrome"},"x":0,"y":0,"width":1280,"height":720}' \
  http://localhost:10001/input/window/move_resize
# Response: {"ok":true}

# | POST /input/window/raise — Raise window to top
curl -X POST -H "Content-Type: application/json" \
  --data '{"match":{"pid":12345}}' \
  http://localhost:10001/input/window/raise
# Response: {"ok":true}

# | POST /input/window/minimize — Minimize window
curl -X POST -H "Content-Type: application/json" \
  --data '{"match":{"title_contains":"New Tab"}}' \
  http://localhost:10001/input/window/minimize
# Response: {"ok":true}

# | POST /input/window/map — Show window
curl -X POST -H "Content-Type: application/json" \
  --data '{"match":{"title_contains":"New Tab"}}' \
  http://localhost:10001/input/window/map
# Response: {"ok":true}

# | POST /input/window/unmap — Hide window
curl -X POST -H "Content-Type: application/json" \
  --data '{"match":{"title_contains":"New Tab"}}' \
  http://localhost:10001/input/window/unmap
# Response: {"ok":true}

# | POST /input/window/close — Close window by match
curl -X POST -H "Content-Type: application/json" \
  --data '{"match":{"title_contains":"New Tab"}}' \
  http://localhost:10001/input/window/close
# Response: {"ok":true,"wid":"60817493","windowIds":["60817493"]}

# | POST /input/window/kill — Force-kill window by match
curl -X POST -H "Content-Type: application/json" \
  --data '{"match":{"title_contains":"Unresponsive App"}}' \
  http://localhost:10001/input/window/kill
# Response: {"ok":true}

# | GET /input/window/active — Get active window
curl http://localhost:10001/input/window/active
# Response: {"wid":"60817493"}

# | GET /input/window/focused — Get focused window
curl http://localhost:10001/input/window/focused
# Response: {"wid":"60817493"}

# | POST /input/window/name — Get window name
curl -X POST -H "Content-Type: application/json" \
  --data '{"wid":"60817493"}' \
  http://localhost:10001/input/window/name
# Response: {"wid":"60817493","name":"New Tab - Google Chrome"}

# | POST /input/window/pid — Get window PID
curl -X POST -H "Content-Type: application/json" \
  --data '{"wid":"60817493"}' \
  http://localhost:10001/input/window/pid
# Response: {"wid":"60817493","pid":42420}

# | POST /input/window/geometry — Get window geometry
curl -X POST -H "Content-Type: application/json" \
  --data '{"wid":"60817493"}' \
  http://localhost:10001/input/window/geometry
# Response: {"wid":"60817493","x":0,"y":0,"width":1280,"height":720,"screen":0}


# Display

# | GET /input/display/geometry — Get display geometry
curl http://localhost:10001/input/display/geometry
# Response: {"width":1536,"height":776}
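The mouse endpoints listed above can be chained into higher-level gestures. The following is a minimal sketch of a drag helper, assuming the server listens on localhost:10001 as in the examples. The names `INPUT_API`, `post_json`, and `drag` are hypothetical and not part of this PR.

```bash
#!/usr/bin/env bash
# Base URL of the input API (assumption: same address as the curl examples).
INPUT_API="${INPUT_API:-http://localhost:10001}"

# post_json PATH JSON: POST a JSON body to an /input route and print the response.
post_json() {
  curl -sS -X POST -H "Content-Type: application/json" \
    --data "$2" "${INPUT_API}$1"
}

# drag X1 Y1 X2 Y2: press the left button at (X1,Y1), move to (X2,Y2), release.
# Stops at the first request that curl reports as failed.
drag() {
  post_json /input/mouse/move "{\"x\": $1, \"y\": $2}" &&
  post_json /input/mouse/down '{"button":"left"}' &&
  post_json /input/mouse/move "{\"x\": $3, \"y\": $4}" &&
  post_json /input/mouse/up '{"button":"left"}'
}
```

For example, `drag 100 100 400 300` would issue the four requests in order. Whether the target application treats this as a drag may depend on timing, which this sketch does not control.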

TL;DR

Extended the server with a new, dedicated set of /input API routes for programmatic control over mouse, keyboard, and window interactions.

Why we made these changes

To add remote control and automation capabilities through a dedicated API for the desktop environment's input devices and windows. The change also refactors the server to focus on this functionality.

What changed?

  • New API Endpoints: Added /input/mouse/*, /input/keyboard/*, /input/window/*, and /input/display/* routes for mouse movement, clicks, keyboard typing, key presses, and extensive window management (activate, focus, move, resize, close, query information).
  • Server Specialization: Refactored main.go to transform the general API server into a dedicated 'input API' server, removing previous OpenAPI specification serving and FFmpeg dependencies.
  • New Codebase: Introduced several new Go files (input.go, input_api.go, input_handlers.go, router.go, main_input.go) to define data structures, implement API logic, set up HTTP handlers (often leveraging xdotool), and configure the new server's entry point.
  • Documentation: Updated server/README.md with detailed examples and curl commands for all new /input endpoints.
  • Testing: Added input_test.go with unit tests for various input handlers.
  • Health Check: Included a new /health endpoint to report service status and uptime.
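A script that drives the /input routes may want to wait until the server is up. The sketch below polls the new /health endpoint until it answers, assuming it is served on the same port as the other routes and returns a 2xx status when healthy. The response body is not inspected because its shape is not shown here. `wait_for_health` and `HEALTH_POLL_INTERVAL` are hypothetical names.

```bash
# wait_for_health URL [ATTEMPTS]: poll URL until it answers with a 2xx status.
# Returns 0 on the first success, 1 if every attempt fails.
wait_for_health() {
  local url="$1" attempts="${2:-30}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; the body is discarded.
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "${HEALTH_POLL_INTERVAL:-1}"
  done
  return 1
}
```

Usage might look like `wait_for_health http://localhost:10001/health 10 && echo ready`.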

Description generated by Mesa.

@mesa-dot-dev mesa-dot-dev bot left a comment

Performed full review of fe02e69...686514f

13 files reviewed | 0 comments

cmd := exec.Command("xdotool", "--version")
if err := cmd.Run(); err != nil {
- panic(fmt.Errorf("ffmpeg not found or not executable: %w", err))
+ panic(fmt.Errorf("xdotool not found or not executable: %w", err))
Bug: API Endpoints Missing After Update

The server's core recording API endpoints (e.g., /recording/start, /recording/stop) and OpenAPI spec serving (/spec.yaml, /spec.json) are no longer registered. This removes significant existing functionality, as the server now only exposes input API handlers.
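The regression described above can be checked from outside the server by probing the affected routes. The sketch below lists which paths answer 404 to a GET, assuming unregistered routes return 404 and that the server runs on the port used in the examples. Only the spec routes are probed with GET here, since the HTTP methods of the recording routes are not stated. `route_status` and `missing_routes` are hypothetical helper names.

```bash
# route_status URL: print the HTTP status code returned for a GET on URL.
route_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# missing_routes BASE PATH...: print each PATH whose GET returns 404.
missing_routes() {
  local base="$1" path
  shift
  for path in "$@"; do
    if [ "$(route_status "${base}${path}")" = "404" ]; then
      printf '%s\n' "$path"
    fi
  done
}
```

For instance, `missing_routes http://localhost:10001 /spec.yaml /spec.json` would print the spec routes that are no longer served.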
