AsyncComputer API Reference

💡 Sync Version: This documentation covers the asynchronous API. For synchronous operations, see Computer.

⚡ Performance Advantage: Async API enables concurrent operations with 4-6x performance improvements for parallel tasks.
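As a rough illustration of where a concurrency gain can come from, the sketch below compares awaiting simulated operations one by one with scheduling them together through asyncio.gather. Note that fake_operation is a hypothetical stand-in that only sleeps; it is not part of the AgentBay API, and the actual speedup depends on the workload and backend.

```python
import asyncio
import time


async def fake_operation(name: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for an awaitable SDK call; it only sleeps."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_sequentially(names: list[str]) -> list[str]:
    # Each call waits for the previous one to finish.
    return [await fake_operation(name) for name in names]


async def run_concurrently(names: list[str]) -> list[str]:
    # gather() schedules all calls at once and preserves input order.
    return list(await asyncio.gather(*(fake_operation(n) for n in names)))


async def compare(names: list[str]) -> tuple[float, float]:
    """Return (sequential_seconds, concurrent_seconds) for the same work."""
    start = time.perf_counter()
    await run_sequentially(names)
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    await run_concurrently(names)
    concurrent = time.perf_counter() - start
    return sequential, concurrent
```

Operations that depend on each other's side effects (for example, focusing a field and then typing into it) still need to be awaited in order.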

🖥️ Related Tutorial

Computer Use Guide - Automate desktop applications

Overview

The Computer module provides comprehensive desktop automation capabilities, including mouse operations, keyboard input, screen capture, and window management. It enables automated UI testing and RPA workflows.

Requirements

Computer use features require the windows_latest image.

Data Types

MouseButton

Mouse button constants: Left, Right, Middle, DoubleLeft

ScrollDirection

Scroll direction constants: Up, Down, Left, Right

KeyModifier

Keyboard modifier keys: Ctrl, Alt, Shift, Win

Important Notes

Key names in press_keys and release_keys are case-sensitive
Coordinate validation: x and y must be non-negative integers
Drag operation requires valid start and end coordinates
Screenshot operations may have size limitations
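The coordinate rules above can be checked on the client before a call is sent. The sketch below is a minimal pre-check; validate_point and validate_drag are hypothetical helper names, and the SDK's own validation may differ.

```python
def validate_point(x: int, y: int) -> None:
    """Raise ValueError unless x and y are non-negative integers."""
    for name, value in (("x", x), ("y", y)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")


def validate_drag(from_x: int, from_y: int, to_x: int, to_y: int) -> None:
    """A drag needs a valid start point and a valid end point."""
    validate_point(from_x, from_y)
    validate_point(to_x, to_y)
```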

Computer module for desktop UI automation. Handles mouse operations, keyboard operations, window management, application management, and screen operations.

AsyncComputer

class AsyncComputer(AsyncBaseService)

Handles computer UI automation operations in the AgentBay cloud environment. Provides comprehensive desktop automation capabilities including mouse, keyboard, window management, application management, and screen operations.

init

def __init__(self, session)

Initializes an AsyncComputer object.

Arguments:

session: The session object that provides access to the AgentBay API.

click_mouse

async def click_mouse(
        x: int,
        y: int,
        button: Union[MouseButton, str] = MouseButton.LEFT) -> BoolResult

Clicks the mouse at the specified screen coordinates.

Arguments:

x int - X coordinate in pixels (0 is left edge of screen).
y int - Y coordinate in pixels (0 is top edge of screen).
button Union[MouseButton, str], optional - Mouse button to click. Options:
- MouseButton.LEFT or "left": Single left click
- MouseButton.RIGHT or "right": Right click (context menu)
- MouseButton.MIDDLE or "middle": Middle click (scroll wheel)
- MouseButton.DOUBLE_LEFT or "double_left": Double left click
Defaults to MouseButton.LEFT.

Returns:

BoolResult: Object containing:

success (bool): Whether the click succeeded
data (bool): True if successful, None otherwise
error_message (str): Error description if failed

Raises:

ValueError: If button is not one of the valid options.
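Because a failed click is reported through the result object rather than an exception, callers may want one place that turns success=False into an error. The sketch below assumes only the three documented fields; FakeBoolResult, OperationFailed, and ensure_ok are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeBoolResult:
    """Hypothetical stand-in mirroring the documented BoolResult fields."""
    success: bool
    data: Optional[bool] = None
    error_message: str = ""


class OperationFailed(RuntimeError):
    """Hypothetical exception type used only in this sketch."""


def ensure_ok(result, action: str):
    """Return the result unchanged, or raise if success is False."""
    if not result.success:
        raise OperationFailed(f"{action} failed: {result.error_message}")
    return result
```

With a real session this could wrap a call such as ensure_ok(await session.computer.click_mouse(100, 200), "click").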

Behavior:

Clicks at the exact pixel coordinates provided
Does not move the mouse cursor before clicking
For double-click, use MouseButton.DOUBLE_LEFT
Right-click typically opens context menus

Example:

session = (await agent_bay.create()).session
await session.computer.click_mouse(100, 200)
await session.computer.click_mouse(300, 400, MouseButton.RIGHT)
await session.delete()

Notes:

Coordinates are absolute screen positions, not relative to windows
Use get_screen_size() to determine valid coordinate ranges
Consider using move_mouse() first if you need to see cursor movement
For UI automation, consider using higher-level methods from ui module

See Also:

move_mouse, drag_mouse, get_cursor_position, get_screen_size

move_mouse

async def move_mouse(x: int, y: int) -> BoolResult

Moves the mouse to the specified coordinates.

Arguments:

x int - X coordinate.
y int - Y coordinate.

Returns:

BoolResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
await session.computer.move_mouse(500, 300)
position = await session.computer.get_cursor_position()
print(f"Cursor at: {position.data}")
await session.delete()

Notes:

Moves the cursor smoothly to the target position
Does not click after moving
Use get_cursor_position() to verify the new position

See Also:

click_mouse, drag_mouse, get_cursor_position

drag_mouse

async def drag_mouse(
        from_x: int,
        from_y: int,
        to_x: int,
        to_y: int,
        button: Union[MouseButton, str] = MouseButton.LEFT) -> BoolResult

Drags the mouse from one point to another.

Arguments:

from_x int - Starting X coordinate.
from_y int - Starting Y coordinate.
to_x int - Ending X coordinate.
to_y int - Ending Y coordinate.
button Union[MouseButton, str], optional - Button type. Can be MouseButton enum or string. Valid values: MouseButton.LEFT, MouseButton.RIGHT, MouseButton.MIDDLE or their string equivalents. Defaults to MouseButton.LEFT. Note: DOUBLE_LEFT is not supported for drag operations.

Returns:

BoolResult: Result object containing success status and error message if any.

Raises:

ValueError: If button is not a valid option.

Example:

session = (await agent_bay.create()).session
await session.computer.drag_mouse(100, 100, 300, 300)
await session.computer.drag_mouse(200, 200, 400, 400, MouseButton.RIGHT)
await session.delete()

Notes:

Performs a click-and-drag operation from start to end coordinates
Useful for selecting text, moving windows, or drawing
DOUBLE_LEFT button is not supported for drag operations
Use LEFT, RIGHT, or MIDDLE button only

See Also:

click_mouse, move_mouse
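The button rules for clicks and drags can be summarized in a small normalizer. This is a sketch of one way to accept either an enum member or its string form, not the SDK's implementation: Button is a hypothetical mirror of MouseButton whose values follow the strings listed under click_mouse, and normalize_button is a hypothetical helper.

```python
from enum import Enum
from typing import Union


class Button(str, Enum):
    """Hypothetical mirror of MouseButton, using the documented strings."""
    LEFT = "left"
    RIGHT = "right"
    MIDDLE = "middle"
    DOUBLE_LEFT = "double_left"


def normalize_button(button: Union[Button, str], for_drag: bool = False) -> str:
    """Return the string form of a button, raising ValueError if invalid."""
    try:
        value = Button(button)  # accepts enum members and their string values
    except ValueError:
        valid = ", ".join(b.value for b in Button)
        raise ValueError(
            f"invalid button {button!r}; expected one of: {valid}"
        ) from None
    if for_drag and value is Button.DOUBLE_LEFT:
        # Drags accept LEFT, RIGHT, or MIDDLE only.
        raise ValueError("double_left is not supported for drag operations")
    return value.value
```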

scroll

async def scroll(x: int,
                 y: int,
                 direction: Union[ScrollDirection, str] = ScrollDirection.UP,
                 amount: int = 1) -> BoolResult

Scrolls the mouse wheel at the specified coordinates.

Arguments:

x int - X coordinate.
y int - Y coordinate.
direction Union[ScrollDirection, str], optional - Scroll direction. Can be ScrollDirection enum or string. Valid values: ScrollDirection.UP, ScrollDirection.DOWN, ScrollDirection.LEFT, ScrollDirection.RIGHT or their string equivalents. Defaults to ScrollDirection.UP.
amount int, optional - Scroll amount. Defaults to 1.

Returns:

BoolResult: Result object containing success status and error message if any.

Raises:

ValueError: If direction is not a valid option.

Example:

session = (await agent_bay.create()).session
await session.computer.scroll(500, 500, ScrollDirection.DOWN, 3)
await session.computer.scroll(500, 500, ScrollDirection.UP, 2)
await session.delete()

Notes:

Scroll operations are performed at the specified coordinates
The amount parameter controls how many scroll units to move
Larger amounts result in faster scrolling
Useful for navigating long documents or web pages

See Also:

click_mouse, move_mouse

get_cursor_position

async def get_cursor_position() -> OperationResult

Gets the current cursor position.

Returns:

OperationResult: Result object containing cursor position data with keys 'x' and 'y', and error message if any.

Example:

session = (await agent_bay.create()).session
await session.computer.move_mouse(800, 600)
position = await session.computer.get_cursor_position()
print(f"Cursor is at x={position.data['x']}, y={position.data['y']}")
await session.delete()

Notes:

Returns the absolute screen coordinates
Useful for verifying mouse movements
Position is in pixels from top-left corner (0, 0)

See Also:

move_mouse, click_mouse, get_screen_size

input_text

async def input_text(text: str) -> BoolResult

Types text into the currently focused input field.

Arguments:

text str - The text to input. Supports Unicode characters.

Returns:

BoolResult: Object with success status and error message if any.

Example:

session = (await agent_bay.create()).session
await session.computer.click_mouse(500, 300)
await session.computer.input_text("Hello, World!")
await session.delete()

Notes:

Requires an input field to be focused first
Use click_mouse() or UI automation to focus the field
Supports special characters and Unicode

See Also:

press_keys, click_mouse

press_keys

async def press_keys(keys: List[str], hold: bool = False) -> BoolResult

Presses the specified keys.

Arguments:

keys List[str] - List of keys to press (e.g., ["Ctrl", "a"]).
hold bool, optional - Whether to hold the keys. Defaults to False.

Returns:

BoolResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
await session.computer.press_keys(["Ctrl", "c"])
await session.computer.press_keys(["Ctrl", "v"])
await session.delete()

Notes:

Key names are case-sensitive
When hold=True, remember to call release_keys() afterwards
Supports modifier keys like Ctrl, Alt, Shift
Can press multiple keys simultaneously for shortcuts

See Also:

release_keys, input_text

release_keys

async def release_keys(keys: List[str]) -> BoolResult

Releases the specified keys.

Arguments:

keys List[str] - List of keys to release (e.g., ["Ctrl", "a"]).

Returns:

BoolResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
await session.computer.press_keys(["Shift"], hold=True)
await session.computer.input_text("hello")
await session.computer.release_keys(["Shift"])
await session.delete()

Notes:

Should be used after press_keys() with hold=True
Key names are case-sensitive
Releases all keys specified in the list

See Also:

press_keys, input_text
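Since keys pressed with hold=True stay down until release_keys() is called, it can help to pair the two calls so the release happens even when the code in between fails. The sketch below does this with an async context manager; held_keys is a hypothetical helper, and FakeComputer is a recorder that stands in for session.computer so the snippet runs without a session.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeComputer:
    """Hypothetical recorder exposing only press_keys and release_keys."""

    def __init__(self):
        self.calls = []

    async def press_keys(self, keys, hold=False):
        self.calls.append(("press", tuple(keys), hold))

    async def release_keys(self, keys):
        self.calls.append(("release", tuple(keys)))


@asynccontextmanager
async def held_keys(computer, keys):
    """Hold keys for the duration of the block, then always release them."""
    await computer.press_keys(keys, hold=True)
    try:
        yield
    finally:
        # Runs even if the body raises, so keys are not left held down.
        await computer.release_keys(keys)


async def demo():
    computer = FakeComputer()
    try:
        async with held_keys(computer, ["Shift"]):
            raise RuntimeError("simulated failure while keys are held")
    except RuntimeError:
        pass
    return computer.calls
```

With a real session the same pattern would read: async with held_keys(session.computer, ["Shift"]): followed by the calls to make while the key is held.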

get_screen_size

async def get_screen_size() -> OperationResult

Gets the screen size and DPI scaling factor.

Returns:

OperationResult: Result object containing screen size data with keys 'width', 'height', and 'dpiScalingFactor', and error message if any.

Example:

result = await agent_bay.create()
session = result.session
size = await session.computer.get_screen_size()
print(
  f"Screen: {size.data['width']}x{size.data['height']}, DPI: {size.data['dpiScalingFactor']}"
)
await session.delete()

Notes:

Returns the full screen dimensions in pixels
DPI scaling factor affects coordinate calculations on high-DPI displays
Use this to determine valid coordinate ranges for mouse operations

See Also:

click_mouse, move_mouse, screenshot
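The size dictionary can be used to keep mouse coordinates on screen. The sketch below assumes that valid pixels run from 0 to width - 1 and 0 to height - 1, which this reference does not state explicitly; clamp_to_screen and screen_center are hypothetical helpers, and DPI scaling is deliberately left out.

```python
def clamp_to_screen(x: int, y: int, size: dict) -> tuple[int, int]:
    """Clamp a point into [0, width - 1] x [0, height - 1] (assumed range)."""
    max_x = size["width"] - 1
    max_y = size["height"] - 1
    return min(max(x, 0), max_x), min(max(y, 0), max_y)


def screen_center(size: dict) -> tuple[int, int]:
    """Return the pixel at the middle of the screen."""
    return size["width"] // 2, size["height"] // 2


# Example data shaped like the documented get_screen_size() result.
sample_size = {"width": 1920, "height": 1080, "dpiScalingFactor": 1.0}
```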

screenshot

async def screenshot() -> OperationResult

Takes a screenshot of the current screen.

Returns:

OperationResult: Result object containing the screenshot URL in data, and error message if any.

Example:

session = (await agent_bay.create()).session
screenshot = await session.computer.screenshot()
print(f"Screenshot URL: {screenshot.data}")
await session.delete()

Notes:

Returns an OSS URL to the screenshot image
Screenshot captures the entire screen
Useful for debugging and verification
Image format is typically PNG

See Also:

get_screen_size

beta_take_screenshot

async def beta_take_screenshot(format: str = "png") -> ScreenshotResult

Takes a screenshot of the Computer.

This API uses the MCP tool screenshot (wuying_capture) and returns raw binary image data. The backend also returns the captured image dimensions (width/height in pixels), which are exposed on ScreenshotResult.width and ScreenshotResult.height. The backend metadata fields type and mime_type are exposed on ScreenshotResult.type and ScreenshotResult.mime_type.

Arguments:

format str, optional - The desired image format. Supported: "png", "jpeg", "jpg". Defaults to "png".

Returns:

ScreenshotResult: Object containing the screenshot image data (bytes) and metadata including type, mime_type, width, and height when provided by the backend.

Raises:

AgentBayError: If screenshot fails or response cannot be decoded.
ValueError: If `format` is invalid.
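Because this method returns raw bytes, a typical next step is writing them to disk. The sketch below is a minimal helper that takes the bytes directly, since the attribute name that holds them on ScreenshotResult is not shown in this reference. save_screenshot is a hypothetical name, and mapping both "jpeg" and "jpg" to a .jpg extension is a choice made for this sketch.

```python
from pathlib import Path

# Supported formats per the Arguments section; the extensions are this
# sketch's own choice.
_EXTENSIONS = {"png": ".png", "jpeg": ".jpg", "jpg": ".jpg"}


def save_screenshot(image_bytes: bytes, directory: str, stem: str,
                    fmt: str = "png") -> Path:
    """Write screenshot bytes to <directory>/<stem><ext> and return the path."""
    if fmt not in _EXTENSIONS:
        supported = ", ".join(sorted(_EXTENSIONS))
        raise ValueError(f"unsupported format {fmt!r}; expected one of: {supported}")
    if not image_bytes:
        raise ValueError("screenshot data is empty")
    path = Path(directory) / f"{stem}{_EXTENSIONS[fmt]}"
    path.write_bytes(image_bytes)
    return path
```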

list_root_windows

async def list_root_windows(timeout_ms: int = 3000) -> WindowListResult

Lists all root windows.

Arguments:

timeout_ms int, optional - Timeout in milliseconds. Defaults to 3000.

Returns:

WindowListResult: Result object containing list of windows and error message if any.

Example:

session = (await agent_bay.create()).session
windows = await session.computer.list_root_windows()
for window in windows.windows:
  print(f"Window: {window.title}, ID: {window.window_id}")
await session.delete()
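Window IDs are usually looked up by title before calling the window methods. The sketch below searches a list of window objects using only the title and window_id attributes shown in the example above; FakeWindow and find_window are hypothetical names, and the case-insensitive substring match is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeWindow:
    """Hypothetical stand-in with the two attributes used in the example."""
    window_id: int
    title: str


def find_window(windows: Iterable, title_part: str) -> Optional[object]:
    """Return the first window whose title contains title_part, else None."""
    needle = title_part.casefold()
    for window in windows:
        # Treat a missing title as an empty string.
        if needle in (window.title or "").casefold():
            return window
    return None
```

With a real session, the first argument would be the windows attribute of the list_root_windows() result.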

get_active_window

async def get_active_window() -> WindowInfoResult

Gets the currently active window.

Returns:

WindowInfoResult: Result object containing active window info and error message if any.

Example:

session = (await agent_bay.create()).session
active = await session.computer.get_active_window()
print(f"Active window: {active.window.title}")
await session.delete()

activate_window

async def activate_window(window_id: int) -> BoolResult

Activates the specified window.

Arguments:

window_id int - The ID of the window to activate.

Returns:

BoolResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
windows = await session.computer.list_root_windows()
if windows.windows:
  await session.computer.activate_window(windows.windows[0].window_id)
await session.delete()

Notes:

The window must exist in the system
Use list_root_windows() to get available window IDs
Activating a window brings it to the foreground

See Also:

list_root_windows, get_active_window, close_window

close_window

async def close_window(window_id: int) -> BoolResult

Closes the specified window.

Arguments:

window_id int - The ID of the window to close.

Returns:

BoolResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
windows = await session.computer.list_root_windows()
if windows.windows:
  await session.computer.close_window(windows.windows[0].window_id)
await session.delete()

Notes:

The window must exist in the system
Use list_root_windows() to get available window IDs
Closing a window terminates it permanently

See Also:

list_root_windows, activate_window, minimize_window

maximize_window

async def maximize_window(window_id: int) -> BoolResult

Maximizes the specified window.

Arguments:

window_id int - The ID of the window to maximize.

Returns:

BoolResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
active = await session.computer.get_active_window()
if active.window:
  await session.computer.maximize_window(active.window.window_id)
await session.delete()

Notes:

The window must exist in the system
Maximizing expands the window to fill the screen
Use restore_window() to return to previous size

See Also:

minimize_window, restore_window, fullscreen_window, resize_window

minimize_window

async def minimize_window(window_id: int) -> BoolResult

Minimizes the specified window.

Arguments:

window_id int - The ID of the window to minimize.

Returns:

BoolResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
active = await session.computer.get_active_window()
if active.window:
  await session.computer.minimize_window(active.window.window_id)
await session.delete()

Notes:

The window must exist in the system
Minimizing hides the window in the taskbar
Use restore_window() or activate_window() to bring it back

See Also:

maximize_window, restore_window, activate_window

restore_window

async def restore_window(window_id: int) -> BoolResult

Restores the specified window.

Arguments:

window_id int - The ID of the window to restore.

Returns:

BoolResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
active = await session.computer.get_active_window()
if active.window:
  wid = active.window.window_id
  await session.computer.minimize_window(wid)
  await session.computer.restore_window(wid)
await session.delete()

Notes:

The window must exist in the system
Restoring returns a minimized or maximized window to its normal state
Works for windows that were previously minimized or maximized

See Also:

minimize_window, maximize_window, activate_window

resize_window

async def resize_window(window_id: int, width: int, height: int) -> BoolResult

Resizes the specified window.

Arguments:

window_id int - The ID of the window to resize.
width int - New width of the window.
height int - New height of the window.

Returns:

BoolResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
active = await session.computer.get_active_window()
if active.window:
  await session.computer.resize_window(active.window.window_id, 800, 600)
await session.delete()

Notes:

The window must exist in the system
Width and height are in pixels
Some windows may have minimum or maximum size constraints

See Also:

maximize_window, restore_window, get_screen_size

fullscreen_window

async def fullscreen_window(window_id: int) -> BoolResult

Makes the specified window fullscreen.

Arguments:

window_id int - The ID of the window to make fullscreen.

Returns:

BoolResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
active = await session.computer.get_active_window()
if active.window:
  await session.computer.fullscreen_window(active.window.window_id)
await session.delete()

Notes:

The window must exist in the system
Fullscreen mode hides window borders and taskbar
Different from maximize_window() which keeps window borders
Press F11 or ESC to exit fullscreen in most applications

See Also:

maximize_window, restore_window

focus_mode

async def focus_mode(on: bool) -> BoolResult

Toggles focus mode on or off.

Arguments:

on bool - True to enable focus mode, False to disable it.

Returns:

BoolResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
await session.computer.focus_mode(True)
await session.computer.focus_mode(False)
await session.delete()

Notes:

Focus mode helps reduce distractions by managing window focus
When enabled, may prevent background windows from stealing focus
Behavior depends on the window manager and OS settings

See Also:

activate_window, get_active_window

get_installed_apps

async def get_installed_apps(
        start_menu: bool = True,
        desktop: bool = False,
        ignore_system_apps: bool = True) -> InstalledAppListResult

Gets the list of installed applications.

Arguments:

start_menu bool, optional - Whether to include start menu applications. Defaults to True.
desktop bool, optional - Whether to include desktop applications. Defaults to False.
ignore_system_apps bool, optional - Whether to ignore system applications. Defaults to True.

Returns:

InstalledAppListResult: Result object containing list of installed apps and error message if any.

Example:

session = (await agent_bay.create()).session
apps = await session.computer.get_installed_apps()
for app in apps.data:
  print(f"{app.name}: {app.start_cmd}")
await session.delete()

Notes:

start_menu parameter includes applications from Windows Start Menu
desktop parameter includes shortcuts from Desktop
ignore_system_apps parameter filters out system applications
Each app object contains name, start_cmd, stop_cmd, and work_directory

See Also:

start_app, list_visible_apps, stop_app_by_pname

start_app

async def start_app(start_cmd: str,
                    work_directory: str = "",
                    activity: str = "") -> ProcessListResult

Starts the specified application.

Arguments:

start_cmd str - The command to start the application.
work_directory str, optional - Working directory for the application. Defaults to "".
activity str, optional - Activity name to launch (for mobile apps). Defaults to "".

Returns:

ProcessListResult: Result object containing list of processes started and error message if any.

Example:

session = (await agent_bay.create()).session
processes = await session.computer.start_app("notepad.exe")
print(f"Started {len(processes.data)} process(es)")
await session.delete()

Notes:

The start_cmd can be an executable name or full path
work_directory is optional and defaults to the system default
activity parameter is for mobile apps (Android)
Returns process information for all started processes

See Also:

get_installed_apps, stop_app_by_pname, list_visible_apps

list_visible_apps

async def list_visible_apps() -> ProcessListResult

Lists all applications with visible windows.

Returns detailed process information for applications that have visible windows, including process ID, name, command line, and other system information. This is useful for system monitoring and process management tasks.

Returns:

ProcessListResult: Result object containing list of visible applications with detailed process information.

Example:

session = (await agent_bay.create()).session
apps = await session.computer.list_visible_apps()
for app in apps.data:
  print(f"App: {app.pname}, PID: {app.pid}")
await session.delete()

Notes:

Only returns applications with visible windows
Hidden or minimized windows may not appear
Useful for monitoring currently active applications
Process information includes PID, name, and command line

See Also:

get_installed_apps, start_app, stop_app_by_pname, stop_app_by_pid

stop_app_by_pname

async def stop_app_by_pname(pname: str) -> AppOperationResult

Stops an application by process name.

Arguments:

pname str - The process name of the application to stop.

Returns:

AppOperationResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
await session.computer.start_app("notepad.exe")
result = await session.computer.stop_app_by_pname("notepad.exe")
await session.delete()

Notes:

The process name should match exactly (case-sensitive on some systems)
This will stop all processes matching the given name
If multiple instances are running, all will be terminated
The .exe extension may be required on Windows

See Also:

start_app, stop_app_by_pid, stop_app_by_cmd, list_visible_apps

stop_app_by_pid

async def stop_app_by_pid(pid: int) -> AppOperationResult

Stops an application by process ID.

Arguments:

pid int - The process ID of the application to stop.

Returns:

AppOperationResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
processes = await session.computer.start_app("notepad.exe")
pid = processes.data[0].pid
result = await session.computer.stop_app_by_pid(pid)
await session.delete()

Notes:

PID must be a valid process ID
More precise than stopping by name (only stops specific process)
The process must be owned by the session or have appropriate permissions
PID can be obtained from start_app() or list_visible_apps()

See Also:

start_app, stop_app_by_pname, stop_app_by_cmd, list_visible_apps
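When several instances of an application are running, stopping by PID lets a script end only the ones it chooses. The sketch below collects PIDs from process entries that expose the pname and pid attributes used in the list_visible_apps example; FakeProcess and pids_for are hypothetical names, and the exact (case-sensitive) name match is a choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class FakeProcess:
    """Hypothetical stand-in with the attributes shown for process entries."""
    pname: str
    pid: int


def pids_for(processes, pname: str) -> list[int]:
    """Return the PIDs of every process whose name matches pname exactly."""
    return [p.pid for p in processes if p.pname == pname]
```

Each returned PID could then be passed to stop_app_by_pid() one at a time, checking the result of every call.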

stop_app_by_cmd

async def stop_app_by_cmd(stop_cmd: str) -> AppOperationResult

Stops an application by stop command.

Arguments:

stop_cmd str - The command to stop the application.

Returns:

AppOperationResult: Result object containing success status and error message if any.

Example:

session = (await agent_bay.create()).session
apps = await session.computer.get_installed_apps()
if apps.data and apps.data[0].stop_cmd:
  result = await session.computer.stop_app_by_cmd(apps.data[0].stop_cmd)
await session.delete()

Notes:

The stop_cmd should be the command registered to stop the application
Typically obtained from get_installed_apps() which returns app metadata
Some applications may not have a stop command defined
The command is executed as-is without shell interpretation

See Also:

get_installed_apps, start_app, stop_app_by_pname, stop_app_by_pid

Best Practices

Verify screen coordinates before mouse operations
Use appropriate delays between UI interactions
Handle window focus changes properly
Take screenshots for verification and debugging
Use keyboard shortcuts for efficient automation
Clean up windows and applications after automation
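For the advice on delays, polling for an expected state is often more reliable than a fixed sleep. The sketch below is a generic helper under that assumption; wait_until is a hypothetical name, and the check passed to it would wrap whatever condition the workflow depends on, such as the expected window becoming active.

```python
import asyncio
import time


async def wait_until(check, timeout: float = 5.0, interval: float = 0.2) -> bool:
    """Poll an async check until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if await check():
            return True
        # Stop if another sleep would run past the deadline.
        if time.monotonic() + interval > deadline:
            return False
        await asyncio.sleep(interval)
```

The helper returns False instead of raising on timeout, so the caller decides whether to retry, take a screenshot for debugging, or abort.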
