Skip to content

Advanced cross-platform web automation with a convenient Go API

License

Notifications You must be signed in to change notification settings

daabr/chrome-vision

Repository files navigation

Chrome Vision

Go Reference Go Report Card GitHub Workflow Status

Advanced cross-platform web automation with a convenient Go API.

This project is based on the Chrome DevTools Protocol (CDP) and WebDriver specification, to control web sessions in Chrome and other Blink-based browsers.

It improves, consolidates and extends the approaches of Selenium WebDriver, Puppeteer and ChromeDP. The first two are very popular, but only the last one offers integration with Go without heavy non-Go dependencies, albeit with a very limited high-level API.

Goals

  • Make the Chrome DevTools Protocol more accessible
  • Simplify the low-level API even more than ChromeDP
  • Support the WebDriver specification as a higher-level API layer, together with the lower-level CDP API
  • Add Computer Vision (CV) and Optical Character Recognition (OCR) capabilities, which are missing in all of the above

Differences From ChromeDP

  • Simpler session initialization, and simpler customization of browser flags (see example)

  • Simpler CDP command execution API:

    • More idiomatic command execution, instead of limited JavaScript-await-like wrapper functions (chromedp.Run, chromedp.RunResponse, chromedp.ActionFunc):

      • Synchronous/blocking execution (simplest): the command's Do function waits for the browser's response and returns it as a parsed struct (see example)
      • Asynchronous/non-blocking execution (more advanced): the command's Start function returns a Go channel to receive from the browser a raw devtools.Message, and the optional ParseResponse function parses it (see example)
      • Return values: for commands with output, Do and ParseResponse return a single struct, to avoid the clutter and brittleness of many return value assignments
      • Errors: both Go and CDP errors are returned as Go errors by Do and ParseResponse, no need for multiple error checks per command
    • No need to initialize executors, either with chromedp.Run or with cdproto.cdp.WithExecutor

  • Communication with the browser:

    • Non-Windows operating systems: using POSIX pipes instead of WebSockets (faster, more reliable and more secure)
    • Windows: using an internal implementation of the WebSocket protocol (optimized for Chrome DevTools, faster and more efficient - see documentation)
  • Stronger adherence to idiomatic Go coding style and https://github.com/golang-standards/project-layout