This document contains a list of libraries and resources for web scraping in Go.
- Libraries
- Popular Web Scraping Stacks
- Guides and Tutorials
Note: All selected libraries are either widely used or actively maintained.
- net/http: A built-in Go package that provides HTTP client and server implementations
- fasthttp: A fast HTTP implementation for Go
- resty: A simple HTTP and REST client library for Go
- req: A simple Go HTTP client with Black Magic
- requests: HTTP requests for Gophers
- heimdall: An enhanced HTTP client for Go
- go-retryablehttp: A retryable HTTP client in Go
- retryablehttp-go: A package that provides a familiar HTTP client interface with automatic retries and exponential backoff
- sling: A Go HTTP client library for creating and sending API requests
- gorequest: A simplified HTTP client (inspired by Node.js's SuperAgent)
- gorilla/websocket: A fast, well-tested and widely used WebSocket implementation for Go
- websocket: A minimal and idiomatic WebSocket library for Go
- net: A built-in Go package that provides a portable interface for network I/O, including TCP/IP, UDP, domain name resolution, and Unix domain sockets
- gots: A Go library for handling MPEG transport streams
- caddy: A fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
- goquery: A package that brings a syntax and a set of features similar to jQuery to the Go language
- encoding/xml: A built-in Go package that implements a simple XML 1.0 parser that understands XML namespaces
- x/net/html: A supplementary Go package (golang.org/x/net/html, not part of the standard library) that implements an HTML5-compliant tokenizer and parser
- xml-stream-parser: An XML stream parser for Go
- pagser: A simple, extensible, and configurable library for Go crawlers that parses HTML pages and deserializes them into structs, based on goquery and struct tags
- net/url: A built-in Go package that parses URLs and implements query escaping
- urlquery: A URL query string encoder and parser based on Go
- dateparse: A library to parse many date strings without knowing format in advance
- jsonparser: One of the fastest alternative JSON parsers for Go that does not require a schema
- net/mail: A built-in Go package that implements parsing of mail messages
- markdown: A Markdown parser and HTML renderer for Go
- blackfriday: A Markdown processor for Go
- goldmark: A Markdown parser written in Go. Easy to extend, standard (CommonMark) compliant, well structured
- vitess-sqlparser: A simple SQL parser for Go (Powered by vitess and TiDB)
- sqlparser: An SQL Parser implemented in Go
- grobotstxt: A native Go port of Google's robots.txt parser and matcher library
- gofeed: A library to parse RSS, Atom and JSON feeds in Go
- go-flags: A Go command line option parser
- toml: A TOML parser for Golang with reflection
- colly: An elegant scraper and crawler framework for Golang
- surf: A library for stateful programmatic web browsing in Go
- gospider: A fast web spider written in Go
- ferret: A library for declarative web scraping
- pholcus: A distributed high-concurrency crawler software written in pure golang
- Bright Data's proxy services: A proxy network with over 72 million IPs offering premium residential, datacenter, mobile, and ISP proxies. Supports state, country, ZIP, and ASN level targeting across 195 countries. Works with any HTTP client or scraping library [Bright Data's solution]
- goproxy: An HTTP proxy library for Go
- CAPTCHA Solver: A rapid and automated CAPTCHA solver that can solve challenges from reCAPTCHA, hCaptcha, px_captcha, SimpleCaptcha, GeeTest CAPTCHA, and more [Bright Data's solution]
- captcha: A package that implements generation and verification of image and audio CAPTCHAs
- chromedp: A faster, simpler way to drive browsers supporting the Chrome DevTools Protocol
- rod: A Chrome DevTools Protocol driver for web automation and scraping
- selenium: A Selenium/Webdriver client for Go
- playwright-go: A browser automation library to control Chromium, Firefox and WebKit with a single API. A port of Playwright for Go
- go-rod/stealth: A plugin for anti-bot-detection with rod
- robotgo: A Go native cross-platform RPA and GUI automation library
- mus-go: A set of serialization primitives for Golang
- encoding/json: A built-in Go package that implements encoding and decoding of JSON as defined in RFC 7159
- encoding/csv: A built-in Go package that reads and writes comma-separated values (CSV) files
- unioffice: A pure Go library for creating and processing Office Word (.docx), Excel (.xlsx) and PowerPoint (.pptx) documents
- yaml: YAML support for the Go language
- x/text: Supplementary Go libraries (golang.org/x/text, not part of the standard library) for text processing, many involving Unicode
- strings: A built-in Go package that implements simple functions to manipulate UTF-8 encoded strings
- encoding: A built-in Go package that defines interfaces shared by other packages that convert data to and from byte-level and textual representations
- unicode/utf8: A built-in Go package that implements functions and constants to support text encoded in UTF-8
- time: A built-in Go package that provides functionality for measuring and displaying time
- goment: A Go time library inspired by Moment.js
- now: A time toolkit for golang
- carbon: A simple, semantic and developer-friendly golang package for time
- phonenumbers: The GoLang port of Google's libphonenumber library
- phonenumber: A library that, with a given country and phone number, validates and formats the mobile phone number to E.164 standard
- slug: A URL-friendly slug generator with support for multiple languages
- async: A safe way to execute functions asynchronously, recovering them in case of panic. It also provides an error stack that makes the cause of a failure easier to find
- gocron: A Golang job scheduling package
- go-quartz: A minimalist and zero-dependency scheduling library for Go
- cron: A cron library for Go
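
Several of the built-in packages listed above can be combined without any third-party dependency. The following is a minimal sketch, not a definitive implementation: it uses net/url to resolve links found on a page into absolute URLs and encoding/csv to export them. The function names `resolveLinks` and `toCSV`, the `example.com` URLs, and the single-column CSV layout are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"log"
	"net/url"
)

// resolveLinks (hypothetical helper) resolves each href against base,
// the way a browser would, and returns the absolute URLs.
func resolveLinks(base string, hrefs []string) ([]string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return nil, err
	}
	out := make([]string, 0, len(hrefs))
	for _, h := range hrefs {
		ref, err := url.Parse(h)
		if err != nil {
			continue // skip hrefs that cannot be parsed
		}
		out = append(out, b.ResolveReference(ref).String())
	}
	return out, nil
}

// toCSV (hypothetical helper) writes a header row followed by one URL per row.
func toCSV(links []string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write([]string{"url"}); err != nil {
		return "", err
	}
	for _, l := range links {
		if err := w.Write([]string{l}); err != nil {
			return "", err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Illustrative input: a base page URL and hrefs as they might appear in its HTML.
	links, err := resolveLinks("https://example.com/catalog/", []string{"item?id=1", "/about"})
	if err != nil {
		log.Fatal(err)
	}
	out, err := toCSV(links)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```

Resolving links before storing them keeps the exported data independent of the page it was scraped from.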
- HTTP Client: net/http, req, or go-retryablehttp
- HTML Parser: goquery
- Scraping Framework: colly
- Browser Automation: chromedp, rod, or playwright-go
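
As a rough illustration of the HTTP client step in the first stack, here is a minimal sketch that uses only net/http. It starts a local test server with net/http/httptest so that it runs offline; the names `newDemoServer` and `fetchPage`, the 10-second timeout, and the User-Agent string are assumptions for illustration. Parsing the returned HTML is left to a library such as goquery and is not shown.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"time"
)

// client is shared so connections are reused; the timeout is an assumed value.
var client = &http.Client{Timeout: 10 * time.Second}

// newDemoServer (hypothetical helper) serves a tiny page at "/" and 404 elsewhere.
func newDemoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, "<html><title>demo</title></html>")
	}))
}

// fetchPage (hypothetical helper) downloads target and returns the body as a string.
// Any status other than 200 OK is reported as an error.
func fetchPage(target string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	// Identify the scraper; the value is a placeholder.
	req.Header.Set("User-Agent", "example-scraper/0.1")

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	srv := newDemoServer()
	defer srv.Close()

	body, err := fetchPage(srv.URL)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(body)
}
```

Setting an explicit timeout matters here because the zero-value `http.Client` has none, so a stalled server could block a scraper indefinitely.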