This document contains a list of crates and resources for web scraping in Rust.
- Libraries
- Popular Web Scraping Stacks
- Guides and Tutorials
Note: All selected crates are either widely used or actively maintained.
- reqwest: An ergonomic, batteries-included HTTP client for Rust
- ureq: A simple, safe HTTP client
- curl-rust: Rust bindings to libcurl
- attohttpc: A lightweight HTTP/1.1 client for Rust
- actix-web: A powerful, pragmatic, and extremely fast web framework for Rust
- isahc: A practical HTTP client that is fun to use
- rust-websocket: A WebSocket (RFC6455) library written in Rust
- tungstenite-rs: A lightweight stream-based WebSocket implementation for Rust
- websocket.rs: A WebSocket implementation for both client and server
- hyper: A protective and efficient low-level HTTP library for all, meant to be a building block for libraries and applications
- tiny-http: A low-level HTTP server library in Rust
- libpnet: A crate for cross-platform, low-level networking using the Rust programming language
- pcap: A Rust language crate for accessing the packet sniffing capabilities of libpcap
- rustls: A modern TLS library in Rust
- hyper-util: A collection of utilities to do common things with hyper
- reqwest-middleware: A wrapper around reqwest to allow for client middleware chains
- scraper: HTML parsing and querying with CSS selectors
- html5ever: A high-performance browser-grade HTML5 parser
- select.rs: A Rust library to extract useful data from HTML documents, suitable for web scraping
- quick-xml: A Rust high performance XML reader and writer
- roxmltree: A crate to represent an XML document as a read-only tree
- rust-url: A URL parser for Rust
- httparse (https://github.com/seanmonstar/httparse): A push parser for the HTTP/1.x protocol in Rust
- pdf-extract: A Rust library for extracting content from PDFs
- mailparse: A Rust library to parse mail files
- pulldown-cmark: An efficient, reliable parser for CommonMark, a standard dialect of Markdown
- markdown-rs: A CommonMark compliant markdown parser in Rust with ASTs and extensions
- yaml-rust: A pure Rust YAML implementation
- datafusion-sqlparser-rs: An extensible SQL lexer and parser for Rust
- calamine: A pure Rust Excel/OpenDocument spreadsheet file reader
- pest: A general purpose parser written in Rust with a focus on accessibility, correctness, and performance
- rust-cssparser: A Rust implementation of CSS Syntax Level 3
- ammonia: A crate to repair and secure untrusted HTML
- ttf-parser: A high-level, safe, zero-allocation TrueType font parser
- robotstxt: A native Rust port of Google's robots.txt parser and matcher C++ library
- rss: A library for serializing the RSS web content syndication format
- collie: A minimal feed reader just for you
- spider: A web crawler and scraper for Rust
- dyer: A Rust crate designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed
- Bright Data's proxy services: A proxy network with over 72 million IPs offering premium residential, datacenter, mobile, and ISP proxies. Supports state-, country-, ZIP-, and ASN-level targeting across 195 countries. Works with any HTTP client or scraping library [Bright Data's solution]
- CAPTCHA Solver: A rapid and automated CAPTCHA solver that can solve challenges from reCAPTCHA, hCaptcha, px_captcha, SimpleCaptcha, GeeTest CAPTCHA, and more [Bright Data's solution]
- challenge-bypass-ristretto: A Rust implementation of the Privacy Pass cryptographic protocol using the Ristretto group
- rust-headless-chrome: A high-level API to control headless Chrome or Chromium over the DevTools Protocol. It is the Rust equivalent of Puppeteer, a Node library maintained by the Chrome DevTools team
- thirtyfour: A Selenium WebDriver client for Rust, for automated testing of websites
- chromiumoxide: A Rust crate that provides a high-level and async API to control Chrome or Chromium over the DevTools Protocol
- playwright-rust: A Playwright port to Rust
- webdriver-downloader: A CLI and library for downloading WebDriver binaries
- serde: A serialization framework for Rust
- rust-base64: A Rust crate that encodes and decodes base64 as bytes or utf8
- encoding_rs: A Gecko-oriented implementation of the Encoding Standard in Rust
- rust-lexical: A Rust crate that provides fast numeric to- and from-string conversion routines
- chrono: A date and time library for Rust
- time: The most used Rust library for date and time handling
- httpdate: HTTP date parsing and formatting
- rust-phonenumber: A library for parsing, formatting and validating international phone numbers
- human-name: A Rust library for parsing and comparing human names
- slug-rs: A small library for generating ASCII slugs from Unicode strings
- tokio: A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, etc.
- rayon: A data parallelism library for Rust
- async-task: A task abstraction for building executors
- clokwerk: A simple scheduler for Rust
- HTTP Client: reqwest, ureq, or curl-rust
- HTML Parser: scraper, html5ever, select.rs, or quick-xml
- Crawling framework: spider
- Browser automation: rust-headless-chrome, thirtyfour, or chromiumoxide