This document contains a list of libraries and resources for web scraping in JavaScript.
- Libraries
- Popular Web Scraping Stacks
- Guides and Tutorials
Note: All selected libraries are actively maintained.
- fetch: A built-in, browser-compatible Node.js implementation of the `fetch()` function
- axios: A `Promise`-based HTTP client for the browser and Node.js [Axios Proxy Integration]
- node-fetch: A light-weight module that brings the Fetch API to Node.js [node-fetch Proxy Integration]
- undici: An HTTP/1.1 client, written from scratch for Node.js
- superagent: A small progressive client-side HTTP request library, and Node.js module with the same API, supporting many high-level HTTP client features [SuperAgent Proxy Integration]
- urllib: A library to request HTTP(s) URLs in a complex world
- node-libcurl: libcurl bindings for Node.js
- got: A human-friendly and powerful HTTP request library for Node.js
- bent: A functional HTTP client for Node.js with async/await
- needle: A nimble, streamable HTTP client for Node.js. With proxy, iconv, cookie, deflate & multipart support
- ws: A simple to use, blazing fast and thoroughly tested WebSocket client and server for Node.js
- WebSocket-Node: A WebSocket implementation for Node.js (Draft -08 through the final RFC 6455)
- node:net: A built-in Node.js module that provides an asynchronous network API for creating stream-based TCP or IPC servers and clients
- multicast-dns: A low-level multicast DNS implementation in pure JavaScript
- node-ip: IP address tools for Node.js
- wreck: HTTP client utilities for the hapi web framework
- proxy-chain: A Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining
- cheerio: A fast, flexible, and elegant library for parsing and manipulating HTML and XML
- fast-xml-parser: A library to validate, parse, and build XML rapidly, without C/C++-based libraries or callbacks
- node-html-parser: A very fast HTML parser, generating a simplified DOM, with basic element query support
- html-dom-parser: An HTML to DOM parser
- parse5: An HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant
- htmlparser2: A fast and forgiving HTML and XML parser
- sax-js: A sax style parser for JS
- node:url: A built-in Node.js module that provides utilities for URL resolution and parsing
- query-string: A library to parse and stringify URL query strings.
- URI.js: A JavaScript URL mutation library
- csv-parse: A parser converting CSV text input into arrays or objects
- fast-csv: A CSV parser and formatter for Node.js
- pdf-parse: A pure JavaScript cross-platform module to extract texts from PDFs
- pdf2json: A library that converts binary PDF to JSON and text, for server-side PDF processing and command-line use
- http-parser-js: A pure JS HTTP parser for Node.js
- email-reply-parser: A Node.js library for parsing plain text email content
- email-forward-parser: A library that parses forwarded emails and extracts original content
- node-address-rfc2822: A parser for RFC2822 (Header) format email addresses
- smtp-address-parser: A library to parse an SMTP (RFC-5321) address
- markdown-it: A Markdown parser, done right. 100% CommonMark support, extensions, syntax plugins & high speed
- marked: A Markdown parser and compiler. Built for speed
- micromark: A small, safe, and great commonmark (optionally gfm) compliant Markdown parser
- yaml: A YAML parser and stringifier for JavaScript
- js-yaml: A JavaScript YAML parser and dumper. Very fast
- yaml-eslint-parser: A YAML parser that produces output compatible with ESLint
- node-sql-parser: Parse simple SQL statements into an abstract syntax tree (AST) with the visited tableList and convert it back to SQL
- js-sql-parser: An SQL parser written with jison. Parses SQL into abstract syntax tree (AST) and stringifies back to SQL
- xlsx: A spreadsheet data parser and writer
- exceljs: A library to read, manipulate, and write spreadsheet data and styles to XLSX and JSON
- docx: A library to easily generate and modify .docx files with JS/TS with a nice declarative API. Works for Node.js and on the browser
- robots-parser: A Node.js robots.txt parser with support for wildcard (*) matching
- sitemapper: A parser for XML Sitemaps to be used with robots.txt and web crawlers
- ip-address: A library for parsing and manipulating IPv4 and IPv6 addresses in JavaScript
- feed-extractor: Simplest way to read & normalize RSS/ATOM/JSON feed data
- sanitize-html: A library to clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
- parse-css: A standards-based CSS parser
- js-xss: A library to sanitize untrusted HTML (to prevent XSS) with a configuration specified by a whitelist
- crawlee: A web scraping and browser automation library for Node.js, usable from JavaScript and TypeScript, for building reliable crawlers. Extracts data for AI, LLMs, RAG, or GPTs and downloads HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation
- node-crawler: A web crawler/spider for Node.js + server-side jQuery
- ayakashi: A next generation web scraping framework
- webster: A reliable high-level web crawling & scraping framework for Node.js
- node-curl-impersonate: A library that allows you to use curl-impersonate natively
- cloudflare-scraper: A package to bypass Cloudflare's protection
- unblocker: A web proxy for evading internet censorship, and general-purpose Node.js library for proxying and rewriting remote webpages
- Bright Data's proxy services: A proxy network with over 72 million IPs offering premium residential, datacenter, mobile, and ISP proxies. Supports state, country, ZIP, and ASN level targeting across 195 countries. Works with any HTTP client or scraping library [Bright Data's solution]
- CAPTCHA Solver: A rapid and automated CAPTCHA solver that can solve challenges from reCAPTCHA, hCaptcha, px_captcha, SimpleCaptcha, GeeTest CAPTCHA, and more [Bright Data's solution]
- nopecha-nodejs: An automated CAPTCHA solver for Node.js
- 2captcha: A wrapper around the 2Captcha API
- 2captcha-javascript: A JavaScript library for easy integration with the API of 2captcha captcha solving service to bypass reCAPTCHA, hCaptcha, funcaptcha, geetest and solve any other CAPTCHAs
- user-agents: A JavaScript library for generating random user agents with data that's updated daily
- puppeteer: A JavaScript library which provides a high-level API to control Chrome or Firefox over the DevTools Protocol or WebDriver BiDi [Puppeteer Proxy Integration]
- playwright: A framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API [Playwright Proxy Integration]
- selenium: A browser automation framework and ecosystem [Selenium Proxy Integration]
- cypress: A fast, easy, and reliable testing tool for anything that runs in a browser
- webdriverio: A next-gen browser and mobile automation test framework for Node.js
- wendigo: A proper monster for front-end automated testing
- puppeteer-extra-plugin-stealth: A plugin for puppeteer-extra and playwright-extra to prevent detection
- puppeteer-extra-plugin-anonymize-ua: A plugin for puppeteer-extra and playwright-extra to anonymize the User-Agent on all pages
- @extra/proxy-router: A plugin for playwright-extra and puppeteer-extra to route proxies dynamically
- puppeteer-extra-plugin-block-resources: A plugin for playwright-extra and puppeteer-extra to block resources (images, media, css, etc.)
- playwright-extra: A modular plugin framework for playwright to enable cool plugins through a clean interface
- puppeteer-extra: A library to teach puppeteer new tricks through plugins
- JSON: A built-in JavaScript namespace that contains static methods for parsing values from and converting values to JavaScript Object Notation
- csv-generate: A flexible generator of random CSV strings and JavaScript objects implementing the Node.js `stream.Readable` API
- protobuf.js: Protocol buffers for JavaScript & TypeScript
- jBinary: A high-level API for working with binary data
- encoding.js: A library to convert and detect character encoding in JavaScript
- chardet: A character encoding detection tool for Node.js
- iconv-lite: A library to convert character encodings in pure JavaScript
- dayjs: A 2kB immutable date-time library alternative to Moment.js with the same modern API
- date-fns: A modern JavaScript date utility library
- luxon: A library for working with dates and times in JS
- money.js: A tiny (1kb) JavaScript currency conversion library, for web & Node.js
- currency.js: A JavaScript library for handling currencies
- libphonenumber-js: A simpler (and smaller) rewrite of Google Android's libphonenumber library in JavaScript
- phone: A library to validate and reformat the mobile phone number to the E.164 standard
- slug: A library that slugifies even UTF-8 characters
- unique-slug: A library to generate a unique character string suitable for use in files and URLs
- remove-accents: A library that removes the accents from a string, converting them to their non-accented corresponding characters
- nodejieba: A library that provides Chinese word segmentation for Node.js
- node-schedule: A cron-like and not-cron-like job scheduler for Node
- node-cron: A simple cron-like job scheduler for Node.js
- bree: A Node.js and JavaScript job task scheduler with worker threads, cron, Date, and human syntax
- cron: A robust tool for running jobs (functions or commands) on schedules defined using the cron syntax
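
As an illustration of the built-in `fetch` entry in the library list above, here is a minimal sketch of a retrying request helper. It assumes Node.js 18 or later, where `fetch` and `AbortSignal.timeout` are available globally. The helper name `fetchTextWithRetry` and its option names (`retries`, `timeoutMs`, `delayMs`, `fetchImpl`) are hypothetical and chosen only for this example; they are not part of any listed library.

```javascript
// Hypothetical helper (not from any listed library): download a URL as text,
// retrying on network errors and 5xx responses with exponential backoff.
// Assumes Node.js 18+ (global fetch and AbortSignal.timeout).
async function fetchTextWithRetry(url, options = {}) {
  const { retries = 2, timeoutMs = 10000, delayMs = 500, fetchImpl = fetch } = options;
  let lastError;

  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      // Abort the request if it takes longer than timeoutMs.
      const response = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
      if (response.ok) return await response.text();

      // Other non-5xx statuses (for example 404) will not improve on retry: fail now.
      if (response.status < 500) {
        throw Object.assign(new Error(`HTTP ${response.status}`), { retryable: false });
      }
      lastError = new Error(`HTTP ${response.status}`);
    } catch (error) {
      if (error.retryable === false) throw error;
      lastError = error; // network error, timeout, or 5xx: try again
    }

    // Wait delayMs, 2 * delayMs, 4 * delayMs, ... between attempts.
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Usage (requires network access):
// fetchTextWithRetry('https://example.com').then((html) => console.log(html.length));
```

The `fetchImpl` option exists only so the helper can be exercised without network access; in normal use it defaults to the global `fetch`. Clients such as got or axios provide their own retry or interceptor mechanisms, so a helper like this is mainly relevant when relying on the built-in API alone.
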
- HTTP Client: Axios, node-fetch, fetch, or SuperAgent
- HTML Parser: Cheerio
- Crawlee
- Playwright, Puppeteer, Selenium, or Cypress
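
To make the first stack above (an HTTP client plus Cheerio) more concrete, the following is a rough sketch that downloads a page with the built-in `fetch` and extracts its links with Cheerio. It assumes Node.js 18 or later and that Cheerio has been installed with `npm install cheerio`. The function names `extractLinks` and `scrapeLinks` are hypothetical, and passing Cheerio's `load` function in as a parameter is a choice made for this sketch, not a Cheerio convention.

```javascript
// Hypothetical helpers illustrating the "HTTP client + HTML parser" stack.
// Assumes Node.js 18+ (global fetch and URL) and Cheerio's `load` function,
// which the caller passes in.

// Parse an HTML string and return every link as { text, url },
// with relative hrefs resolved against baseUrl.
function extractLinks(html, baseUrl, load) {
  const $ = load(html);
  return $('a[href]')
    .map((_, el) => ({
      text: $(el).text().trim(),
      url: new URL($(el).attr('href'), baseUrl).href,
    }))
    .get(); // convert the Cheerio collection into a plain array
}

// Download a page and extract its links.
async function scrapeLinks(url, load, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return extractLinks(await response.text(), url, load);
}

// Usage (requires network access and Cheerio):
// const { load } = require('cheerio');
// scrapeLinks('https://example.com', load).then(console.log);
```

This approach only sees the HTML returned by the server. Pages that render their content with client-side JavaScript generally need one of the browser automation tools from the last stack instead.
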
- Web Scraping With JavaScript and Node.js Guide
- Web Scraping With Next.JS in 2024
- Web Scraping with Crawlee: Step-By-Step Tutorial
- Using Cheerio NPM for Web Scraping
- Playwright Web Scraping - 2024 Guide
- Web Scraping with Puppeteer - 2024 Guide
- HTTP Requests in Node.js with Fetch API
- How To Set a Proxy in Axios: Definitive Guide
- How To Set a Proxy in SuperAgent
- How to Use Proxy in Node-Fetch Guide
- How to Use Proxy Servers in Node.js
- Avoiding Bot Detection with Playwright Stealth
- Avoid Getting Blocked With Puppeteer Stealth
- How to Bypass CAPTCHAs with Playwright
- Using Node Unblocker for Web Scraping
- JavaScript vs Python for Web Scraping Comparison
- C# vs JavaScript for Web Scraping
- JavaScript vs Rust for Web Scraping
- Cheerio vs. Puppeteer for Web Scraping
- Puppeteer vs. Selenium - Which One to Choose?
- Scrapy vs Puppeteer for Web Scraping
- Playwright vs. Selenium Comparison 2024
- Puppeteer vs Playwright for Web Scraping