A collection of scrapers, spiders, crawlers, and related tools.
A curated list of open-source projects in the PHP crawling and scraping space: scrapers, crawlers, spiders, and tools, along with how-to guides, articles, and more.
- Spatie/Crawler - An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript. A toolkit is available for those keen to use the full power of the Spatie crawler.
- crawlzone/crawlzone - Crawlzone is a fast asynchronous crawling framework.
- zrashwani/arachnid - SEO-focused crawler to collect link information, etc.
- nadar/crawler - A website crawler implementation written in PHP. Highly extensible, indexes PDFs, and is very memory-efficient.
- mvdbos/PHP-Spider - A configurable and extensible PHP web spider. Various examples available.
- spekulatius/PHPScraper - A simple way to scrape and crawl the web from PHP.
- roach-php/core - A complete PHP web-scraping toolkit inspired by Scrapy. Laravel adapter available.
- spatie/robots-txt - Determine if a page may be crawled from robots.txt, robots meta tags and robot headers.
- symfony/dom-crawler - The DomCrawler component eases DOM navigation for HTML and XML documents.
- symfony/panther - A browser testing and web crawling library for PHP and Symfony.
- JayBizzle/Crawler-Detect - CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent.
- donatj/PhpUserAgent - Lightning Fast, Minimalist PHP User Agent String Parser.
- niespodd/browser-fingerprinting - Analysis of bot protection systems with available countermeasures.
- Masterminds/html5-php - An HTML5 parser and serializer for PHP.
- symfony/html-sanitizer - Provides an object-oriented API to sanitize untrusted HTML input for safe insertion into a document's DOM.
Contributions of any kind are welcome; just follow the guidelines!