x-tweet-url-scraper

A Node.js script to scrape tweets from a list of URLs and save the content to a Markdown file. This script uses Puppeteer for web scraping and displays a progress bar during the scraping process.

Features

  • Reads URLs from a file
  • Validates URLs to ensure they start with https://x.com/
  • Scrapes tweet content and user information
  • Saves the scraped content to a Markdown file
  • Displays a progress bar during the scraping process
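
The first two features can be sketched with Node.js built-ins alone. This is a minimal illustration, not the code in scraper.js: the helper names `isValidTweetUrl` and `parseUrlList` are hypothetical, and skipping blank lines is an assumption about how the input file is handled.

```javascript
// Hypothetical helpers; the names are illustrative and not taken from scraper.js.
const REQUIRED_PREFIX = 'https://x.com/';

// True when the value is a string that starts with the required prefix.
function isValidTweetUrl(url) {
  return typeof url === 'string' && url.startsWith(REQUIRED_PREFIX);
}

// Split file contents into trimmed, non-empty lines (assumption: blank lines
// are ignored), then separate valid URLs from invalid ones.
function parseUrlList(text) {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return {
    valid: lines.filter(isValidTweetUrl),
    invalid: lines.filter((line) => !isValidTweetUrl(line)),
  };
}
```

The input text could come from `fs.readFileSync(inputPath, 'utf8')`, with the `invalid` list reported to the user before scraping starts.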

Prerequisites

  • Node.js (v12 or higher)
  • npm (Node Package Manager)

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/x-tweet-scraper.git
    cd x-tweet-scraper
  2. Install the dependencies:

    npm install

Usage

  1. Prepare a text file containing the list of tweet URLs (one URL per line). Ensure that the URLs start with https://x.com/.

  2. Run the script with the input file and output file as arguments:

    node scraper.js input.txt output.md

    • input.txt: Path to the file containing tweet URLs.
    • output.md: Path to the output Markdown file where the scraped content will be saved.
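
As a rough sketch of how the two positional arguments might be read, assuming the script takes them directly from `process.argv` (the helper name `parseCliArgs` and the usage message are hypothetical, not taken from scraper.js):

```javascript
// Hypothetical helper: extracts the two positional arguments from an argv array.
function parseCliArgs(argv) {
  // argv[0] is the node binary and argv[1] is the script path,
  // so the user-supplied arguments start at index 2.
  const [inputPath, outputPath] = argv.slice(2);
  if (!inputPath || !outputPath) {
    throw new Error('Usage: node scraper.js <input.txt> <output.md>');
  }
  return { inputPath, outputPath };
}
```

Calling `parseCliArgs(process.argv)` at startup would fail early with a usage message when either path is missing.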

Dependencies

  • puppeteer: For web scraping.
  • fs: Node.js built-in module for file system operations (no separate installation needed).
  • progress: For displaying a progress bar.

Example

    node scraper.js urls.txt tweets.md
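
The layout of the generated Markdown file is not described here. Purely as an illustration, assuming each scraped tweet is an object with the hypothetical fields `user`, `text`, and `url`, the formatting step could look something like this:

```javascript
// Hypothetical record shape: { user, text, url }. The real script's fields
// and output layout may differ.
function tweetToMarkdown(tweet) {
  // Prefix every line with "> " so multi-line tweets stay inside the quote.
  const quoted = tweet.text
    .split('\n')
    .map((line) => `> ${line}`)
    .join('\n');
  return `## ${tweet.user}\n\n${quoted}\n\n[Source](${tweet.url})\n`;
}

// Join all tweets into one document, separated by blank lines.
function tweetsToMarkdown(tweets) {
  return tweets.map(tweetToMarkdown).join('\n');
}
```

The resulting string could then be written with `fs.writeFileSync(outputPath, markdown)`.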