scrapes the text content from X (prev. twitter) url input into markdown format

noelje/x-tweet-url-scraper

x-tweet-url-scraper

A Node.js script to scrape tweets from a list of URLs and save the content to a Markdown file. This script uses Puppeteer for web scraping and displays a progress bar during the scraping process.

Features

  • Reads URLs from a file
  • Validates URLs to ensure they start with https://x.com/
  • Scrapes tweet content and user information
  • Saves the scraped content to a Markdown file
  • Displays a progress bar during the scraping process
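The first two features can be sketched with the Node.js standard library alone. This is a minimal illustration, not the script's actual code: the helper names `isValidTweetUrl`, `parseUrlList`, and `readUrlFile` are hypothetical, and the only rule taken from this README is that a URL must start with https://x.com/.

```javascript
const fs = require("fs");

// Prefix taken from the validation rule described in the feature list.
const X_PREFIX = "https://x.com/";

// Hypothetical helper: true when the value is a string starting with the X prefix.
function isValidTweetUrl(url) {
  return typeof url === "string" && url.startsWith(X_PREFIX);
}

// Hypothetical helper: splits file text into trimmed, non-empty lines
// and sorts each line into a valid or invalid bucket.
function parseUrlList(text) {
  const valid = [];
  const invalid = [];
  for (const line of text.split(/\r?\n/)) {
    const url = line.trim();
    if (url === "") continue; // skip blank lines
    (isValidTweetUrl(url) ? valid : invalid).push(url);
  }
  return { valid, invalid };
}

// Hypothetical helper: reads the whole input file and parses it.
function readUrlFile(path) {
  return parseUrlList(fs.readFileSync(path, "utf8"));
}
```

Keeping the rejected lines in a separate `invalid` array makes it possible to report them to the user instead of silently dropping them.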

Prerequisites

  • Node.js (v12 or higher)
  • npm (Node Package Manager)

Installation

  1. Clone the repository:

    git clone https://github.com/noelje/x-tweet-url-scraper.git
    cd x-tweet-url-scraper
  2. Install the dependencies:

    npm install

Usage

  1. Prepare a text file containing the list of tweet URLs (one URL per line). Ensure that the URLs start with https://x.com/.

  2. Run the script with the input file and output file as arguments:

    node scraper.js input.txt output.md
    • input.txt: Path to the file containing tweet URLs.
    • output.md: Path to the output Markdown file where the scraped content will be saved.
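A script invoked this way typically reads its two paths from `process.argv`. The sketch below shows one way to do that; the function name `parseArgs` and the usage message are assumptions for illustration and may not match what scraper.js does.

```javascript
// Hypothetical helper: extracts the input and output paths from an argv-style array.
function parseArgs(argv) {
  // argv[0] is the node binary and argv[1] is the script path,
  // so the user-supplied arguments start at index 2.
  const [inputPath, outputPath] = argv.slice(2);
  if (!inputPath || !outputPath) {
    throw new Error("Usage: node scraper.js <input.txt> <output.md>");
  }
  return { inputPath, outputPath };
}
```

Inside the script this would be called as `parseArgs(process.argv)`. Failing early with a usage message avoids launching a browser when an argument is missing.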

Dependencies

  • puppeteer: For web scraping.
  • fs: Node.js built-in module for file system operations (no separate installation needed).
  • progress: For displaying a progress bar.
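To illustrate the role of `fs`, here is a rough sketch of writing scraped results to a Markdown file. The record shape (`url`, `user`, `text`), the Markdown layout, and the function names are all assumptions made for this example; the output actually produced by the script may look different.

```javascript
const fs = require("fs");

// Hypothetical record shape: { url, user, text }. The real script's fields may differ.
function tweetToMarkdown(tweet) {
  // Prefix each line of the tweet text with "> " so it renders as a blockquote.
  const quoted = tweet.text
    .split("\n")
    .map((line) => `> ${line}`)
    .join("\n");
  return `### ${tweet.user}\n\n${quoted}\n\n[Source](${tweet.url})\n`;
}

// Hypothetical helper: joins all entries and writes them in a single call.
function writeMarkdown(outputPath, tweets) {
  const body = tweets.map(tweetToMarkdown).join("\n");
  fs.writeFileSync(outputPath, body, "utf8");
}
```

Writing the file once at the end keeps the example short; appending after each tweet would instead preserve partial results if a run is interrupted.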

Example

node scraper.js urls.txt tweets.md
