A Node.js script to scrape tweets from a list of URLs and save the content to a Markdown file. This script uses Puppeteer for web scraping and displays a progress bar during the scraping process.
- Reads URLs from a file
- Validates URLs to ensure they start with `https://x.com/`
- Scrapes tweet content and user information
- Saves the scraped content to a Markdown file
- Displays a progress bar during the scraping process
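
The reading and validation steps listed above could look roughly like the sketch below. It is not taken from `scraper.js`; the helper names `isValidTweetUrl`, `parseUrlList`, and `readUrlFile` are hypothetical, and skipping blank lines is an assumption about how the input file is handled.

```javascript
const fs = require('fs');

const TWEET_URL_PREFIX = 'https://x.com/';

// Returns true when the URL starts with the required prefix.
function isValidTweetUrl(url) {
  return typeof url === 'string' && url.startsWith(TWEET_URL_PREFIX);
}

// Splits file contents into trimmed, non-empty lines and sorts them
// into valid and invalid URLs (blank lines are ignored by assumption).
function parseUrlList(contents) {
  const lines = contents
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return {
    valid: lines.filter(isValidTweetUrl),
    invalid: lines.filter((line) => !isValidTweetUrl(line)),
  };
}

// Reads the URL file from disk and classifies its lines (hypothetical helper).
function readUrlFile(path) {
  return parseUrlList(fs.readFileSync(path, 'utf8'));
}
```

Keeping the parsing separate from the file read makes the validation rule easy to check without touching the disk.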
- Node.js (v12 or higher)
- npm (Node Package Manager)
- Clone the repository:

      git clone https://github.com/your-username/x-tweet-scraper.git
      cd x-tweet-scraper

- Install the dependencies:

      npm install

- Prepare a text file containing the tweet URLs, one per line. Every URL must start with `https://x.com/`.
- Run the script with the input file and output file as arguments:

      node scraper.js input.txt output.md

- `input.txt`: Path to the file containing tweet URLs.
- `output.md`: Path to the output Markdown file where the scraped content will be saved.
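
As an illustration of how the two positional arguments might be read, here is a minimal sketch. The function name `parseArgs` and the usage message are assumptions, not part of the actual script.

```javascript
// Extracts the input and output paths from an argv-style array.
// argv[0] is the node binary and argv[1] is the script path,
// so the user-supplied arguments start at index 2.
function parseArgs(argv) {
  const [inputPath, outputPath] = argv.slice(2);
  if (!inputPath || !outputPath) {
    throw new Error('Usage: node scraper.js <input.txt> <output.md>');
  }
  return { inputPath, outputPath };
}
```

In a script this would typically be called as `parseArgs(process.argv)`, failing early with a usage hint when either path is missing.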
- puppeteer: For web scraping.
- fs: Node.js built-in module, for file system operations.
- progress: For displaying a progress bar.
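
To show how these three modules might fit together, the sketch below scrapes each URL in turn, ticks a progress bar, and writes the result as Markdown. It is an assumption-laden illustration rather than the script's actual code: the selectors, the Markdown layout, and the names `formatTweet` and `scrapeAll` are all hypothetical, and X's page markup may differ from what the selectors expect.

```javascript
const fs = require('fs');

// Assumed selectors; X's markup changes, so verify them before relying on this.
const TEXT_SELECTOR = '[data-testid="tweetText"]';
const USER_SELECTOR = '[data-testid="User-Name"]';

// Renders one scraped tweet as a Markdown section (assumed layout).
function formatTweet({ url, user, text }) {
  return `## ${user}\n\n${text}\n\n[Source](${url})\n`;
}

// Visits each URL, collects the tweet text and user name, and writes
// all sections to outputPath. Failed URLs are recorded instead of aborting.
async function scrapeAll(urls, outputPath) {
  // Loaded lazily so formatTweet can be used without these packages installed.
  const puppeteer = require('puppeteer');
  const ProgressBar = require('progress');

  const bar = new ProgressBar('Scraping [:bar] :current/:total', {
    total: urls.length,
  });
  const browser = await puppeteer.launch();
  const sections = [];
  try {
    const page = await browser.newPage();
    for (const url of urls) {
      try {
        await page.goto(url, { waitUntil: 'networkidle2' });
        await page.waitForSelector(TEXT_SELECTOR);
        const text = await page.$eval(TEXT_SELECTOR, (el) => el.innerText);
        const user = await page.$eval(USER_SELECTOR, (el) => el.innerText);
        sections.push(formatTweet({ url, user, text }));
      } catch (err) {
        sections.push(`## Failed\n\n${url}: ${err.message}\n`);
      }
      bar.tick(); // advance the bar whether or not the scrape succeeded
    }
  } finally {
    await browser.close();
  }
  fs.writeFileSync(outputPath, sections.join('\n'));
}
```

Reusing a single page for every URL keeps the browser overhead low; recording failures per URL means one unreachable tweet does not discard the rest of the run.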
node scraper.js urls.txt tweets.md