A Python web scraper built with BeautifulSoup for extracting data from websites, handling links and pagination, and saving results to CSV.

pi22by7/scrapegoat

Scraper

This is a Python-based web scraping application for extracting data from websites. It can follow links and handle pagination, which makes it easier to scrape multi-page listings or linked pages within a website.

Features

  • Extract data from websites by element name, class name, and ID.
  • Follow links and handle pagination to scrape multiple pages within a website.
  • Save the scraped data to a CSV file for further analysis or processing.
  • Select a user agent to mimic different web browsers or devices (work in progress).
  • Modern GUI for easy input and interaction.
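
As a rough illustration of the extraction and pagination features listed above, the sketch below uses BeautifulSoup's `find_all` to select elements by name, class, and ID, and follows a "Next" link from page to page. It is not the project's own code: the function names (`extract_text`, `next_page_url`, `scrape_all`), the `fetch` callback, and the assumption that the pagination link is labelled "Next" are all hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_text(html, element, class_name=None, id_name=None):
    """Return the stripped text of every element matching the given filters."""
    soup = BeautifulSoup(html, "html.parser")
    filters = {}
    if class_name:
        filters["class_"] = class_name  # bs4 uses class_ because class is a keyword
    if id_name:
        filters["id"] = id_name
    return [tag.get_text(strip=True) for tag in soup.find_all(element, **filters)]


def next_page_url(html, base_url, link_text="Next"):
    """Return the absolute URL of the first link labelled link_text, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for anchor in soup.find_all("a", href=True):
        if anchor.get_text(strip=True) == link_text:
            return urljoin(base_url, anchor["href"])
    return None


def scrape_all(fetch, start_url, element, class_name=None, id_name=None, max_pages=10):
    """Collect matches from start_url and every page reachable via 'Next' links.

    fetch is any callable that maps a URL to its HTML (hypothetical hook).
    """
    results, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        results.extend(extract_text(html, element, class_name, id_name))
        url = next_page_url(html, url)
    return results
```

Passing `fetch` as a parameter keeps the parsing logic independent of the HTTP client, so it can be tried against in-memory HTML before pointing it at a live site. The `seen` set and `max_pages` cap guard against pagination loops.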

Installation

  1. Clone the repository: git clone https://github.com/pi22by7/scraper.git
  2. Navigate to the project directory: cd scraper
  3. Install the required dependencies: pip install -r requirements.txt

Usage

  1. Run the application: python gui.py
  2. Enter the URL to scrape, element name, class name (optional), and ID name (optional) in the GUI.
  3. Optionally, select a user agent from the dropdown menu to mimic different web browsers or devices (this feature is a work in progress).
  4. Click the "Scrape" button to start the scraping process.
  5. The scraped data will be saved to a CSV file in your chosen directory.
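
The last steps of this workflow (sending a request under a chosen user agent and writing results to CSV) could look roughly like the sketch below, which uses only the standard library. The `USER_AGENTS` presets, the `build_request` and `save_rows` helpers, and the single `text` column are assumptions made for illustration; the application's actual dropdown entries and CSV layout are not documented here.

```python
import csv
import urllib.request

# Hypothetical presets; the real dropdown's entries may differ.
USER_AGENTS = {
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "mobile": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)",
}


def build_request(url, agent="desktop"):
    """Build a request that identifies itself with the selected user agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENTS[agent]})


def save_rows(path, rows, header=("text",)):
    """Write a header row followed by one scraped value per row."""
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows([row] for row in rows)
```

A request built this way can be passed to `urllib.request.urlopen` to download the page before parsing it.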

Contributing

Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request.

License

GPLv3
