Web scraping tool

A web scraping tool that extracts data from a list of URLs. It uses Selenium WebDriver to load each page, saves the scraped data to a JSON file, and saves a screenshot of the page as a PNG file.
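
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the code in browser_module.py: the function names `make_driver` and `scrape_url`, the choice of headless Chrome, and the JSON fields (`url`, `title`, `html`) are all assumptions. Only the output file names `browse.json` and `screenshot.png` come from the project layout.

```python
import json
from pathlib import Path


def make_driver():
    """Create a headless Chrome driver (assumes Selenium 4 and Chrome are installed)."""
    # Imported lazily so the rest of the sketch can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def scrape_url(driver, url, out_dir):
    """Load one URL, then write browse.json and screenshot.png into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    driver.get(url)

    # Hypothetical record layout; the real browse.json may hold different fields.
    record = {
        "url": driver.current_url,
        "title": driver.title,
        "html": driver.page_source,
    }
    (out_dir / "browse.json").write_text(
        json.dumps(record, indent=2), encoding="utf-8"
    )

    # save_screenshot writes a PNG of the current window to the given path.
    driver.save_screenshot(str(out_dir / "screenshot.png"))
    return record
```

In use, one would create a driver with `make_driver()`, call `scrape_url(driver, "https://example.com", "output/url_1")` for each URL, and call `driver.quit()` when finished. Passing the driver in as a parameter keeps the function easy to test with a stand-in object.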

Installation

git clone https://github.com/ronnuriel/Web-Browser-Module.git
cd Web-Browser-Module

Run the Docker container on Linux or macOS

./run.sh

If you get a permission error, make the script executable and run it again

chmod +x run.sh

Run the Docker container on Windows

./run.ps1

Run the application without Docker

pip install -r requirements.txt
python browser_module.py

Overview

Project Structure

.
├── Dockerfile                 # Dockerfile for containerizing the application.
├── README.md                  # This README file.
├── browser_module.py          # Main Python script for web scraping.
├── input                      # Directory containing input files.
│   └── urls.input             # Text file with URLs to scrape.
├── output                     # Output directory for scraped data.
│   └── url_X                  # Each subdirectory contains results for a URL.
│       ├── browse.json        # JSON file with scraped data.
│       └── screenshot.png     # Screenshot of the URL.
├── requirements.txt           # Python dependencies.
├── run.ps1                    # PowerShell script to run the application on Windows.
├── run.sh                     # Shell script to run the application on Linux or macOS.
└── testMain.py                # Unit tests for the application.
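
To illustrate how the input and output directories relate, here is a minimal sketch that reads `input/urls.input` and assigns each URL a `url_X` directory. It assumes one URL per line, that blank lines are ignored, and that `X` is a 1-based index in file order. The actual naming in browser_module.py may differ, and `read_urls` and `plan_output_dirs` are hypothetical helper names.

```python
from pathlib import Path


def read_urls(input_file):
    """Return the non-empty lines of input_file, stripped of whitespace."""
    text = Path(input_file).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def plan_output_dirs(urls, output_root="output"):
    """Pair each URL with its output directory: output/url_1, output/url_2, ..."""
    root = Path(output_root)
    return [(url, root / f"url_{index}") for index, url in enumerate(urls, start=1)]
```

For example, `plan_output_dirs(read_urls("input/urls.input"))` would yield one `(url, directory)` pair per line of the input file, which a scraper could then loop over.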

Clean the output directory

rm -rf output/*
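
The command above is a Unix shell command and may not work as written in Windows shells. A cross-platform alternative is a short Python function that uses only the standard library. This is a sketch rather than part of the repository, and `clean_output` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def clean_output(output_root="output"):
    """Delete everything inside output_root but keep the directory itself."""
    root = Path(output_root)
    if not root.is_dir():
        return 0  # nothing to clean

    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # e.g. output/url_1 with its JSON and PNG
        else:
            entry.unlink()  # stray files or symlinks
        removed += 1
    return removed
```

The function returns the number of top-level entries it removed. Unlike the shell glob `output/*`, it also removes hidden entries whose names start with a dot.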

Run Tests

python -m pytest testMain.py
