This is a fork of dsuc
Python crawler for extracting internal and external links from a URL. It can deep-crawl sites too.
git clone https://github.com/giovanni-caiazzo/py-url-crawler.git
cd into the cloned directory and create a virtual env, then install the packages listed in requirements.txt
- Normal Crawl
python3 link_crawler.py -d -u http://testsite.com
- Normal Crawl with base path
python3 link_crawler.py -d -u http://testsite.com -b /resources
- Show External Links
python3 link_crawler.py -d -u http://testsite.com -e
- Deep Crawl
python3 link_crawler.py -d -u http://testsite.com
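
To illustrate what the options above mean, here is a minimal sketch of how links could be split into internal and external ones, limited to a base path, and followed for a deep crawl. It is not taken from link_crawler.py: the names `LinkCollector`, `classify_links`, `deep_crawl`, `fetch` and `max_pages` are hypothetical, and the sketch uses only the Python standard library. Pages are supplied by a `fetch` callable, so it runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(html, page_url, base_path="/"):
    """Return (internal, external) sets of absolute http(s) links.

    Internal links share the host of page_url and start with base_path;
    same-host links outside base_path are dropped (an assumption about
    how a base path filter might behave).
    """
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    internal, external = set(), set()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parsed.netloc == host:
            if parsed.path.startswith(base_path):
                internal.add(absolute)
        else:
            external.add(absolute)
    return internal, external


def deep_crawl(start_url, fetch, base_path="/", max_pages=50):
    """Breadth-first crawl of internal links, fetching at most max_pages pages."""
    seen = {start_url}
    queue = deque([start_url])
    externals = set()
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        internal, external = classify_links(fetch(url), url, base_path)
        externals |= external
        for link in sorted(internal - seen):
            seen.add(link)
            queue.append(link)
    return seen, externals


# Fake site used instead of real HTTP requests.
pages = {
    "http://testsite.com/": '<a href="/a">A</a><a href="https://example.org/x">X</a>',
    "http://testsite.com/a": '<a href="/b">B</a><a href="/">Home</a>',
    "http://testsite.com/b": "",
}
seen, externals = deep_crawl("http://testsite.com/", lambda url: pages.get(url, ""))
print(sorted(seen))
print(sorted(externals))
```

Passing the fetcher in as a parameter keeps the traversal logic separate from the HTTP layer; a real run would replace the lambda with a function that downloads the page body.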