Live demo: https://fycrawler.herokuapp.com/
(The backend is a private repository 😉.) If you want to run the backend yourself, it is also available as a Docker image that you can run as a container.
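A minimal sketch of running the backend container. The image name, tag, and port below are placeholders, not the real values — they depend on the private backend repository, so substitute whatever is published for the actual image:

```
# Placeholder image name and port -- replace with the values published
# for the actual backend image.
docker pull <your-registry>/spiderfy-backend:latest
docker run -d -p 8080:8080 --name spiderfy <your-registry>/spiderfy-backend:latest
```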
Full backend API documentation (Swagger): https://spiderfy.herokuapp.com/swagger-ui.html
All frameworks and libraries used are 100% open source ❤️.
Feature | Methodology | Description |
---|---|---|
#1 | 🕸️Crawl | To retrieve all links from a given website. |
#2 | 🕸️Crawl | To retrieve all images from a given website. |
#3 | 💬NLP | To analyze the frequencies of all words (NLP) on a given website URL. |
#4 | 💬NLP | To analyze the frequency of a specific word (NLP) on a given website URL. |
#5 | 🕸️Crawl | To retrieve all meta tags from a given website. |
#6 | 🔖Sitemap - 🕸️Crawl | To retrieve all sitemap nodes from a given website. |
#7 | 🔖Sitemap - 🕸️Crawl | To retrieve all links from a given sitemap URL. |
#8 | 🖼OCR | To convert a Base64-encoded snapshot to text (OCR). |
#9 | 🗳RSS - 🕸️Crawl | To retrieve all RSS feeds from a given RSS URL. |
#10 | 📊Analysis - 📄PDF | To summarize the usage of all JavaScript files (content length, file size) on a given website. |
#11 | 🗳RSS - 🕸️Crawl | To retrieve Turkish RSS news feeds from an RSS list. |
#12 | 📑Static File | To show 3,500 user agents from a static list. |
#13 | 🖱 HTTP Request Manipulation | To insert a randomly selected user agent into an HTTP request, drawn from the static user-agent list. |
#14 | 🕸️Crawl | To obtain only the text, stripped of all HTML tags, from a given website. |
#15 | 🕸️Crawl | To obtain the full HTML source code of a given website. |
#16 | 🌐Selenium | To retrieve all links and a snapshot of each link in Base64 format. |
#17 | 📊Analysis - 🕸️Crawl | To obtain the top 50 sites in Turkey from Alexa. |
#18 | 📊Analysis - 🕸️Crawl | To obtain the top 50 sites in Turkey from SimilarWeb. |
#19 | 📊Analysis - 🕸️Crawl | To show a site's rank from Alexa. |
#20 | 📊Analysis - 📄PDF | To generate PDF output from a given website URL. |
#21 | 📊Analysis - 📄PDF | To generate PDF output from any given HTML. |
#22 | 📊Analysis | To calculate the page load time of any given URL. |
#23 | 🕸️Crawl | To retrieve internal backlinks from any given URL. |
#24 | 🕸️Crawl | To retrieve outgoing backlinks from any given URL. |
#25 | 🕸️Crawl | To generate summary data [consisting of: #1, #2, #5, #14, #15, #23, #24] for a given website. |
#26 | 📊Analysis - 🕸️Crawl | To generate a summary of HTML tag usage on a given website. |
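The features above are served as HTTP endpoints by the backend. A minimal sketch of building a request to one of them from Python — note that the `api/links` path and the `url` query parameter are assumptions for illustration only; the real endpoint contract is in the Swagger documentation linked above:

```python
# Build a GET URL for a crawl endpoint. The path and parameter name are
# hypothetical; check the Swagger UI for the actual contract:
# https://spiderfy.herokuapp.com/swagger-ui.html
from urllib.parse import urlencode

BASE_URL = "https://spiderfy.herokuapp.com"

def build_crawl_request(endpoint: str, target: str) -> str:
    """Return the GET URL for a crawl endpoint (hypothetical path/param)."""
    return f"{BASE_URL}/{endpoint}?{urlencode({'url': target})}"

# Example: feature #1 -- retrieve all links from a given website.
# The resulting URL could then be fetched with urllib.request or requests.
request_url = build_crawl_request("api/links", "https://example.com")
print(request_url)
# → https://spiderfy.herokuapp.com/api/links?url=https%3A%2F%2Fexample.com
```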