Page Finder

This module detects which links on a web page are pagination links. You manually mark at least one link on the page as a pagination link; the algorithm then uses label propagation, with a Gaussian kernel over Levenshtein edit distance as the similarity measure, to determine which other links are also pagination links. A small demo is included to show how to use and test it.
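
To make the idea concrete, here is a minimal pure-Python sketch of the technique described above. It is not this module's actual API or implementation: the function names (`levenshtein`, `gaussian_similarity`, `propagate_labels`) and the parameters (`sigma`, `alpha`, `iterations`) are hypothetical, and the propagation step is a damped variant (often called label spreading) chosen so that a single positive seed yields graded scores rather than labelling every link.

```python
import math


def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def gaussian_similarity(a, b, sigma=2.0):
    """Gaussian kernel over edit distance; sigma is an assumed bandwidth."""
    d = levenshtein(a, b)
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def propagate_labels(links, seeds, sigma=2.0, alpha=0.8, iterations=50):
    """Score each link by how strongly the seed labels spread to it.

    links: list of URL strings found on the page.
    seeds: set of links manually marked as pagination links.
    Returns a dict mapping each link to a score; higher means more
    likely to be a pagination link.
    """
    n = len(links)
    # Pairwise similarity matrix with a zero diagonal.
    w = [[0.0 if i == j else gaussian_similarity(links[i], links[j], sigma)
          for j in range(n)] for i in range(n)]
    # Row-normalise so each row sums to 1 (when it has any weight at all).
    for row in w:
        total = sum(row)
        if total > 0:
            row[:] = [x / total for x in row]
    # Initial labels: 1 for seeds, 0 for everything else.
    y = [1.0 if link in seeds else 0.0 for link in links]
    f = list(y)
    # Damped propagation: mix neighbours' scores with the initial labels.
    for _ in range(iterations):
        f = [alpha * sum(w[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return dict(zip(links, f))


links = ["/news?p=2", "/news?p=3", "/news?p=4", "/jobs", "/ask", "/newest"]
scores = propagate_labels(links, seeds={"/news?p=2"})
for link in sorted(scores, key=scores.get, reverse=True):
    print(f"{scores[link]:.3f}  {link}")
```

In this sketch, links that differ from the seed by a single character (such as other `?p=N` URLs) receive the highest scores, and a cutoff on the score would separate them from navigation links. The cutoff and the kernel bandwidth would need tuning on real pages.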

Install

python setup.py develop

Dependencies: numpy and scrapely. To install them:

pip install -r requirements.txt

Demo

cd tests
python demo.py https://news.ycombinator.com

Enter link to follow (tab autocompletes): news?<TAB>
Enter link to follow (tab autocompletes): https://news.ycombinator.com/news?p=2 <RET>

0) Quit
1) Enter link directly
2) https://news.ycombinator.com/news?p=3
3) https://news.ycombinator.com/news
4) https://news.ycombinator.com/newest
5) https://news.ycombinator.com/jobs
6) https://news.ycombinator.com/ask
Select link to follow:
2 <RET>
