URL Clusterer - White Paper

Description

A prototype implementation of a methodology for clustering the dynamic URLs of a website. This organization hosts two repositories for this purpose:

LinkGraphExtractor: Crawls a given website and stores its URLs in Neo4j.
URLClusterer: Clusters the URLs it takes as input by running an Apache Spark pipeline over them.

We also wrote a paper on this study, published in the proceedings of the 2020 IEEE International Conference on Big Data.

Credits

Team: Yasin Uygun, Ramazan Faruk Oğuz
Supervisors: Erdi Ölmezoğulları, Mehmet S. Aktaş
