A prototype implementation of a methodology for clustering the dynamic URLs of a website. This organization hosts two repositories for this purpose:
- LinkGraphExtractor: Crawls a given website and stores its URLs in Neo4j.
- URLClusterer: Clusters the URLs it takes as input by running an Apache Spark pipeline over them.
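
To give a rough idea of what clustering dynamic URLs means, here is a minimal Python sketch. It is an illustration only and is not the method from the paper or the URLClusterer Spark pipeline: it assumes that URLs differing only in numeric path segments or query parameter values belong together. The function names `url_template` and `cluster_urls` and the `example.com` URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def url_template(url: str) -> str:
    """Reduce a URL to a template (assumed heuristic, not the paper's method)."""
    parts = urlsplit(url)
    # Replace purely numeric path segments with a placeholder.
    segments = ["{id}" if seg.isdigit() else seg for seg in parts.path.split("/")]
    # Keep only the query parameter names, sorted, and drop their values.
    keys = sorted({k for k, _ in parse_qsl(parts.query, keep_blank_values=True)})
    template = parts.netloc + "/".join(segments)
    if keys:
        template += "?" + "&".join(f"{k}={{}}" for k in keys)
    return template


def cluster_urls(urls):
    """Group URLs that share the same template."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[url_template(url)].append(url)
    return dict(clusters)


if __name__ == "__main__":
    sample = [
        "https://example.com/product/12?ref=a",
        "https://example.com/product/34?ref=b",
        "https://example.com/about",
    ]
    for template, members in cluster_urls(sample).items():
        print(template, len(members))
```

With the sample input, the two product pages end up in one cluster and the about page in another.
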
We also wrote a paper on this study, published in the proceedings of the 2020 IEEE International Conference on Big Data.
- Team: Yasin Uygun, Ramazan Faruk Oğuz
- Supervisors: Erdi Ölmezoğulları, Mehmet S. Aktaş