The repository contains the following files:
README.md: a Markdown file that explains the content of this repository.
collector.py: a Python file that contains the code needed to collect the data from the HTML page (from which we get the URLs) and from Wikipedia.
collector_utils.py: a Python file that stores the functions used in collector.py.
parser.py: a Python file that contains the code needed to parse the entire collection of HTML pages and save the results in TSV files.
parser_utils.py: a Python file that gathers the functions used in parser.py.
index.py: a Python file that, once executed, generates the indexes of the search engines.
index_utils.py: a Python file that contains the functions used for creating the indexes.
main.py: a Python file that, once executed, builds the search engine. The user can choose the engine (1, 2 or 3) and, when using engine 3, extra features.
exercise_4.py: a Python file that contains the implementation of the algorithm that solves problem 4.
main.ipynb: a Jupyter notebook that explains the strategies adopted in solving the homework.
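
To give a rough idea of what the index step (index.py and index_utils.py) might involve, here is a minimal sketch of an inverted index with a conjunctive query. It is an illustration only: the function names `build_inverted_index` and `conjunctive_search`, the whitespace tokenization, and the sample documents are all assumptions and are not taken from the repository code.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the sorted list of document ids containing it.

    `documents` is a hypothetical dict {doc_id: text}; the real project
    may read the parsed TSV files instead.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def conjunctive_search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Made-up sample data, for illustration only.
documents = {
    0: "crime drama about a family",
    1: "crime thriller set in Rome",
    2: "romantic drama",
}
index = build_inverted_index(documents)
matches = conjunctive_search(index, "crime drama")
```

Sets are used while building the index so that a term repeated within one document is recorded only once; the lists are sorted at the end to make the output deterministic.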