Search Engine

How to Run:

  1. Prerequisites:

    • macOS 10.15+
    • Python 3.9+
  2. Setup:

    • Ensure necessary Python packages are installed
      • pip install flask
      • pip install nltk
      • pip install beautifulsoup4
        • Depending on your system, you may need to use "pip3" instead of "pip"
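To confirm the setup before indexing, a short check like the one below can report which of the three packages are still missing. This is a minimal sketch and not part of the project: the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced here. The only non-obvious detail is that `beautifulsoup4` is imported under the module name `bs4`.

```python
import importlib.util

# pip package name -> import name (hypothetical helper table, not from the project).
# beautifulsoup4 installs a module called "bs4".
REQUIRED = {"flask": "flask", "nltk": "nltk", "beautifulsoup4": "bs4"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```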
  3. Indexing:

    • Replace the root file paths in "retrieval.py" and "main.py" to match your system
    • First, run main.py without generating the secondary index
      • Generating the secondary index in the same run, immediately after the merge, causes a byte offset error
      • python3 main.py
    • The merged index should finish in 1-2 hours when using the DEV directory
    • Once this completes, comment out the other code in the main function and run main.py again to generate the secondary index
      • python3 main.py
    • You should now have "merged_index.csv", "secondary_index.csv", and "url_id_map.csv" files
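For readers unfamiliar with the idea, the sketch below illustrates how a secondary index of byte offsets can speed up lookups in a merged index. It is an illustration under assumptions, not the project's actual code: it assumes each line of "merged_index.csv" starts with the term followed by a comma, that terms contain no commas, and that the secondary index stores `term,byte_offset` rows. The function names are hypothetical. Both files are read in binary mode so that `tell()` and `seek()` work with true byte positions, which matters once terms contain non-ASCII characters.

```python
import csv


def build_secondary_index(merged_path, secondary_path):
    """Write one 'term,byte_offset' row for every line of the merged index."""
    with open(merged_path, "rb") as merged, \
         open(secondary_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        offset = merged.tell()          # byte position where the next line starts
        line = merged.readline()
        while line:
            # Assumption: the term is everything before the first comma.
            term = line.decode("utf-8").split(",", 1)[0]
            writer.writerow([term, offset])
            offset = merged.tell()
            line = merged.readline()


def load_secondary_index(secondary_path):
    """Load the term -> byte offset mapping into memory."""
    with open(secondary_path, newline="", encoding="utf-8") as f:
        return {term: int(offset) for term, offset in csv.reader(f)}


def lookup(merged_path, offsets, term):
    """Jump straight to the term's line instead of scanning the whole file."""
    if term not in offsets:
        return None
    with open(merged_path, "rb") as merged:
        merged.seek(offsets[term])
        return merged.readline().decode("utf-8").rstrip("\r\n")
```

With this layout only the small offset table has to stay in memory; each query term costs one `seek` and one `readline` on the large merged file.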
  4. Web GUI/Search Interface:

    • Enter the web_gui directory
      • cd web_gui/
    • python3 app.py
    • You can view the development server at "http://127.0.0.1:5001"
  5. Simple Query:

    • The interface will prompt the user for a query
    • After you enter a query, the program uses tf-idf weighting, similarity scoring, and the index to build the result list
    • The list contains the ranked pages, with the most relevant first
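As a rough illustration of how tf-idf and similarity can combine into a ranking, here is a small self-contained sketch. It is not the code in "retrieval.py": it assumes log-scaled term frequency, a base-10 inverse document frequency, cosine similarity, and plain whitespace tokenization, and it works on an in-memory dictionary rather than the CSV index files. All function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vector(tokens, doc_freq, num_docs):
    """Weight each known term by (1 + log10(tf)) * log10(N / df)."""
    vector = {}
    for term, count in Counter(tokens).items():
        df = doc_freq.get(term, 0)
        if df:  # skip terms that appear in no document
            vector[term] = (1 + math.log10(count)) * math.log10(num_docs / df)
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return URLs with a non-zero score, most relevant first.

    docs maps each URL to its list of tokens.
    """
    num_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    query_vec = tf_idf_vector(query.lower().split(), doc_freq, num_docs)
    scores = {
        url: cosine(query_vec, tf_idf_vector(tokens, doc_freq, num_docs))
        for url, tokens in docs.items()
    }
    hits = [url for url, score in scores.items() if score > 0]
    return sorted(hits, key=scores.get, reverse=True)


if __name__ == "__main__":
    pages = {
        "https://example.com/a": ["search", "engine", "index"],
        "https://example.com/b": ["cooking", "recipes"],
    }
    print(rank("search index", pages))
```

A real index would precompute the document weights at indexing time instead of rebuilding them for every query, as this sketch does for brevity.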
  6. Terminal GUI:

    • python3 retrieval.py