A search engine developed as part of the university course "Advanced Programming Techniques", applying the basic structure and hierarchy of a search engine.
- Voice Search: Excess supports voice search in English through a speech-to-text API.
- Phrase Searching: Instead of individual keywords, Excess supports phrase searching for an exact or near-exact match of a phrase. You can also combine phrases with operators (AND, OR, NOT) to retrieve results for complex search queries.
- Keywords Suggestion: Based on your search history, Excess can anticipate your keywords, which speeds up searching.
- Results Paging and User-Friendly UI: Results are paged at 10 results per page.
- The crawler can save its state if interrupted.
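
To illustrate how phrases combined with AND, OR and NOT could be evaluated, here is a minimal sketch. It is not the project's `ComplexPhraseSearching` code: the class name `BooleanPhraseQuery`, the in-memory document map, the requirement that phrases be double-quoted, and the strict left-to-right evaluation (with NOT meaning "and not") are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Excess code base.
class BooleanPhraseQuery {

    // Matches either a double-quoted phrase (group 1) or an operator keyword (group 2).
    private static final Pattern TOKEN =
            Pattern.compile("\"([^\"]+)\"|\\b(AND|OR|NOT)\\b");

    /** Returns the ids of documents whose text contains the exact phrase (case-insensitive). */
    static Set<String> matchPhrase(Map<String, String> docs, String phrase) {
        Set<String> hits = new TreeSet<>();
        String needle = phrase.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (e.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    /**
     * Evaluates a query such as: "phrase one" AND "phrase two" NOT "phrase three".
     * Assumption: operators apply strictly left to right, and NOT means "and not".
     * Text outside quotes that is not an operator is ignored.
     */
    static Set<String> search(Map<String, String> docs, String query) {
        Matcher m = TOKEN.matcher(query);
        Set<String> result = null;
        String pendingOp = null;
        while (m.find()) {
            if (m.group(1) == null) {
                pendingOp = m.group(2); // remember the operator for the next phrase
                continue;
            }
            Set<String> hits = matchPhrase(docs, m.group(1));
            if (result == null) {
                if (pendingOp != null) {
                    throw new IllegalArgumentException("Query must start with a phrase");
                }
                result = hits;
            } else if ("AND".equals(pendingOp)) {
                result.retainAll(hits);   // set intersection
            } else if ("OR".equals(pendingOp)) {
                result.addAll(hits);      // set union
            } else if ("NOT".equals(pendingOp)) {
                result.removeAll(hits);   // set difference
            } else {
                throw new IllegalArgumentException("Missing operator before: " + m.group(1));
            }
            pendingOp = null;
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.html", "Advanced programming techniques for search engines");
        docs.put("b.html", "A web crawler feeds the search engine indexer");
        docs.put("c.html", "Cooking techniques for beginners");
        System.out.println(search(docs, "\"search engine\" AND \"crawler\""));
        System.out.println(search(docs, "\"techniques\" NOT \"cooking\""));
    }
}
```

Each operator maps onto a set operation over the documents matched by each phrase. A real engine would look phrases up in the inverted index rather than scanning document text.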
The project is organized into packages, each representing a part of the search engine structure:
- `Crawler`: A thread-safe multithreaded crawler responsible for crawling web pages, starting from the seed links provided in the file `seed.txt` (the maximum number of pages tested was 10000). The output of the crawler is a serializable file containing the crawled HTML documents together with their URLs.
- `Indexer`: A thread-safe multithreaded indexer for indexing crawled web pages and uploading the inverted file to a cloud MongoDB database.
- `QueryProcessor`: For processing the search query by removing stop words and stemming.
- `PageRanker`: For applying the page ranking algorithm to the collected web pages.
- `Ranker`: For ranking search query results based on term frequency and document frequency.
- `MongoDB`: An interface for handling MongoDB connections.
- `ComplexPhraseSearching`: For handling both normal and operator-separated search queries.
- `SpringBoot`: To interface the frontend with the backend.
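
As a rough illustration of ranking by term frequency and document frequency, the sketch below scores documents with one common TF-IDF formulation: tf = occurrences / document length and idf = ln(N / df). The project's `Ranker` may use a different formula. The class name `TfIdfSketch` and the pre-tokenized in-memory documents are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Excess code base.
class TfIdfSketch {

    /**
     * Ranks documents for the given query terms, best match first.
     * docs maps a document id to its (already tokenized and stemmed) words.
     * Documents containing none of the query terms are left out.
     */
    static List<String> rank(Map<String, List<String>> docs, List<String> queryTerms) {
        int n = docs.size();
        Map<String, Double> scores = new HashMap<>();
        for (String term : queryTerms) {
            // Document frequency: how many documents contain the term at least once.
            long df = docs.values().stream().filter(words -> words.contains(term)).count();
            if (df == 0) {
                continue;
            }
            double idf = Math.log((double) n / df);
            for (Map.Entry<String, List<String>> e : docs.entrySet()) {
                List<String> words = e.getValue();
                int count = Collections.frequency(words, term);
                if (count == 0) {
                    continue;
                }
                // Term frequency normalized by document length.
                double tf = (double) count / words.size();
                scores.merge(e.getKey(), tf * idf, Double::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken by document id for a stable order.
        ranked.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("d1", List.of("search", "engine", "crawler", "search"));
        docs.put("d2", List.of("crawler", "index"));
        docs.put("d3", List.of("recipe", "book"));
        System.out.println(rank(docs, List.of("search", "crawler")));
    }
}
```

With this weighting, a term that appears in few documents counts more than one that appears in most of them. In a full engine these relevance scores would typically be combined with the page rank computed by `PageRanker`.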
Using it is as simple as any search engine: just enter the search query and enjoy the results.
🔵 To run the React App on your localhost
- Ensure you have Node.js installed on your PC.
- Clone this repo to your PC and navigate to the `client` directory.
- Open `cmd` in your current directory and run the command `npm install`.
- Wait a while, then run the command `npm start`.
You should now find the search engine running in your default browser (preferably a Chromium-based one) at localhost:3000.
🔵 To run the Backend
- Open the Java Project in your preferred IDE.
- Navigate to the `PageRanker` package.
- Navigate to `RankerMain` and run it.
And voilà, the search engine is now ready to use.
To re-crawl, navigate to `Main.java` and run it.
To re-index, navigate to the `Indexer` package, then run `IndexerMain`.