-
Notifications
You must be signed in to change notification settings - Fork 2
/
TEST.txt
75 lines (51 loc) · 4.19 KB
/
TEST.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
POORLY
1. Query: "Cristina Lopes"
Comment: Initial performance was good, although the ranking wasn't great despite including her name.
Increasing the weights for the important tags - namely "h1" allowed her faculty profile to jump higher in addition to more relevant results.
2. Query: "the"
Comment: At first, this common term led to a large number of documents it appeared in, which significantly slowed down the search.
However, our secondary index was indexed by letter instead of the actual term so we ensured our secondary index could be indexed by the actual query token.
3. Query: "master of software engineering"
Comment: With words like "of" which were common or simply longer query entries like this, the time increased a rather significant amount.
A workaround we included was that we loaded our entire secondary index and url id map csv files into memory via dictionary
4. Query: "computer science"
Comment: The results were a little scattered and took longer than expected.
Although the corpus generated only has the token appear 233 times, the results were broad and were usually little pieces of information. Cosine similarity helped.
5. Query: "the of and if"
Comment: This took even longer at first because of multiple common words.
Ultimately we decided instead of doing something where we'd start with documents that included all the terms and then decrement, we just used a simple cosine similarity implementation.
6. Query: "graduate courses"
Comment: This was a pretty standard query but we found that using csv files instead of our initial txt file implementation improved efficiency slightly.
7. Query: "research publications"
Comment: Similar to the previous tokens on example 6, this wasn't necessarily a query we struggled a lot with.
We noticed that using json data formats instead of trying to split by a comma and look through the text made it extremely fast and efficient
8. Query: "machine learning"
Comment: It wasn't too slow, but the query times were slightly higher than optimal.
Top k document returning further increased the speed of the search by reducing the amount of data that needs to be processed and returned to the user.
9. Query: "Informatics"
Comment: This query was also general and broad and didn't really focus in on documents that centered around informatics.
Our best approach to fix this was again increasing the weight of our HTML tags.
10. Query: "zot the"
Comment: Words like "the" would make the query dispatch time much longer than we wanted.
We only wanted to retrieve documents that included "zot" because it's tf-idf score but cosine similarity took care of this.
WELL
11. Query: "Hungry"
Comment: Although I mainly tested this in ANALYST before DEV, it was only in a single document regardless so it was extremely efficient and dropped the url immediately.
12. Query: "coalless"
Comment: Immediate results because it's only in the https://www.ics.uci.edu/~kay/wordlist.txt document and no where else haha.
13. Query: "Python programming"
Comment: Worked pretty well due to enough specificity within the corpus for efficiency while still being able to maintain relevant results
14. Query: "biologist"
Comment: Single terms that centered around a subject or topic generally retrieved relevant and more focused results than terms that might be seen in a variety of documents
15. Query: "covid"
Comment: The search results were quite relevant from the start.
16. Query: "Faculty"
Comment: Initially, the engine just resulted in a lot of different links with different faculty members and I'd argue that's a positive implementation.
17. Query: "graduate"
Comment: Even initially, the model was successful in identifying pages related to graduate studies, including degree programs and graduate courses. This showed a good grasp of the term's relevance in an academic setting.
18. Query: "Quantum"
Comment: Although it didn't really have any links with quantum as its main topic, you didn't have to search through the urls very far.
19. Query: "campus events"
Comment: Returned events.
20. Query: "project"
Comment: Returned projects.