Search across all Asymmetric Text Content Types in Parallel #232

Merged


debanjum
@debanjum debanjum commented Jun 21, 2023

  • Allow searching across asymmetric text content types using threads
    • Query latency on my Mac averages 95 ms (140 ms at the 90th percentile) when searching across the Org, Markdown, GitHub, PDF and Music content types
    • This is not much more than searching a single content type (perhaps a ~50% latency increase at most). Encoding the query still takes most of the time, and that is done only once, as before; threading adds some overhead
    • A 95 ms average, or 140 ms at the 90th percentile, is in line with keeping an incremental (search-as-you-type) experience
  • Put logic to remove filter terms from query in a defilter method for each filter
  • Encode the query once per search and reuse that encoding across all (asymmetric) content types
  • Search across all content types via the web and emacs interfaces in d5fb419 and 5c4eb95 respectively
  • Allow Khoj Chat to pull relevant data from across content types (without the perf hit). Khoj Chat currently pulls data from only a single content type

- Update API to return content from all enabled content types when the
  type HTTP request param is not set to a specific type
- To do this efficiently, run the search queries in parallel threads
- Add a new abstract filter method to remove filter terms from the query
- Use this filter method to remove filter terms, encode the defiltered
  query and pass it to the query method of each search type
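
A minimal sketch of what an abstract defilter method on each filter could look like. The class names, the regular expressions and the filter syntax (`+"word"`, `-"word"`, `dt>="date"`) are assumptions made for illustration and are not taken from this PR's diff.

```python
import re
from abc import ABC, abstractmethod
from typing import List


class BaseFilter(ABC):
    @abstractmethod
    def defilter(self, query: str) -> str:
        """Return the query with this filter's terms removed."""


class WordFilter(BaseFilter):
    # Assumed syntax: +"word" to require a word, -"word" to exclude it.
    pattern = re.compile(r'[+-]"[^"]*"')

    def defilter(self, query: str) -> str:
        return self.pattern.sub("", query)


class DateFilter(BaseFilter):
    # Assumed syntax: dt followed by a comparison operator and a quoted date.
    pattern = re.compile(r'dt[<>=:]{1,2}"[^"]*"')

    def defilter(self, query: str) -> str:
        return self.pattern.sub("", query)


def defilter_query(query: str, filters: List[BaseFilter]) -> str:
    """Strip every filter's terms so only the text to encode remains."""
    for search_filter in filters:
        query = search_filter.defilter(query)
    # Collapse the whitespace left behind by the removed terms.
    return " ".join(query.split())


clean = defilter_query(
    'meeting notes +"budget" dt>="2023-01-01"', [WordFilter(), DateFilter()]
)
```

The defiltered string is what would then be encoded once and handed to each content type's query method.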

TODO: Encoding query is still taking 100-200 ms unlike before. Need to
investigate why
Use timer to measure time to encode queries and total search time
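
One way to time the encoding step and the total search separately is a small context manager, sketched below. The `timer` helper and the labels are hypothetical, and the `time.sleep` calls are placeholders for the real encoding and search work.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timer(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the wrapped block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = (time.perf_counter() - start) * 1000.0


timings: Dict[str, float] = {}
with timer("total search", timings):
    with timer("encode query", timings):
        time.sleep(0.01)  # placeholder for encoding the query
    time.sleep(0.005)  # placeholder for the per-content-type searches

for label, elapsed_ms in timings.items():
    print(f"{label}: {elapsed_ms:.1f} ms")
```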
@debanjum debanjum added the upgrade New feature or request label Jun 21, 2023
@debanjum debanjum requested a review from sabaimran June 21, 2023 06:20
If no content type is selected in the transient menu, khoj.el queries
the khoj server without the content-type parameter (t) set.

This results in a search across all enabled asymmetric text
content types
@debanjum debanjum force-pushed the parallelize-search-across-all-asymmetric-text-content-types branch from 9827221 to 5c4eb95 Compare June 21, 2023 07:10
@debanjum debanjum force-pushed the master branch 3 times, most recently from 75ab8f2 to 6d4aad5 Compare June 21, 2023 08:57
@debanjum debanjum linked an issue Jun 21, 2023 that may be closed by this pull request
@sabaimran sabaimran left a comment

Sweet! This is a game changer 🎊 .

src/interface/emacs/khoj.el (outdated, resolved)
src/khoj/configure.py (outdated, resolved)
src/khoj/configure.py (resolved)
src/khoj/interface/web/index.html (outdated, resolved)
src/khoj/search_filter/date_filter.py (resolved)
@debanjum debanjum force-pushed the parallelize-search-across-all-asymmetric-text-content-types branch from f62e4c2 to 5fc92e5 Compare June 29, 2023 03:28
- So when searching across content types (with content-type = "all")
  org-mode results get rendered differently than markdown, PDF etc. results

- Set div class for each result separately instead of a single uber div
  for styling. This allows styling div of each result based on the
  content-type of that result

- No need to create placeholder "all" content type on web interface as
  server is passing an all content type by itself
…n khoj.el

- Add "all" as default content type when no content type retrieved
  from server
- Set image_search.query to async to use it with multi-threading
  This is same as text_search.query being set to an async method
- Exit search early if no search_model is defined in state.model
The text and image search query methods have become async, so async/await
is required to get results correctly in tests etc
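
To show why tests need async/await once a query method becomes async, here is a minimal sketch. The `query` coroutine below is a hypothetical stand-in and not the real `text_search.query`; calling it without awaiting returns a coroutine object instead of results.

```python
import asyncio
from typing import List


async def query(raw_query: str) -> List[str]:
    """Hypothetical async query method, standing in for a search call."""
    await asyncio.sleep(0)  # yield control, as a real async search might
    return [f"result for {raw_query}"]


def test_query_returns_results() -> None:
    # The coroutine has to be driven to completion to get the actual results.
    results = asyncio.run(query("how to fix bikes"))
    assert results == ["result for how to fix bikes"]


# Calling without awaiting yields a coroutine object, not a list of results.
pending = query("forgot to await")
is_coroutine = asyncio.iscoroutine(pending)
pending.close()  # close it explicitly to avoid a "never awaited" warning

test_query_returns_results()
```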
@debanjum debanjum force-pushed the parallelize-search-across-all-asymmetric-text-content-types branch from 5fc92e5 to f469d0a Compare June 29, 2023 05:08
@debanjum debanjum force-pushed the parallelize-search-across-all-asymmetric-text-content-types branch from f469d0a to 5f2717c Compare June 29, 2023 05:16
@debanjum debanjum merged commit f272d45 into master Jun 29, 2023
10 checks passed
@debanjum debanjum deleted the parallelize-search-across-all-asymmetric-text-content-types branch June 29, 2023 19:22
Successfully merging this pull request may close these issues.

Setup Unified Search to search across all content types