Spike elasticlunr in docs site #575
At least we can prototype a working version of metadata search (i.e., search by title & tags), right?
That's what I was thinking, yea.
@srid I've just pushed up some commits with a basic UI (screenshot added to PR description). This may be all I have time for this weekend, but my plan is to hit the plugins/pipeline next.
Just pushed up a few more commits that add handling of URL params for search. This is a nice-to-have, but also saves me the trouble of adding in a UI for advanced search before I fully understand it. Currently, I'm using this to attempt searching only the I've got the simple general search working, with Field-scoped search is going to take some more research on my part though. |
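A minimal sketch of the URL-param handling described above, assuming the free-text query travels in a `search` parameter (as in the `search.html?search=Install` URL mentioned later in the thread). The helper name `readSearchQuery` is made up for illustration and is not taken from the PR.

```javascript
// Hypothetical helper: pull the free-text query out of a URL query string.
// The parameter name "search" is an assumption for this sketch.
function readSearchQuery(queryString) {
  const params = new URLSearchParams(queryString);
  const raw = params.get("search"); // null when the parameter is absent
  return raw === null ? "" : raw.trim();
}
```

In the browser this would be called as `readSearchQuery(window.location.search)`; taking the string as an argument keeps the function easy to exercise outside a page.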
Cheers, I'll take a peek. We should test the search perf on large Zettelkastens like https://lost-frequencies.eu/impulse.html @zettelzottel Is the source of your public zettelkasten available somewhere? |
@srid no the source is not public. it is hosted on private gitea instance. you can sign up on the instance and i give you access to the repo. https://git.lost-frequencies.eu |
Maybe use https://lost-frequencies.eu/cache.json in |
@flyinggrizzly

Anything in particular I should do when running neuron on ./doc? I access the search HTML using: http://localhost:8080/search.html EDIT: It works (as in it displays the results) when I go to a URL with query like http://localhost:8080/search.html?search=Install (the console warning above still appears), but not when I type a query and press Enter (nothing happens then). EDIT 2: Oh n/m, looks like, in some cases, the search query doesn't do substring match. It requires exact words. |
Yea, that search config warning is still something to look at--I'll add it to the to-do list. Related, I've added in a very basic and probably problematic config for the fieldSearch call that also needs to be actually looked at--so far I've just added enough to get a successful call of the method. Weird about the substring though--I was having decent luck with "instal" (1 "l"). What was your query? It might be useful when working out the search configs. Also, hitting return shouldn't be required for getting the search to fire--if you type in "install" do you get results, or have I used some browser APIs that work in Firefox but not your browser? (I've been trying not to worry about that so much while spiking, and leaving until later making sure that the JS syntax and implementation is general enough to support more browsers... but if it's not working for you I'll move it up the list) |
"instal" works, but "insta" (or "inst") returns nothing. I'd expect search to work even if you type only 3 characters of a word. I suppose this is a matter of configuring elasticlunr? And yea search works without having to hit enter. I think it would be good to see tag-based search; we can then create permalinks to search page that lists all zettels tagged with a tag (and link to it from tag links). Something like Also, what kind of index file can neuron generate that would help the frontend JS maximally (in regards to perf)? Assuming metadata only. I can actually implement this in |
Just checking in--this weekend ended up being overwhelming so I didn't get the time I planned to work on this. But, before it goes too long, here are my thoughts/plans:
Search by tag
Tag search is already configured in the URL params handling, using I went with a Rails-esque format for the params, but can very easily switch to At the moment, only 1 tag is ever searched at a time, but I thought that building in the option of multiple would help in the future, and doesn't really cost anything now. Do you have a query-string syntax preference? At the moment, cases we need to handle are:
Also, based on the current config, we could also have title search, slug search, or date search (which is a bigger can of worms, but I suspect elasticlunr might have some decent date logic if it's based on Lucene syntax. But not sure). Ideally these would have similar syntax to tags, at least in the naming of the query parameter (so NOTE: while tag search is set up in the URL params handling, it's not working--I still need to get my head around
Minimum search string length
That does look like an elasticlunr thing--mdBook search fires on any number of keystrokes. I had also added a minimum search string length of 3 which I'll remove too, just in case that's contributing somehow.
Server generated support
For now, I still think it'd be best to keep everything in the browser--the cache is supplying everything we need still. At some point the search index may need to have behavior configured based on plugins that are active, but those are already listed in the cache so that's still OK.
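To make the single-tag versus multiple-tag question concrete, here is a rough sketch of reading tags from the query string. The parameter names `tag[]` (Rails-style, repeatable) and `tag` are placeholders chosen for this example; the names actually used in the PR are not shown in the thread.

```javascript
// Hypothetical parameter names: "tag[]" (repeatable) and plain "tag".
function readTags(queryString) {
  const params = new URLSearchParams(queryString);
  const tags = [...params.getAll("tag[]"), ...params.getAll("tag")];
  // Drop empties and duplicates while keeping first-seen order.
  return [...new Set(tags.map((t) => t.trim()).filter((t) => t.length > 0))];
}
```

Returning an array even for a single tag means the search code does not need a separate path once multiple tags are supported.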
I see that you have a |
Actually you know what. Let's just do this in plain JavaScript, to begin with. I could modify neuron to use the current |
That makes sense to me. Do you have a minimum browser target in mind? I've been working under the assumption that some compilation might be used so I've been using plenty of ES6--it wouldn't be a problem to manually roll it down to ES5-compatible syntax etc, but I'll just need to check out what works. Also, that query syntax works--I'll update it to use that. For And for the cache stuff, my guess is that it'll be faster/more performant to have a server generated search index as mdBook does, but I don't think the search function is stable enough (or even known to actually work for Neuron 😅) to warrant putting more of your time into it yet.
More comprehensive reference for search query syntax: https://help.obsidian.md/Plugins/Search Would be nice if neuron supported the same syntax as that of Obsidian. Assuming the current This new syntax allows for more complex queries - but the frontend JS search (this PR) doesn't have to support all of them of course, even if neuron's new z-queries do. It is just something to keep in mind as a distant possibility. |
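As a rough illustration of what field-scoped query syntax could mean for the frontend, the sketch below splits a query into `field:value` terms and plain text. The field names `tag` and `title` and the function `parseQuery` are assumptions for this example only; they are not neuron's z-query grammar and do not cover Obsidian's full syntax.

```javascript
// Hypothetical sketch: separate "field:value" terms from free-text terms.
// KNOWN_FIELDS is a placeholder list, not an actual neuron or Obsidian grammar.
const KNOWN_FIELDS = ["tag", "title"];

function parseQuery(query) {
  const result = { text: [], fields: {} };
  for (const token of query.split(/\s+/).filter(Boolean)) {
    const colon = token.indexOf(":");
    const field = colon > 0 ? token.slice(0, colon) : "";
    const value = colon > 0 ? token.slice(colon + 1) : "";
    if (KNOWN_FIELDS.includes(field) && value !== "") {
      (result.fields[field] = result.fields[field] || []).push(value);
    } else {
      result.text.push(token); // unknown prefixes stay as plain text
    }
  }
  return result;
}
```

Treating unrecognised prefixes as plain text keeps queries such as URLs searchable instead of failing on the colon.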
This appears to be what is required to get the term "ins" to begin returning results for "install" etc. ("in" is a stop word, so does not trigger the search). Searching for just "ins" returns more than just "install" results, because it could expand to other words. It appears that it also expands to "integration", which makes enough sense to me since that could be a typo. "integration" drops out of the results once the query term is expanded to "inst". This also highlights a potentially better and simpler way to handle tag-search--using the field boost/suppression in the search options to limit tag search to only the tag field.
"ins" as search term
This wasn't working because of a search settings issue--the
Pipeline
On closer inspection,
Tag-search
Fixing the term-expansion also made me realize that perhaps
const searchOptions = {
  bool: "AND",
  expand: true,
  fields: {
    title: { boost: 0 }, // set to 1 on general search
    tags: { boost: 1 }, // also set to 1 on general search, but now the only field with a boost > 0
  }
}
I've still got some time in me today for this so I'll keep going (and editing this comment), but wanted to document progress today so far.
performance on large zettels
I've saved the Lost Frequencies cache in static as
search on every page
So I've got the For now, it's just pushing actual page content down (instead of expanding over it), though this could be adjusted.
Right now, in order to avoid eliminating
If we do target the site root, it would probably be best to move the searchbox and results into a modal over the page content--pushing actual page content down will be potentially confusing I think. It would be good to have a better idea of what the final implementation will look like, roughly, before embarking on that though--I've been avoiding external libraries like the plague, but using the DOM APIs for HTML manipulation is definitely less ergonomic than say, JSX. (If you look at the creation of the search bar you'll see what I mean). I think it's worth it to avoid external libraries, but because it takes more care to build up, I'd like to know that I'm heading in the right direction before I put some more work into it.
Advanced search
If Neuron were to generate a page with UI for constructing advanced search queries, that could also double as a dedicated search results listing page for linking. Right now, I'm planning on adding date search, and thinking about looking into scoping subqueries to certain fields (so something like
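Coming back to the field-boost idea for tag search earlier in this comment, one way it might be generalised is a small builder that produces the options object per search scope. The option keys (`bool`, `expand`, `fields`, `boost`) and the `title`/`tags` field names are the ones used in this thread; the function `searchOptionsFor` is hypothetical, and whether a boost of 0 fully excludes a field is something to verify against elasticlunr itself.

```javascript
// Hypothetical helper; the field names come from the thread, the function does not.
const SEARCH_FIELDS = ["title", "tags"];

function searchOptionsFor(scope) {
  const fields = {};
  for (const name of SEARCH_FIELDS) {
    // "general" boosts every field; otherwise only the named field keeps a boost.
    fields[name] = { boost: scope === "general" || scope === name ? 1 : 0 };
  }
  return { bool: "AND", expand: true, fields };
}
```

With this shape, `searchOptionsFor("tags")` yields the same object as the `searchOptions` literal above, and `searchOptionsFor("general")` sets both boosts to 1.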
This PR is a spike at adding client-side search using ElasticLunr.
This PR message, and the code, are still WIP. I'll keep this up to date as I go.
At the moment it is still an experiment, manually adding in scripts in the doc/ site.
Because cache.json does not have the note bodies/content, we can't build in full-text search yet. I'm not sure if that's a problem at this point, since it may make sense to move the search index generation out of the browser and into Neuron itself, so that an index can be requested by the browser instead of being constructed on the fly.
ElasticLunr appears to have some support for jsonifying its indices, though I haven't looked into the process of hydrating these yet. It looks pretty straightforward though.
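A sketch of what the dump/hydrate round trip might look like, assuming elasticlunr exposes the `toJSON()` / `Index.load()` pair it inherits from lunr (worth confirming against the version in use). The wrapper names `dumpIndex` and `hydrateIndex` are invented here, and the library is passed in as `lib` so the sketch does not assume how it is loaded on the page.

```javascript
// Sketch only: `lib` stands for the elasticlunr module object.
function dumpIndex(index) {
  // JSON.stringify calls index.toJSON() when the index defines it.
  return JSON.stringify(index);
}

function hydrateIndex(lib, serialized) {
  // Assumes lib.Index.load accepts the parsed output of toJSON().
  return lib.Index.load(JSON.parse(serialized));
}
```

If Neuron generated the dump at build time, the browser would only need to fetch it and call something like `hydrateIndex` rather than indexing cache.json on every page load.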
To do:
to build the index server-side
determine good/optimum search configs (and suppress default config warning)#fieldSearch
search.html
page, so that might want to be part ofdisplaying results before moving too far ahead
Once those checks are done, I think things'll be in a position to make a better
decision about whether this makes sense for Neuron.
References #568
For #567
Next immediate steps
insta
search issue, keep investigatingseach.md