Improve search functionality (goal is to improve how it returns results for lv-dof) #20

Open
louh opened this issue Sep 20, 2013 · 9 comments

Comments

@louh
Member

louh commented Sep 20, 2013

Some things @rclosner and I discussed:

  • fuzzy/partial matches
  • search terms with other words between them
  • ranked relevance (terms matching title or illustrative examples should rank higher than in description)

cc @migurski

@lovehandle
Member

@louh @migurski

I think for our application it makes the most sense to insert NAICS API data into a DB. In the same vein, I think the API itself would benefit from moving to a DB (as opposed to a flat JSON file). We could use an existing FTS engine, and we'd probably get faster response times while we're at it.

Maybe this deserves a new thread, but what are the thoughts on adding tags to classification codes? I think the additional metadata would help FTS queries retrieve relevant codes.

@louh
Member Author

louh commented Sep 21, 2013

What's an FTS?

How are tags generated? I think the "index entries" and "illustrative examples" were an attempt by the NAICS writers to provide text hooks for finding codes by similar titles.

@migurski
Contributor

FTS is full-text search.

I’d hold off on a DB for now; the dataset IIRC is very small, and if loaded at launch time could happily hang out in memory without the overhead of an external database service. I’m not even sure this is worth testing for now; the simplicity benefit of running from flat local files is tremendous. Building a simple full text index is very easy, if even necessary here.
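A minimal sketch of the in-memory approach described above, assuming the records are loaded once at launch. The record codes, field names, and text here are made up for illustration and are not taken from the NAICS data or the project's code. Because the index maps single words to records, query terms match even when other words sit between them.

```python
import re
from collections import defaultdict

# Toy records: codes, field names, and text are hypothetical.
RECORDS = [
    {"code": "000001", "title": "Retail Bakeries",
     "description": "Shops baking bread and cakes for sale on site."},
    {"code": "000002", "title": "Full-Service Restaurants",
     "description": "Places serving food to seated patrons."},
    {"code": "000003", "title": "Mobile Food Services",
     "description": "Trucks and carts selling food and snacks."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each word to the set of record codes whose text contains it."""
    index = defaultdict(set)
    for record in records:
        for field in ("title", "description"):
            for word in tokenize(record.get(field, "")):
                index[word].add(record["code"])
    return index


def search(index, query):
    """Return codes containing every query word, in any order or position."""
    words = tokenize(query)
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


# Built once at launch and kept in memory; no external service involved.
INDEX = build_index(RECORDS)
```

With these toy records, `search(INDEX, "food trucks")` matches only the record whose text contains both words, even though they are not adjacent.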

@lovehandle
Member

That's a good point, actually. Using a DB for the API would probably be overkill. Mostly what I'd like to get is something a little fuzzier than exact match. I've been playing around with some of the FTS JS libraries that have this out of the box (e.g. lunr, fullproof), but they've all been a little wonky.
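One way to get "a little fuzzier than exact match" without a search library is to compare each query word against the known vocabulary. This is a rough sketch using Python's standard `difflib`, not the behavior of lunr or fullproof; the vocabulary, the `fuzzy_terms` helper, and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib

# Hypothetical vocabulary; in practice it would be the set of indexed words.
VOCABULARY = ["bakeries", "restaurants", "plumbing", "landscaping"]


def fuzzy_terms(word, vocabulary, cutoff=0.8):
    """Return vocabulary words close to `word`: prefix matches, then near spellings."""
    word = word.strip().lower()
    if not word:
        return []
    # Partial match: the query is the beginning of a known word.
    prefix_hits = [v for v in vocabulary if v.startswith(word)]
    # Fuzzy match: similarity ratio at or above the cutoff.
    close_hits = difflib.get_close_matches(word, vocabulary, n=5, cutoff=cutoff)
    # Merge both lists, keeping order and dropping duplicates.
    return list(dict.fromkeys(prefix_hits + close_hits))
```

The misspelling `"restaurnts"` resolves to `"restaurants"` through the similarity ratio, and the partial word `"plumb"` resolves to `"plumbing"` through the prefix check. The resolved words can then be looked up exactly.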

My initial thought would be that the addition of tags (maybe generated by Mechanical Turk?) would weight the searches in the right direction. Not sure how feasible that is, though. Open to ideas.

@louh
Member Author

louh commented Sep 21, 2013

I'd suggest weighting searches in this order: title, index entries, illustrative examples, description
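The suggested ordering could be sketched as a per-field weight, assuming each record exposes fields named `title`, `index`, `examples`, and `description`. Those names, the weight values, and the toy records are hypothetical; only the relative order comes from the comment above.

```python
import re

# Weights follow the suggested order; the numbers themselves are arbitrary.
FIELD_WEIGHTS = {"title": 4, "index": 3, "examples": 2, "description": 1}

# Toy records: codes and text are made up for illustration.
RECORDS = [
    {"code": "000001", "title": "Grocery Stores", "index": ["Supermarkets"],
     "examples": ["Bakeries inside supermarkets"],
     "description": "Stores selling food, including bread."},
    {"code": "000002", "title": "Retail Bakeries",
     "index": ["Bread baking for retail sale"], "examples": [],
     "description": "Establishments selling baked goods."},
    {"code": "000003", "title": "Plumbing Contractors", "index": [],
     "examples": [], "description": "Installing and repairing pipes."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(record, query):
    """Add a field's weight once for each query word found in that field."""
    total = 0
    for word in set(tokenize(query)):
        for field, weight in FIELD_WEIGHTS.items():
            value = record.get(field, "")
            # A field may hold one string or a list of strings.
            text = " ".join(value) if isinstance(value, list) else value
            if word in tokenize(text):
                total += weight
    return total


def rank(records, query):
    """Return codes of matching records, highest score first."""
    scored = [(score(r, query), r["code"]) for r in records]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [code for points, code in ordered if points > 0]
```

For the query `"bakeries"`, the record with the word in its title ranks ahead of the record that only mentions it in an illustrative example.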

As an aside, the DOF front end is now storing all search inputs in the background for later analysis. Maybe something useful could come of that.

@lovehandle
Member

@louh interesting. Should we persist search terms in the session? We could save them into the DB for later usage.

@louh
Member Author

louh commented Sep 21, 2013

@rclosner Yep, I'm trying to figure out how LocalStorage works now... if we want to save it, it'll live on the user's side for now.

@lovehandle
Member

@louh 👍

@migurski
Contributor

LocalStorage or cookies are probably a better bet, since they will leave us with a read-only application for lower maintenance burden.

Fuzzy matches are an interesting opportunity for Git/JSON-driven search terms. They can be initially generated automatically, committed to the project, and then further edited via manual updates and such. Things like synonyms and related terms are so human.
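As an illustration of the Git/JSON-driven idea, here is a minimal sketch that assumes a hand-editable synonyms file committed alongside the data. The file contents, its shape, and the `expand_query` helper are hypothetical and do not describe an existing file in the project.

```python
import json

# Hypothetical contents of a synonyms file kept in the repository:
# generated automatically at first, then corrected by hand over time.
SYNONYMS_JSON = """
{
  "restaurant": ["diner", "eatery", "cafe"],
  "bakery": ["bread shop", "patisserie"]
}
"""


def build_lookup(synonyms):
    """Invert the mapping so each synonym points back to its canonical term."""
    lookup = {}
    for canonical, alternatives in synonyms.items():
        lookup[canonical.lower()] = canonical
        for alternative in alternatives:
            lookup[alternative.lower()] = canonical
    return lookup


def expand_query(query, lookup):
    """Return the canonical term for a query, or the query itself if unknown."""
    return lookup.get(query.strip().lower(), query)


LOOKUP = build_lookup(json.loads(SYNONYMS_JSON))
```

A search for `"Diner"` would be rewritten to `"restaurant"` before hitting the index, and editing the JSON file is all it takes to teach the search a new related term.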


3 participants