Improve search functionality (goal is to improve how it returns results for lv-dof) #20

louh · 2013-09-20T18:20:31Z

Some things @rclosner and I discussed:

- fuzzy/partial matches
- search terms with other words between them
- ranked relevance (terms matching the title or illustrative examples should rank higher than terms matching the description)
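A minimal sketch of the first two items, using only the Python standard library. Each query term is matched independently, so other words can sit between them. A term counts as a hit if it is a prefix of some word (partial match) or is close to some word by `difflib`'s similarity ratio (fuzzy match, e.g. typos). The function names and the 0.8 cutoff are illustrative assumptions, not part of the NAICS API.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", (text or "").lower())


def term_matches(term, words, cutoff=0.8):
    """True if `term` is a prefix of some word (partial match) or is
    similar to some word by difflib's ratio (fuzzy match)."""
    if any(w.startswith(term) for w in words):
        return True
    return bool(difflib.get_close_matches(term, words, n=1, cutoff=cutoff))


def matches(query, text):
    """Every query term must match somewhere in `text`.
    Terms are checked independently, so word order and gaps don't matter."""
    words = tokenize(text)
    return all(term_matches(t, words) for t in tokenize(query))


# Usage with sample NAICS-style titles:
print(matches("retail bakries", "Retail Bakeries"))  # typo still matches
print(matches("cakes manufacturing", "Frozen Cakes, Pies, and Other Pastries Manufacturing"))
```

Checking terms independently trades phrase precision for recall; ranking (the third item) is what would push the better hits to the top.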

lovehandle · 2013-09-21T00:11:57Z

I think for our application it makes the most sense to load the NAICS API data into a DB. In the same vein, I think the API itself would probably benefit from moving to a DB as well (as opposed to a flat JSON file). We could use an existing FTS program, and we'd probably get faster response times while we're at it.

Maybe this deserves a new thread, but what are the thoughts on adding tags to classification codes? I think the additional metadata would help full-text searches retrieve relevant codes.

louh · 2013-09-21T01:51:53Z

What's an FTS?

How are tags generated? I think the "index entries" and "illustrative examples" were an attempt by the NAICS writers to provide text hooks for finding codes by similar titles.

migurski · 2013-09-21T08:14:32Z

FTS is full-text search.

I’d hold off on a DB for now; the dataset IIRC is very small, and if loaded at launch time could happily hang out in memory without the overhead of an external database service. I’m not even sure this is worth testing for now; the simplicity benefit of running from flat local files is tremendous. Building a simple full-text index is very easy, if it’s even necessary here.
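A rough sketch of the kind of in-memory index described here: records are loaded once at launch from a flat JSON file and turned into an inverted index (token → set of codes). The file name and the `code`/`title`/`description` keys are hypothetical; the real NAICS API data may be shaped differently.

```python
import json
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", (text or "").lower())


def load_records(path):
    """Read the flat JSON file once at launch (path is hypothetical, e.g. 'naics.json')."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_index(records):
    """Map each token to the set of codes whose title or description contains it."""
    index = defaultdict(set)
    for rec in records:
        text = rec["title"] + " " + rec.get("description", "")
        for token in tokenize(text):
            index[token].add(rec["code"])
    return index


def search(index, query):
    """Return the codes containing every query token (AND semantics)."""
    hits = [index.get(t, set()) for t in tokenize(query)]
    return set.intersection(*hits) if hits else set()
```

The whole index is a plain dict held in memory, so there is no external service to run; rebuilding it is just re-reading the JSON file.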

lovehandle · 2013-09-21T18:27:07Z

That's a good point, actually. Using a DB for the API would probably be a little overkill. Mostly what I'd like to get is something a little fuzzier than exact match. I've been playing around with some of the FTS JS libraries that have this out of the box (e.g. lunr, fullproof), but they've all been a little wonky.

My initial thought would be that the addition of tags (maybe generated by Mechanical Turk?) would weight the searches in the right direction. Not sure how feasible that is, though. Open to ideas.

louh · 2013-09-21T19:15:24Z

I'd suggest weighting searches in this order: title, index entries, illustrative examples, description.
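One way to express that ordering is a per-field weighted score: count the query terms found in each field, multiply by the field's weight, and sort by the total. The field names (`title`, `index_entries`, `examples`, `description`) and the numeric weights are assumptions for illustration; only their relative order comes from the suggestion above.

```python
import re

# Hypothetical weights: only the order title > index entries > examples > description is given.
FIELD_WEIGHTS = {
    "title": 8.0,
    "index_entries": 4.0,
    "examples": 2.0,
    "description": 1.0,
}


def tokenize(text):
    """Lowercase the text and split it on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", (text or "").lower())


def field_text(value):
    """A field may be a string or a list of strings (e.g. several index entries)."""
    if isinstance(value, list):
        return " ".join(value)
    return value or ""


def score(record, query):
    """Sum, over fields, the weight times the number of distinct query terms in that field."""
    terms = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = set(tokenize(field_text(record.get(field))))
        total += weight * len(terms & words)
    return total


def rank(records, query):
    """Codes with a positive score, best first; ties keep input order (sort is stable)."""
    scored = [(score(r, query), r["code"]) for r in records]
    scored.sort(key=lambda pair: -pair[0])
    return [code for s, code in scored if s > 0]
```

Exact-token matching keeps the sketch short; the same weights could be applied on top of a fuzzier per-term match.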

As an aside, the DOF front end is now storing all search inputs in the background for later analysis. Maybe something useful could come of that.

lovehandle · 2013-09-21T19:20:46Z

@louh interesting. Should we persist search terms in the session? We could save them into the DB for later usage.

louh · 2013-09-21T19:27:36Z

@rclosner Yep, I'm trying to figure out how LocalStorage works now... For now, if we want to save them, they'll live on the user side.

lovehandle · 2013-09-21T19:45:50Z

@louh 👍

migurski · 2013-09-21T20:12:51Z

LocalStorage or cookies are probably a better bet, since they leave us with a read-only application and a lower maintenance burden.

Fuzzy matches are an interesting opportunity for Git/JSON-driven search terms. They can be initially generated automatically, committed to the project, and then further edited via manual updates and such. Things like synonyms and related terms are so human.
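A sketch of the Git/JSON-driven idea, assuming a hypothetical hand-editable `synonyms.json` committed to the repo (inlined here as a string so the example runs on its own). Each entry groups related terms; at query time every term is expanded to its whole group before matching. The file could be generated by a script first and then edited by hand.

```python
import json
import re

# Hypothetical contents of a synonyms.json kept under version control.
SYNONYMS_JSON = """
{
  "bakery": ["bakeries", "baker", "pastry"],
  "car": ["automobile", "auto", "vehicle"]
}
"""


def load_synonyms(text):
    """Build a symmetric lookup: every term in a group maps to the whole group."""
    raw = json.loads(text)
    lookup = {}
    for head, others in raw.items():
        group = {head, *others}
        for term in group:
            lookup.setdefault(term, set()).update(group)
    return lookup


def expand_query(query, lookup):
    """Return the query's own terms plus any synonyms found for them."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    expanded = set(terms)
    for t in terms:
        expanded |= lookup.get(t, set())
    return expanded


lookup = load_synonyms(SYNONYMS_JSON)
print(sorted(expand_query("used car", lookup)))
```

Because the synonyms live in a plain file, edits go through ordinary commits and review, and the application itself stays read-only.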

louh mentioned this issue Sep 20, 2013

Improve NAICS API to provide better relevant results when it searches codeforamerica/fast_pass#20

Closed
