Migrate entity search to Haystack #711

Closed
wants to merge 41 commits into from

@hancush hancush commented Jan 15, 2021

Overview

This PR:

  • Replaces the custom Solr search back end that powers entity search with a conventional implementation of Haystack. N.b., it does not migrate the logic from #416 (Use Solr to build org charts) that populates and queries the composition documents used to form the org charts.
  • Updates the startup instructions in the README and makes some small adjustments to the Docker setup.

Connects #674.

Demo

[Screenshots: three images captured Jan 26, 2021]

Notes

There are a few classes of changes here:

  • Adds ${ENTITY_MODULE}.search_indexes for each search entity: person, unit, violation, and source.
  • Updates schema.xml to include Haystack fields. N.b., the template into which these fields are generated lives in templates/search_configs/schema.xml and is a lightly edited version of the original schema.
  • Replaces the custom search view with a Haystack FacetedSearchView and adjusts the search templates to work with the new context.
  • Renames make_search_index to update_composition_index and removes code to add documents related to the base search entities.
  • Updates the encrypted configs to include HAYSTACK_* variables (as exemplified in local_settings.example.py).
  • Updates the tests to remove tests for index updates on model changes. (We'll be using Haystack's RealtimeSignalProcessor for this behavior, i.e., we do not need to test third-party code.)
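
For orientation, the HAYSTACK_* settings referred to above usually take roughly the following shape in a Django settings module. This is only a sketch and not the contents of local_settings.example.py: the Solr host and core name are placeholder assumptions, while the engine and signal processor paths are the ones Haystack ships for Solr and real-time indexing.

```python
# Sketch of Haystack settings. The URL (host and core name) is a
# hypothetical placeholder, not a value taken from this PR.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://solr:8983/solr/sfm',  # assumed host and core
    },
}

# Update the index whenever an indexed model instance is saved or deleted.
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
```

With the real-time signal processor configured, index updates on model changes are handled by Haystack itself, which is why the corresponding tests could be dropped.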

Testing Instructions

  • If you do not have a local instance, follow the updated README instructions to retrieve a dump of the staging database and load it into your database.
  • Build the entity search index: docker-compose run --rm app python manage.py update_index -v 3
    • Note that this will take about half an hour.
  • Build the composition search index: docker-compose run --rm app python manage.py update_composition_index
  • Create a superuser and log in so you can access source search: docker-compose run --rm app python manage.py createsuperuser
  • Navigate to your local homepage and begin to execute a variety of searches. Confirm that load times are reasonable, and that the results generally match those on https://back.securityforcemonitor.org.
  • Continue to make test searches from the search results page. Change sorting and paging parameters, and confirm the expected results resemble staging.
  • Use the personnel search to find a commander. Navigate to their detail page and confirm the organization charts render as expected.
  • Use unit search to find a unit with at least one parent. Navigate to its detail page and confirm it renders as expected.

@hancush hancush marked this pull request as ready for review January 26, 2021 20:09
@hancush hancush requested a review from fgregg January 27, 2021 17:03
@fgregg fgregg left a comment

much better! what a great improvement. let's take the opportunity to address some smaller perplexities

@@ -28,7 +28,8 @@ update_db : import_directory import_db auth_models.json flush_db link_locations
python manage.py loaddata auth_models.json
python manage.py make_materialized_views --recreate
python manage.py update_countries_plus
python manage.py make_search_index --recreate
python manage.py rebuild_index --noinput
Collaborator

do we need a command to build the schema?

also, it would seem like these search index commands maybe belong under a different PHONY target?

@@ -75,7 +83,8 @@ Build and start the Docker image for the Solr server:

Open up another shell and create the search index:

docker-compose run --rm app ./manage.py make_search_index
docker-compose run --rm app ./manage.py update_index
Collaborator

you do not have to address this in this PR, but we should use the makefile commands here, imo.

Author

I'm not sure what the application of this Makefile is, to be honest. There's one reference to the import_google_docs target in the README, but I don't see other references to the recipes in the code base (including deployment scripts). I think I'd like to update the Makefile in our import refactor so it has some useful targets for local development and eventual deployment on Heroku. I'll plan to make those changes in a separate PR.


class OrganizationIndex(SearchEntity, indexes.Indexable):

'''
Collaborator

what is this comment doing for us?

Author

Ah, I included these as a reference for what needs to go into each index. I think it's safe to remove!

countries = set()

for division in prepared_data['division_ids']:
    countries.update([country_name(division)])
Collaborator

i understand this is copy paste, but this seems like it should be

countries.add(country_name(division))

or even better

countries = {country_name(division) for division in prepared_data['division_ids']}
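
To make the suggestion concrete, here is a small self-contained comparison of the three forms. The `country_name` function and the division IDs below are hypothetical stand-ins for the project's real helper and data; the point is only that all three build the same set.

```python
# Hypothetical stand-in for the project's country_name helper.
NAMES = {'mx': 'Mexico', 'ng': 'Nigeria'}


def country_name(division_id):
    return NAMES[division_id]


# Made-up data in the shape the index code iterates over.
prepared_data = {'division_ids': ['mx', 'ng', 'mx']}

# Original: wraps each value in a one-element list just to call update().
countries_update = set()
for division in prepared_data['division_ids']:
    countries_update.update([country_name(division)])

# Suggested: add() inserts a single element directly.
countries_add = set()
for division in prepared_data['division_ids']:
    countries_add.add(country_name(division))

# Suggested: a set comprehension builds the same set in one expression.
countries_comp = {country_name(d) for d in prepared_data['division_ids']}

assert countries_update == countries_add == countries_comp
```

The list wrapper in the original is doing real work, since `update` expects an iterable and a bare string would be added character by character. `add` sidesteps that entirely.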

countries = set()

for division in prepared_data['division_ids']:
    countries.update([country_name(division)])
Collaborator

another weird use of update

person_division_id = object.division_id.get_value()

if person_division_id:
    division_ids.update([person_division_id.value])
Collaborator

another weird use of update

org_division_id = org.division_id.get_value()

if org_division_id:
    division_ids.update([org_division_id.value])
Collaborator

another weird use of update


memberships = [mem.object_ref for mem in object.memberships]

if memberships:
Collaborator

don't need this conditional

name = organization.name.get_value()

parents = organization.parent_organization.all()
published = all([p.value.published for p in parents])
Collaborator

this can be all(p.value...), don't need the list comprehension
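
A quick illustration of the difference, using made-up flags and a hypothetical `is_published` function in place of the `p.value.published` lookups: with a list comprehension every element is evaluated before `all` runs, whereas a generator expression lets `all` stop at the first falsy value.

```python
calls = []


def is_published(flag):
    # Record each evaluation so the two forms can be compared.
    calls.append(flag)
    return flag


flags = [True, False, True]  # hypothetical published flags for three parents

# List comprehension: the whole list is built before all() sees it.
calls.clear()
result_list = all([is_published(f) for f in flags])
evaluated_with_list = len(calls)

# Generator expression: all() short-circuits at the first False.
calls.clear()
result_gen = all(is_published(f) for f in flags)
evaluated_with_generator = len(calls)

# Both forms agree on the answer, including the empty case.
assert result_list == result_gen
assert all([]) is True and all(()) is True
```

Either way, an organization with no parents yields `True`, so the change affects only how much work is done, not the result.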

hancush commented Jan 27, 2021

Okey doke, @fgregg, I think I responded to and/or addressed all of your comments. Do you want to take another look?

hancush commented Jan 28, 2021

TODO:

  • Retain filters when executing a keyword search (currently they are dropped when the form is submitted)
  • Investigate poor results (see below)

If I search "22 Batallón de Infantería" with filters set for "Units" and "Mexico", I expect the filters to narrow down the results, but they don't:

https://back.securityforcemonitor.org/en/search/?q=22%20Batall%C3%B3n%20de%20Infanter%C3%ADa&end_date=&start_date=&entity_type=Organization&selected_facets=countries_exact%3AMexico

It returns 1006 results, and "22 Batallón de Infantería" does not appear at or near the top of those results.
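
As a sanity check on the request itself, the query string from the URL above can be decoded with the standard library. This sketch only inspects the URL and says nothing about how Solr ranks or filters the results; it shows that the keyword, the entity type, and the country facet are all present in the request.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    'https://back.securityforcemonitor.org/en/search/'
    '?q=22%20Batall%C3%B3n%20de%20Infanter%C3%ADa&end_date=&start_date='
    '&entity_type=Organization&selected_facets=countries_exact%3AMexico'
)

# keep_blank_values preserves the empty start_date and end_date parameters.
params = parse_qs(urlsplit(url).query, keep_blank_values=True)

for key, values in params.items():
    print(key, values)
```

Since both filters reach the server, the unexpectedly broad result set would seem to originate in how the query is built or scored rather than in the form submission for this particular request.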

@hancush hancush changed the base branch from master to staging February 23, 2021 21:47
hancush commented Mar 5, 2021

Superseded by #746 (rebase, plus additional work to patch bugs).

@hancush hancush closed this Mar 5, 2021
@hancush hancush deleted the hcg/haystack branch March 16, 2021 20:23