
Add basic search backend #67 #76

Merged: 25 commits merged on Sep 5, 2024

@jacobtylerwalls (Member) commented Aug 22, 2024

Add basic search backend at api/search. Closes #67

GET params:

  • term: search term
  • maxEditDistance: fuzziness (maximum edit distance)
  • page: page number
  • items: items per page
  • exact
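A client could assemble a request from these optional parameters using only the standard library. This is a minimal sketch: the parameter names come from the list above, but the base URL (copied from the local demo request), the helper name `build_search_url`, and sending `exact` as the string "true"/"false" are assumptions, not something the PR specifies.

```python
from urllib.parse import urlencode

# Base URL assumed from the local demo request in this PR.
SEARCH_URL = "http://127.0.0.1:8000/api/search"


def build_search_url(term=None, max_edit_distance=None, page=None, items=None, exact=None):
    """Build an api/search URL, omitting any parameter left as None.

    Hypothetical helper: every parameter is optional, mirroring the PR.
    Encoding exact as "true"/"false" is an assumption about the backend.
    """
    params = {
        "term": term,
        "maxEditDistance": max_edit_distance,
        "page": page,
        "items": items,
        "exact": None if exact is None else str(exact).lower(),
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{SEARCH_URL}?{query}" if query else SEARCH_URL


# No term: per the PR, such a request rebuilds (warms) the cached hierarchies.
print(build_search_url())  # http://127.0.0.1:8000/api/search
print(build_search_url(term="oss", max_edit_distance=1, page=1, items=25))
# http://127.0.0.1:8000/api/search?term=oss&maxEditDistance=1&page=1&items=25
```

Dropping unset parameters keeps the "all are optional" contract visible in the URL itself rather than sending empty strings the backend would have to interpret.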

All are optional. If you don't provide a search term, the cached hierarchies are rebuilt. One use case for this might be a "preflight" request that warms the cache when the search page loads, before the user interacts with the search input. All of this is up for discussion. The default cache lifetime is 5 minutes and is configurable.
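The paragraph above describes two cache behaviors: a term-less request rebuilds the cached hierarchies, and cached entries live for a configurable 5 minutes by default. The plain-Python model below sketches that contract; it is not the PR's implementation, and `HierarchyCache`, `rebuild`, `ttl_seconds`, and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time


class HierarchyCache:
    """Hypothetical TTL cache modeling the behavior described in this PR.

    Cached hierarchies expire after ttl_seconds (default 300, i.e. 5 minutes),
    and rebuild() lets a term-less "preflight" request warm the cache early.
    """

    def __init__(self, build_hierarchies, ttl_seconds=300, clock=time.monotonic):
        self._build = build_hierarchies  # the expensive step that queries the hierarchies
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be shown without sleeping
        self._value = None
        self._expires_at = float("-inf")  # start expired so the first get() builds

    def rebuild(self):
        """Rebuild unconditionally, e.g. for a search request with no term."""
        self._value = self._build()
        self._expires_at = self._clock() + self._ttl
        return self._value

    def get(self):
        """Return the cached hierarchies, rebuilding only once they have expired."""
        if self._clock() >= self._expires_at:
            return self.rebuild()
        return self._value


# Usage with a fake clock: a preflight warms the cache, later searches reuse it.
fake_now = [0.0]
cache = HierarchyCache(lambda: {"built_at": fake_now[0]}, clock=lambda: fake_now[0])
cache.rebuild()  # preflight at t=0
fake_now[0] = 120.0
print(cache.get())  # {'built_at': 0.0}: still cached
fake_now[0] = 301.0
print(cache.get())  # {'built_at': 301.0}: expired, so rebuilt
```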

Demo

http://127.0.0.1:8000/api/search?term=oss

Result
[
  {
    "id": "d4e4db86-a948-4d6a-b89f-d1682d038afe",
    "labels": [
      {
        "language": "de",
        "value": "Knochen",
        "valuetype": "prefLabel"
      },
      {
        "language": "pt",
        "value": "osso",
        "valuetype": "prefLabel"
      },
      {
        "language": "en",
        "value": "bone (material)",
        "valuetype": "prefLabel"
      },
      {
        "language": "fr",
        "value": "os",
        "valuetype": "prefLabel"
      }
    ],
    "parents": [
      {
        "id": "b73e741b-46da-496c-8960-55cc1007bec4",
        "labels": [
          {
            "language": "en-US",
            "value": "AAT Entries",
            "valuetype": "prefLabel"
          }
        ]
      },
      {
        "id": "7764512c-494b-46e5-ad33-223836c8518b",
        "labels": [
          {
            "language": "en-US",
            "value": "Materials",
            "valuetype": "prefLabel"
          }
        ]
      }
    ],
    "polyhierarchical": false
  },
  {
    "id": "07dbe013-7dcf-4dd7-9df1-e72a9a855da5",
    "labels": [
      {
        "language": "fr",
        "value": "bois",
        "valuetype": "prefLabel"
      },
      {
        "language": "en",
        "value": "wood (plant material)",
        "valuetype": "prefLabel"
      }
    ],
    "parents": [
      {
        "id": "b73e741b-46da-496c-8960-55cc1007bec4",
        "labels": [
          {
            "language": "en-US",
            "value": "AAT Entries",
            "valuetype": "prefLabel"
          }
        ]
      },
      {
        "id": "7764512c-494b-46e5-ad33-223836c8518b",
        "labels": [
          {
            "language": "en-US",
            "value": "Materials",
            "valuetype": "prefLabel"
          }
        ]
      }
    ],
    "polyhierarchical": false
  }
]
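A client rendering a result list from a response like this usually needs one display label per concept plus a readable parent path. The sketch below assumes a fallback order of exact language match, then a regional variant (so "en" also accepts "en-US"), then any prefLabel, and assumes `parents` runs from the root down, as the demo suggests. The helper names and the "(unlabeled)" placeholder are hypothetical, not part of the API.

```python
def pick_pref_label(labels, language="en"):
    """Pick one prefLabel value, preferring the requested language (hypothetical helper)."""
    pref = [label for label in labels if label["valuetype"] == "prefLabel"]
    for label in pref:
        if label["language"] == language:  # exact match, e.g. "en"
            return label["value"]
    base = language.split("-")[0]
    for label in pref:
        if label["language"].split("-")[0] == base:  # regional variant, e.g. "en-US"
            return label["value"]
    return pref[0]["value"] if pref else None  # otherwise any prefLabel


def summarize(results, language="en"):
    """Map each hit to (label, parent path), assuming parents are listed root-first."""
    return [
        (
            pick_pref_label(hit["labels"], language),
            " > ".join(
                pick_pref_label(p["labels"], language) or "(unlabeled)" for p in hit["parents"]
            ),
        )
        for hit in results
    ]


# A trimmed copy of the second hit from the demo response (ids omitted).
sample = [{
    "labels": [
        {"language": "fr", "value": "bois", "valuetype": "prefLabel"},
        {"language": "en", "value": "wood (plant material)", "valuetype": "prefLabel"},
    ],
    "parents": [
        {"labels": [{"language": "en-US", "value": "AAT Entries", "valuetype": "prefLabel"}]},
        {"labels": [{"language": "en-US", "value": "Materials", "valuetype": "prefLabel"}]},
    ],
    "polyhierarchical": False,
}]
print(summarize(sample))  # [('wood (plant material)', 'AAT Entries > Materials')]
```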

Testing instructions

  1. Load some data following the instructions in Improve RDM to Lingo migration #74
  2. Check out this branch and run migrations
  3. Make some requests
  4. You can compare the search results to the haystack by viewing api/concept_trees

TODO:

  • Add Python tests

@jacobtylerwalls linked an issue Aug 22, 2024 that may be closed by this pull request
Base automatically changed from 68_improve_rdm_to_lingo_migration to main August 26, 2024 19:05
@jacobtylerwalls force-pushed the jtw/search-backend branch 2 times, most recently from f652cb6 to 5c48d38, August 27, 2024 19:00

@chrabyrd (Contributor) left a comment

overall looks good, and I appreciate the amount of work that's gone into powering this (new && different) search 👍

Mostly minor stuff, but happy to touch base about what I'm imagining is a too-fuzzy search 😁
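On the "too-fuzzy" concern: the PR doesn't say which distance metric backs maxEditDistance. Assuming plain Levenshtein distance over whole label values (an assumption for illustration only, not the backend's confirmed behavior), the sketch below shows how fast the match set grows with each extra edit. With the term "oss", the labels "os" and "osso" are one edit away, while "bois" from the demo result is two. `fuzzy_matches` is a hypothetical helper.

```python
def levenshtein(a, b):
    """Classic dynamic-programming Levenshtein distance (insert/delete/substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,  # deletion from a
                current[j - 1] + 1,  # insertion into a
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, values, max_edit_distance):
    """Return the values within max_edit_distance of term (case-insensitive, whole string)."""
    return [v for v in values if levenshtein(term.lower(), v.lower()) <= max_edit_distance]


labels = ["os", "osso", "bois", "Knochen"]  # label values taken from the demo result
print(fuzzy_matches("oss", labels, 1))  # ['os', 'osso']
print(fuzzy_matches("oss", labels, 2))  # ['os', 'osso', 'bois']
```

For short terms like "oss", a distance of 2 already covers most of the string, which is one plausible reason a default fuzziness can feel too loose.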

arches_lingo/views/__init__.py (outdated, resolved)
arches_lingo/views/trees.py (outdated, resolved)
arches_lingo/views/trees.py (outdated, resolved)
]

# Todo: filter by nodegroup permissions
return JSONResponse(data)

Contributor

This has been altered in #79 to return more data about the query

arches_lingo/views/trees.py (outdated, resolved)
arches_lingo/views/trees.py (outdated, resolved)

@chrabyrd (Contributor) left a comment

overall looks good and works well! just some minor code style stuff really

arches_lingo/concepts.py (outdated, resolved)
arches_lingo/query_utils.py (outdated, resolved)
arches_lingo/querysets.py (outdated, resolved)
arches_lingo/settings.py (resolved)
arches_lingo/urls.py (resolved)
arches_lingo/views/api/concepts.py (outdated, resolved)
page = paginator.get_page(page_number)

data = []
if page:

Contributor

nit: instead of control flow based on the presence/absence of a paginator page, can it be based on the data fed into the paginator?

Member Author

The ConceptBuilder does queries to build a tree (or gets the cached one). If there are no search results, there's no need to build a tree; we can just return an empty list right away. That was my thinking. I'm not exactly sure what you're suggesting as an alternative. I can clarify this to deindent a little bit.

Contributor

ya, I've not put a breakpoint to see if this holds up but

concept_query = concept_query.values_list("concept_id", flat=True).distinct()

returns something that will either have a length or not?

Since page_number looks like it will always have a value, the only time there shouldn't be a page is when there's no data. Admittedly this is my assumption.

So instead of if page:, can it be if len(concept_query): (or whatever its new name is)?

Member Author

Yeah I can signpost better. But we can't evaluate the queryset too early or we'll unnecessarily fetch lots of objects before doing the pagination to slim down the results.
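The cost being avoided here can be shown without Django. In this sketch `LazyRows` is a hypothetical stand-in for a lazy QuerySet that records how many rows it fetches, and `paginate` is a simplified paginator, not Django's `Paginator`, though it follows the same idea: ask for a count first, then slice out only the requested page. Unlike Django's `get_page`, it does not validate or clamp page numbers. Checking `len(list(rows))` up front pulls every row, while count-then-slice pulls one page.

```python
class LazyRows:
    """Hypothetical stand-in for a lazy QuerySet: fetches rows only when iterated or sliced."""

    def __init__(self, total):
        self.total = total
        self.rows_fetched = 0  # how many rows have been pulled from the "database"

    def count(self):
        # Like QuerySet.count(): an aggregate query that fetches no rows.
        return self.total

    def __getitem__(self, index):
        # Only slices are supported; fetching a slice pulls just those rows.
        start, stop, _ = index.indices(self.total)
        self.rows_fetched += max(stop - start, 0)
        return list(range(start, stop))

    def __iter__(self):
        # Full evaluation: every row is fetched.
        self.rows_fetched += self.total
        return iter(range(self.total))


def paginate(rows, page_number, items_per_page):
    """Simplified paginator: count first, return [] early, else slice one page."""
    if rows.count() == 0:
        return []  # no results: skip the expensive tree building entirely
    start = (page_number - 1) * items_per_page
    return rows[start:start + items_per_page]


eager = LazyRows(10_000)
has_results = len(list(eager)) > 0  # evaluates the whole "queryset" up front
print(eager.rows_fetched)  # 10000

lazy = LazyRows(10_000)
page = paginate(lazy, page_number=2, items_per_page=25)
print(page[0], lazy.rows_fetched)  # 25 25
```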

@jacobtylerwalls (Member Author), Sep 5, 2024

In case the implicit booleanness of the Page object is too opaque, I'll change it to if not len(page) to clarify it's an iterable of objects.

Member Author

Actually, checking page.count saves one query in the no-data path, which seems like something I might ask about on the Django forum, because len() should work the same way. So I'll do that.

Member Author

Oops, this should be .count on the paginator, not the page.
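This mix-up is easy to make: Django's `Page` class subclasses `collections.abc.Sequence` (worth confirming against the Django version in use), and that mixin supplies a `count(value)` method. So `page.count` is a bound method, which is always truthy, whereas `Paginator.count` is the total number of objects. The sketch below reproduces the pitfall with a hypothetical `SimplePage` built on the same base class.

```python
from collections.abc import Sequence


class SimplePage(Sequence):
    """Hypothetical page of results; Sequence mixes in count(value) and index(value)."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


empty_page = SimplePage([])
print(bool(empty_page.count))  # True: a bound method, not the number of results
print(empty_page.count("x"))   # 0: how many times "x" occurs on the page
print(len(empty_page) == 0)    # True: the check that actually reflects "no results"
```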

arches_lingo/views/root.py (outdated, resolved)
arches_lingo/concepts.py (outdated, resolved)
concept_query = VwLabelValue.objects.all().order_by("concept_id")
concept_ids = concept_query.values_list("concept_id", flat=True).distinct()

paginator = Paginator(concept_ids, items_per_page)

Contributor

That way, this code gets reduced to something like:

        if not concept_ids:
            # No results: don't bother building the concept tree.
            return JSONResponse([])

        paginator = Paginator(concept_ids, items_per_page)
        page = paginator.get_page(page_number)

        builder = ConceptBuilder()

        return JSONResponse([
            builder.serialize_concept(str(concept_uuid), parents=True, children=False)
            for concept_uuid in page
        ])

@jacobtylerwalls (Member Author), Sep 5, 2024

Yeah, as I mentioned above, it would be a pessimization to evaluate the queryset before the paginator slices it. We need to delegate that to the paginator.

@chrabyrd (Contributor) left a comment

looks good 👍

@jacobtylerwalls merged commit 636dbf5 into main Sep 5, 2024
3 checks passed

@jacobtylerwalls (Member Author)

Thanks for the review!

@jacobtylerwalls deleted the jtw/search-backend branch September 5, 2024 18:23
Development

Successfully merging this pull request may close these issues.

Implement basic search page backend
2 participants