[hibernate-search] Part 1: Remove custom ElasticSearch integration #6209

Merged (178 commits) on Sep 13, 2024

matthias-ronge (Collaborator) commented Aug 30, 2024

Following a change in agreement with release management, the entire first part of the development package is provided in a joint pull request. This removes all previous ElasticSearch integration. In this state, the software can be operated without ElasticSearch.

The following functional restrictions apply:

  • Searching for an existing parent process using its PPN during catalog import (and thus identifying existing parent processes) is not possible. (See discussion point 2)
  • Sorting the process list according to the columns "Last editing user", "Start of editing the last task", "Completion of editing the last task" related to the last edited task is not possible. (discussion point 3 /1st - 3rd)
  • Sorting the process list according to "Status of the correction comment" is currently not possible. (discussion point 3 /4th)
  • When manually adding a child process to a parent process in the metadata editor, the search slot no longer compares with the permitted doctypes. (discussion point 4)
  • Filtering is barely supported: only search by ID or by title substring is offered. (discussion point 5)

From FindBugs: the SimpleDateFormat class is not thread-safe. Using an object across threads may result in undefined behavior.

This is done in anticipation of the upcoming renaming of the X packages.

This also considers non-empty strings that consist of white space only as empty, which, as far as I can see, is the more correct choice in all places.
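
To illustrate the two code-level notes above (SimpleDateFormat not being thread-safe, and whitespace-only strings counting as empty), here is a minimal sketch. It is not taken from the pull request: the class name, the date pattern and the helper names are assumptions chosen for illustration, and the commits may solve this differently.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class ThreadSafetyAndBlankSketch {

    // Unlike SimpleDateFormat, DateTimeFormatter is immutable and thread-safe,
    // so a single shared instance can be used from several threads.
    // The pattern is an assumption for this example.
    static final DateTimeFormatter DAY_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    static String formatDay(LocalDate day) {
        return DAY_FORMAT.format(day);
    }

    // Treats null, "" and strings made up only of characters that
    // String.trim() strips (code points up to U+0020) as empty.
    static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(formatDay(LocalDate.of(2024, 9, 13))); // prints 2024-09-13
        System.out.println(isBlank("   "));                       // prints true
        System.out.println(isBlank(" x "));                       // prints false
    }
}
```

If a SimpleDateFormat has to stay, creating a new instance per call (or per thread) avoids sharing it across threads.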
solth (Member) left a comment

Thanks a lot for this updated pull request! I think this represents a much better intermediate work in progress to review than the last pull request.

I also want to thank you for the additional explanations you provided for various details of your pull request.

Concerning the changes themselves, I first of all support the general direction you took here by first removing all ElasticSearch parts from the code to create a preliminary version that only operates with the DB. I very much hope you will be able to succeed in integrating HibernateSearch from here while still supporting all relevant functionalities currently available in Kitodo.Production.

Some specific remarks concerning your work so far:

  • As you pointed out in your description, filters are not working in the current state of this pull request. While trying to integrate HibernateSearch myself some years ago I found mapping all existing Kitodo.Production filters from ElasticSearch to HibernateSearch was the most difficult part (which I didn't finish). Although the HibernateSearch integration itself might be mostly comprised of adding annotations, actually supporting the various required filters will probably prove a lot more challenging. I would like to stress that we agreed on keeping the existing functionality, so this is kind of the base line for me when judging whether the HibernateSearch integration will be successful or not in the end.

  • Concerning the filter functionalities, I noticed that you configured some tests to be ignored (in JUnit 5 they should be disabled instead), but removed many others. Some of those removed tests will probably be obsolete in the future, since we won't manually write to the index or manage index dependencies anymore, but many find... methods (especially those with relevant filter configurations) will still be relevant and required with HibernateSearch. Therefore I would prefer that you handle those tests more consistently: set all tests that aren't supported in this preliminary work-in-progress state to disabled instead of deleting them, then update and re-enable them once HibernateSearch is completely available. (If a test uses classes of the now removed ElasticSearch library, the test method body should be emptied and replaced with a TODO as a reminder that the test has to be reworked for HibernateSearch.)

  • I would suggest updating the hibernate-search branch as soon as possible, since quite a few changes have been made to the master branch since the last update. In particular, the testing framework has been updated from JUnit 4 to JUnit 5, so all new tests need to be updated with the new, corresponding annotations.
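
As a rough sketch of the request to disable tests instead of deleting them, a JUnit 5 test that is temporarily switched off could look like this. The class and method names are hypothetical and not taken from the Kitodo.Production code base; the example needs JUnit 5 (junit-jupiter-api) on the classpath.

```java
import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;

// Hypothetical test class, for illustration only.
public class ProcessFilterSketchTest {

    // JUnit 5 uses @Disabled where JUnit 4 used @Ignore.
    // The reason string shows up in the test report.
    @Test
    @Disabled("TODO: rework for Hibernate Search once filters are available again")
    public void shouldFindProcessesByFilter() {
        // TODO: the former body relied on classes of the removed
        // ElasticSearch library; re-implement against Hibernate Search.
    }
}
```

A disabled test is still compiled and is reported as skipped, so it stays visible as open work instead of silently disappearing from the build.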

I am happy that you tried to focus on the parts relevant for the HibernateSearch integration and didn't try to rewrite the complete base of the application or introduce other fundamental changes that have nothing to do with HibernateSearch.

I did find a few pages that don't work but should work. While project, template and process lists work fine with data obtained from the database, opening the task list and extended search triggers errors right now:

Task list:
[Screenshot, 2024-09-09 09:51:24]

Extended search:
[Screenshot, 2024-09-09 09:51:39]

This and the fact that you removed so many tests should be kept in mind when judging the quality of a pull request based on a successful build here on GitHub accompanied by a green check mark.

matthias-ronge (Collaborator, Author) commented

I hope I haven't overlooked anything and have processed all your comments to your satisfaction. The task list and the extended search page are now loading as well.

  • It is true that filters are not supported in this pull request, and they will not be in pull request "[hibernate-search] Introduce Hibernate Search framework and implement indexing page" #6218 either. They will be implemented again in the pull request after that, and I can already tell you that this will be done in a much simpler way than in the old code. The user's input will be rewritten into an index search query. If the correct information is in the index, the search should then produce the expected results. I am aware that the indexing may also have to be adjusted. The complexity of generating a query across several levels of the class inheritance hierarchy is no longer intended. It should be as simple as possible while still doing the right thing.
  • I have added the deleted tests back in, restored the removed find... functions as well, and marked them with a TODO. However, I assume that we will be able to delete these later.
  • I could not create an update branch for the hibernate-search branch from the current master until those developments are approved and merged, and since the previous pull requests ("[hibernate search] Introduce a common interface for DTO objects and data beans" #6032, "[hibernate-search] Remove database column 'indexAction'" #6201) were never merged, I have not been able to do an update so far. I am aware that this is pending.
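
The idea of rewriting the user's input into an index search query could be sketched as follows. This is only an illustration under stated assumptions, not the planned implementation: the class name, the default field "title", the "field:term" input syntax and the AND-joined output format are all hypothetical, and no Hibernate Search API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FilterRewriteSketch {

    // Hypothetical field searched when the user gives no "field:" prefix.
    static final String DEFAULT_FIELD = "title";

    /**
     * Splits the input on whitespace and turns each token into a
     * "field:term" clause. Tokens without a usable prefix are searched
     * in the default field.
     */
    static List<String> toClauses(String userInput) {
        List<String> clauses = new ArrayList<>();
        if (userInput == null || userInput.trim().isEmpty()) {
            return clauses;
        }
        for (String token : userInput.trim().split("\\s+")) {
            int colon = token.indexOf(':');
            if (colon > 0 && colon < token.length() - 1) {
                String field = token.substring(0, colon).toLowerCase(Locale.ROOT);
                clauses.add(field + ":" + token.substring(colon + 1));
            } else {
                clauses.add(DEFAULT_FIELD + ":" + token);
            }
        }
        return clauses;
    }

    /** Joins the clauses with AND, so every clause has to match. */
    static String toQuery(String userInput) {
        return String.join(" AND ", toClauses(userInput));
    }

    public static void main(String[] args) {
        System.out.println(toQuery("id:42 Newspaper")); // prints id:42 AND title:Newspaper
    }
}
```

The point of such a flat rewrite is that correctness then depends on what is stored in the index, rather than on query-building logic spread across a class hierarchy.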

solth (Member) commented Sep 13, 2024

@matthias-ronge thanks for incorporating the requested changes. I have no further remarks at this point and think it is advisable to move forward.

@solth solth merged commit ee5fe13 into kitodo:hibernate-search Sep 13, 2024
2 checks passed