[hibernate search] Introduce a common interface for DTO objects and data beans #6032

matthias-ronge · 2024-04-12T09:19:19Z

Issue #5760 1c) -- part 1

Follow-up pull request to #5784 (immediate diff)

From FindBugs. SimpleDateFormat class is not thread-safe. Using an object across threads may result in undefined behavior.

henning-gerhardt

This is a code-reading review only. I did not test the functionality itself, nor am I discussing the changes to the system architecture.

Kitodo-DataManagement/src/main/java/org/kitodo/data/database/beans/Process.java

Kitodo-DataManagement/src/main/java/org/kitodo/data/interfaces/WorkflowInterface.java

Kitodo/src/main/java/org/kitodo/production/services/data/TaskService.java

Kitodo/src/test/java/org/kitodo/selenium/LoginST.java

This is done in anticipation of the upcoming renaming of the X packages.

This also considers non-empty strings that consist of white space only as empty, which, as far as I can see, is the more correct choice in all places.

Kitodo-DataManagement/src/main/java/org/kitodo/data/interfaces/WorkflowInterface.java

Kitodo/src/main/java/org/kitodo/production/converter/IllustratedSelectItemConverter.java

Kitodo/src/test/java/org/kitodo/production/services/data/TemplateServiceIT.java

oliver-stoehr

The code style looks good, the amount of comments is very nice.

Can you explain the purpose of this common interface? I've looked into HibernateSearch, but not in depth yet. My understanding was that one of its main advantages is that HibernateSearch can work directly with the beans and does not need any extra layer. So my hope was that we could remove the DTO objects without a replacement.
Adding a new interface seems to me like just "shifting" the amount and complexity from our existing DTO objects to the new interface. But since I have not fully understood how HibernateSearch works, I might be missing something essential!

matthias-ronge · 2024-06-07T10:58:21Z

Yes, the DTOs will be deleted, and many more files will be deleted later, but to keep the pull request small, I haven't deleted them yet. They will be deleted in a separate pull request.
Yes, it is then possible to delete the interface completely, which would mean moving the Javadoc comments to the beans and removing the interface. This is a matter of our own taste as developers. Introducing the interface first helped me to see where the beans and DTOs differ and to make them really work equally (they did for 96%, but there were inconsistencies). So if there is a majority for removing the interface later, I can do that. But it should happen after the other removals have happened.

henning-gerhardt · 2024-06-07T11:11:34Z

I don't think it is a good idea to access the Hibernate data directly in the UI, from a security point of view. There have been many exploits in the UI frameworks used, badly written UI code and other mistakes that made it possible to access (read / write / delete) the database data from the UI through an ORM like Hibernate. For this reason I would not recommend exposing the Hibernate objects to the UI; instead, a wrapping mechanism (like the POJOs / DTOs used right now) should make the database data available in the UI. So removing them is a bad idea from my view of web application security.

matthias-ronge · 2024-06-07T12:27:13Z

You are right in what you say, but unfortunately this happens anyway in Kitodo.Production, and the DTO objects here have a different meaning. They were the database objects, when loaded from the index, for filling the tables. Removing the index objects when deleting is straightforward.

Kitodo-DataManagement/src/main/java/org/kitodo/data/interfaces/BatchInterface.java

stweil

I am writing in German because presumably no English-speaking participants are involved in this discussion and because I believe it is easier for everyone this way.

With 22 commits, this PR is quite extensive and not easy to review. My first impression is that quite a few commits should be squashed.

A basic requirement is that every single commit must result in working code. If this is not observed, it makes a git bisect, for example, unnecessarily difficult. A commit with changes and a further commit that corrects it afterwards should therefore be squashed.

The desired functionality, the migration to Hibernate Search, is spread across several pull requests. That is fine. But I would like to see what the end result will ultimately be. For that I need a branch or a draft pull request that shows what this end result looks like and with which I can try out the complete new functionality locally. Is that already possible, and have I just overlooked how to do it?

matthias-ronge · 2024-07-19T10:44:47Z

@stweil In general, I agree with you. However, this development is too large to commit only at moments when the code works. For the review, I suggest that you ignore the commits and only review the resulting code. Perhaps each of the pull requests for the Hibernate Search development is, in the way you think of it, "one commit".

It should be possible to check out the branch and run it. Each pull request solves a specific task; this one "only" introduces the interface layer. So there should be no functional differences noticeable when running the code.

solth · 2024-08-01T12:38:19Z

I understand @henning-gerhardt's concern but agree with @matthias-ronge that the DTOs in Kitodo.Production never really offered any additional security. If I understand the concept of DTOs correctly, they are not really intended as a security measure anyway, but rather to keep a low data profile when transferring objects between a client and a server in a network is expensive. That is how they are described on Wikipedia and by their original designer Martin Fowler here, at least.

Wikipedia, for example, states that "This pattern is often incorrectly used outside of remote interfaces." Martin Fowler writes "DTOs are called Data Transfer Objects because their whole purpose is to shift data in expensive remote calls." and further "Not just do you not need them in a local context, they are actually harmful both because a coarse-grained API is more difficult to use and because you have to do all the work moving data from your domain or data source layer into the DTOs." (the article linked above is old, but a good read and contains more interesting remarks about the concept of data transfer objects!) I therefore think that DTOs were not correctly used in Kitodo.Production to begin with, so getting rid of them is a good first step to reduce unnecessary complexity, IMHO.

I also agree with @oliver-stoehr that replacing each DTO with a corresponding new interface should be avoided and just shifts the code overhead to a differently named class. Instead, we should take this opportunity to create a cleaner implementation and take advantage of HibernateSearch's ability to directly manage index and database objects at the same time. Since @matthias-ronge already mentioned that removing the interfaces is an option for a follow-up pull request, I can accept introducing them temporarily, even though I am not convinced it is necessary or the best approach. I think it might complicate the implementation of the next pull requests in the HibernateSearch project, but in the end that is up to Matthias.

One small remark concerning these individual pull requests against the hibernate-search branch: for these, the reviews should focus on the larger concept of the development and not on code style details like method length, missing Javadoc etc. These more formal criteria should be checked when the final pull request from the hibernate-search branch back against the master branch is opened.

solth

I have a hard time seeing the point of these new data interfaces. After inspecting the files I am not convinced they provide any new value, but just unnecessarily increase the complexity. Until the DTO classes are removed, the data interfaces might be implemented by two classes, each, but in my opinion the introduction of the interfaces is redundant and not a good idea.

Interfaces in Java should be used to give developers a choice of which implementation of the interface to use - for example java.util.ArrayList or java.util.LinkedList as specific implementations of the interface java.util.List - when passing arguments to a method that specifies an interface as a parameter type.
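To illustrate the point about java.util.List, here is a minimal sketch. The class and method names (ListChoiceExample, totalLength) are made up for this illustration and do not come from the Kitodo code base.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ListChoiceExample {

    // The parameter type is the interface, so the caller decides
    // which List implementation to pass in.
    static int totalLength(List<String> titles) {
        int total = 0;
        for (String title : titles) {
            total += title.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> arrayBacked = new ArrayList<>(List.of("Project", "Task"));
        List<String> linked = new LinkedList<>(List.of("Project", "Task"));

        // Both calls compile and behave the same; only the storage strategy differs.
        System.out.println(totalLength(arrayBacked));
        System.out.println(totalLength(linked));
    }
}
```

Here the interface pays off because two interchangeable implementations really exist. The argument in this review is that no such alternatives are expected for entities like Process once the DTOs are gone.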

In this case (Kitodo), however, once the DTOs are removed, a developer will always want to instantiate, for example, just a Process, not a ProcessInterface. The individual entities in Kitodo.Production, like Projects, Templates, Processes, Tasks etc., are clearly defined. There won't be alternative implementations for these concepts, so in my opinion it doesn't make sense to introduce these more general interfaces here, which would suggest that alternative implementations are desirable.

Since these preliminary reviews are not supposed to check the code style or implementation details, but rather to determine the legitimacy of the fundamental concept, I think it has to be a valid outcome of the review to reject the proposed solution if the reviewer believes it goes in the wrong direction.

I would be willing to approve these changes under the clear condition that they are just a temporary solution and will be removed together with the DTO classes in a follow-up pull request.

I would be very interested to read what the other developers think about this approach. @henning-gerhardt , @thomaslow , @Erikmitk , @oliver-stoehr , @BartChris , what is your opinion on this?

henning-gerhardt · 2024-08-02T09:24:39Z

I agree with @solth's opinion regarding the introduction of the new interfaces. I see no benefits or advantages in introducing them at this point, as I do not expect any different implementations of these classes in the future; they are only used internally, inside the core application itself. If I have missed the point, then I need at least an explanation of why they are really needed.

thomaslow · 2024-08-05T08:11:28Z

I quickly skimmed the proposed changes. I agree that introducing interface classes that have a 1:1 relationship to their only subclasses is not really necessary. Usually, I like interfaces. In this case, however, the contract between the business logic and the database of Kitodo.Production is already achieved through the ORM model with Hibernate beans. In my opinion, no additional interface classes are necessary.

In the case of DTO classes, I partially agree with @henning-gerhardt. There definitely should be DTO classes. However, the existing DTO classes are not relevant anymore, since they were used between ElasticSearch and the business logic. I believe this part (choosing the fields that are being indexed and converted to JSON) can now be controlled via Hibernate-Search. Based on the @Field(index=Index.YES/NO, store=Store.YES/NO) annotation, Hibernate-Search will most certainly generate a similar JSON document as before, which provides the same security as before, meaning sensitive information (like passwords) is not leaked to the search engine. Therefore, the existing DTO classes may be removed.
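The idea of selecting indexed fields by annotation can be sketched in plain Java. This is not the Hibernate-Search API: the annotation IndexedField, the class UserBean and the helper toDocument are hypothetical names invented for this illustration, and the sketch only shows the principle that unannotated fields never reach the index document.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker annotation, standing in for a real indexing annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IndexedField {
}

// Hypothetical, heavily simplified bean.
class UserBean {
    @IndexedField
    String login;

    @IndexedField
    String fullName;

    // Deliberately not annotated, so it is never copied into the document.
    String password;

    UserBean(String login, String fullName, String password) {
        this.login = login;
        this.fullName = fullName;
        this.password = password;
    }
}

class IndexDocumentSketch {

    // Copies only the annotated fields of the bean into a map that
    // plays the role of the document sent to the search engine.
    static Map<String, Object> toDocument(Object bean) {
        Map<String, Object> document = new LinkedHashMap<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(IndexedField.class)) {
                try {
                    field.setAccessible(true);
                    document.put(field.getName(), field.get(bean));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + field.getName(), e);
                }
            }
        }
        return document;
    }

    public static void main(String[] args) {
        UserBean user = new UserBean("jdoe", "Jane Doe", "secret");
        // The printed map contains login and fullName, but no password entry.
        System.out.println(toDocument(user));
    }
}
```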

Instead, someday, in a galaxy far far away..., there should be DTO classes between the business logic and the user interface, such that it would not be possible anymore to trigger numerous ORM-based SQL queries from the user interface and potentially build a REST API that can be used to move Kitodo towards the current generation of web technologies. However, this has nothing to do with the migration to Hibernate-Search and should not be part of this pull request.

Erikmitk · 2024-08-07T10:17:50Z

I don't have anything to add. I agree with the comments made by @henning-gerhardt, @thomaslow and @solth that additional interfaces make no sense in this context. :)

matthias-ronge · 2024-08-13T14:26:07Z

There are two groups of classes: database objects and index objects. (Here, I have the impression that we think the same thing: the index objects are called DTOs, but they do not follow the concept behind the DTO pattern; it is just the name used here.)

The introduction of the interface is aimed at standardizing the "almost" (but not everywhere exactly) equal classes of objects from the database and objects from the index so that they become interchangeable, and it documents what the methods must return. This was also necessary in order to achieve the alignment.

It is a basic tool of refactoring to first encapsulate the section to be newly implemented in an interface; then you can use it through the interface and reimplement it. This is what happened here.

After the work has been done successfully, there is the option of keeping the interface or removing it. This is undecided here; I can be happy with either.
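The refactoring step described here can be sketched as follows. All types are heavily simplified, hypothetical stand-ins (ProcessBean, ProcessIndexObject and the two getters are chosen for illustration only); the assumption that the index object keeps the image count as a string is also made up to show a typical mismatch.

```java
// Hypothetical common interface that documents what callers may rely on.
interface ProcessInterface {
    String getTitle();

    int getNumberOfImages();
}

// Stand-in for the database bean.
class ProcessBean implements ProcessInterface {
    private final String title;
    private final int numberOfImages;

    ProcessBean(String title, int numberOfImages) {
        this.title = title;
        this.numberOfImages = numberOfImages;
    }

    @Override
    public String getTitle() {
        return title;
    }

    @Override
    public int getNumberOfImages() {
        return numberOfImages;
    }
}

// Stand-in for the index object ("DTO"). Assumed here to hold the count
// as a string, so the interface forces a conversion to the common type.
class ProcessIndexObject implements ProcessInterface {
    private final String title;
    private final String numberOfImages;

    ProcessIndexObject(String title, String numberOfImages) {
        this.title = title;
        this.numberOfImages = numberOfImages;
    }

    @Override
    public String getTitle() {
        return title;
    }

    @Override
    public int getNumberOfImages() {
        return Integer.parseInt(numberOfImages);
    }
}

class InterfaceRefactoringSketch {

    // Calling code depends only on the interface, so either
    // implementation can be passed in, or one removed later.
    static String describe(ProcessInterface process) {
        return process.getTitle() + " (" + process.getNumberOfImages() + " images)";
    }

    public static void main(String[] args) {
        System.out.println(describe(new ProcessBean("Sample", 3)));
        System.out.println(describe(new ProcessIndexObject("Sample", "3")));
    }
}
```

Once only one implementation remains, the interface can either stay as documentation of the contract or be folded back into the bean, which is the open question in this thread.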

solth · 2024-08-22T13:48:17Z

@matthias-ronge I understand what the intention of these new data interfaces is, but they are unnecessary in the long run, as discussed above. When the DTO classes are removed, the interfaces are obsolete and should be removed. Since all reviewers seem to agree on this, I would say this is very much decided.

This means they can only be a temporary addition. Therefore I think first opening a large pull request that introduces a temporary construct and later opening more pull requests that remove this construct again is not very helpful. It just multiplies the workload for you to maintain the individual branches and PRs and for the reviewers to perform unnecessary reviews.

We did agree on splitting all changes of the hibernate search implementation into multiple pull requests to reduce the size of each individual PR, but the individual pull requests should all just introduce parts that are required for the final implementation and not contain large refactorings of the architecture that are later reverted.

Additionally, I don't think it makes sense to concentrate on fixing all tests for intermediate development states. Builds for pull requests against the hibernate-search branch do not have to pass all checks, in my opinion. Those checks only need to pass once the final pull request against the master is opened.

matthias-ronge · 2024-08-28T12:05:12Z

In the early days of 3.x development, we placed a lot of emphasis on mapping all functionality using interfaces. When we were removing the legacy UGH library, I was even explicitly asked to represent the entire interface of the library as a graphic, which took me eight hours.

I observe that the strategy for dealing with interfaces is changing. I know that there are reasons for and against them, and it doesn't bother me if we discard them in the end. However, at this point in development, they were necessary because the beans and the DTO objects were largely congruent, but they differed in some details:

the DTO objects returned strings or integers in some places, or expected them in setters, where the beans return database objects;
the DTO objects returned true or false (present or null/empty) in some places, where the beans contain database objects;
the DTO objects returned information that was stored in the database in a different way.

With the introduction of the interfaces, I smoothed out these points and made the objects 100 percent interchangeable, which is necessary for the existing functionality to continue to work. This involved a lot of research to map the functionality properly. Because of the extensive changes, it was only possible for me to manage the brain load in this way; it's not just technical.

"I don't think it makes sense to concentrate on fixing all tests for intermediate development states."

I took a pragmatic approach here and analyzed tests that failed. In some cases, these led me to points where the functionality behaved differently than expected or described. So they did what they were supposed to and I was able to correct errors. However, there are also tests that are no longer useful after the change, because they tested functionality that was implemented but never used outside of the tests. I deleted these tests. (Fewer here, but also in the following pull requests.) Thirdly, I temporarily @Ignored a number of tests because they cannot work without an available index: they require index fields that are not currently available. However, these should work again at the end of development.

In principle, I don't think it's wrong if the tests that should work do pass. If we are to deviate from this, I would need to know, because I assume that passing tests are a condition for the review. However, I think that deviating could lead to a pile of unseen errors that then have to be resolved later. I think that keeping the tests passing will give us a trustworthy result in the end, which is probably what everyone wants.

matthias-ronge · 2024-08-30T12:13:51Z

By recent agreement with Release Management, the code for the entire first part of the development is now provided in one joint pull request, #6209. The individual partial pull requests will be closed.

matthias-ronge added 5 commits April 10, 2024 16:11

Introduce an interface for the DTO objects

177c25b

Use 'extends' in interface lists

6b617e4

Fix type clashes

094e528

Move image binary getter out of DTO object

59b47a8

Fix more type clashes

580f8e1

matthias-ronge force-pushed the 5760_1c_1 branch 3 times, most recently from cf7bb35 to bbf9432 Compare April 12, 2024 09:41

Use the interface in the data beans.

f39dab0

matthias-ronge force-pushed the 5760_1c_1 branch from bbf9432 to f39dab0 Compare April 12, 2024 09:42

matthias-ronge added 10 commits April 12, 2024 12:01

Fix checkstyle

d2a4bd1

No longer use SimpleDateFormat from a constant

c5cbf18

From FindBugs. SimpleDateFormat class is not thread-safe. Using an object across threads may result in undefined behavior.
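The FindBugs finding can be illustrated with a minimal sketch. The class name DateFormatSketch, the pattern and both methods are assumptions made for this example; the commit itself may have solved the problem differently. Two common ways to avoid sharing a SimpleDateFormat instance are shown: creating a fresh instance per call, or switching to java.time.format.DateTimeFormatter, which is immutable and thread-safe.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Date;

class DateFormatSketch {

    // Sharing only the pattern string is safe; sharing a
    // "static final SimpleDateFormat" instance across threads is not.
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // DateTimeFormatter is immutable, so a shared constant is fine.
    private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern(PATTERN);

    // Option 1: create a new SimpleDateFormat for every call.
    static String formatLegacy(Date date) {
        return new SimpleDateFormat(PATTERN).format(date);
    }

    // Option 2: use the thread-safe java.time formatter.
    static String formatModern(LocalDateTime dateTime) {
        return FORMATTER.format(dateTime);
    }

    public static void main(String[] args) {
        System.out.println(formatLegacy(new Date()));
        System.out.println(formatModern(LocalDateTime.of(2024, 4, 12, 9, 19, 19)));
    }
}
```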

Fix Exception 'null' when retrieving int value for key numberOfImages

5c00c6e

Undo unnecessary changes

e6dc1a5

Rename active templates property accordingly

35e29c0

Fix enum comparison

645277a

Fix LazyInitializationException

e793f1e

Fix several occurrences of PropertyNotWritableException

9fccf2c

Fix Javadoc

f42fa94

Fix displayed wrong task's processing begin

22e8263

matthias-ronge marked this pull request as ready for review April 23, 2024 13:13

This was referenced Apr 23, 2024

Introduce interfaces for the database- and search-related functions of the services #6050

Closed

[hibernate search] Introduce interfaces for the database- and search-related functions of the services #6051

Closed

solth requested review from solth and oliver-stoehr May 3, 2024 13:03

henning-gerhardt suggested changes May 17, 2024

matthias-ronge added 4 commits May 21, 2024 09:37

No longer import PropertyNotWritableException

bff6815

This is done in anticipation of the upcoming renaming of the X packages.

Remove an unused import

2cc1890

Use apache.commons.lang3.StringUtils instead of log4j.util.Strings

797f7a4

Use is(Not)Blank instead of is(Not)Empty

77ba25a

This also considers non-empty strings that consist of white space only as empty, which, as far as I can see, is the more correct choice in all places.
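The difference between the two checks can be shown with a small sketch. It uses only the JDK (String.isEmpty and String.isBlank, the latter available since Java 11) rather than the Apache Commons StringUtils methods named in the commit; the null-safe helpers in BlankCheckSketch are hypothetical and only mimic the null handling of such utility methods.

```java
class BlankCheckSketch {

    // True for null and for the zero-length string only.
    static boolean isEmpty(String value) {
        return value == null || value.isEmpty();
    }

    // True for null, the zero-length string, and whitespace-only strings.
    static boolean isBlank(String value) {
        return value == null || value.isBlank();
    }

    public static void main(String[] args) {
        String whitespaceOnly = "   ";
        System.out.println(isEmpty(whitespaceOnly)); // false
        System.out.println(isBlank(whitespaceOnly)); // true
        System.out.println(isBlank("title"));        // false
    }
}
```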

matthias-ronge requested a review from henning-gerhardt May 21, 2024 08:42

matthias-ronge force-pushed the 5760_1c_1 branch from 726bd3f to 83affae Compare May 21, 2024 09:23

No longer use '...Interface' as part of variable names

14dbdc8

matthias-ronge force-pushed the 5760_1c_1 branch from 83affae to 14dbdc8 Compare May 21, 2024 09:32

henning-gerhardt reviewed May 21, 2024

Improve reason messages in tests

290b4c5

oliver-stoehr reviewed Jun 3, 2024

solth requested a review from stweil July 18, 2024 08:23

stweil reviewed Jul 18, 2024

Kitodo-DataManagement/src/main/java/org/kitodo/data/interfaces/BatchInterface.java

stweil suggested changes Jul 18, 2024

solth reviewed Aug 2, 2024

matthias-ronge closed this Aug 30, 2024

matthias-ronge mentioned this pull request Sep 11, 2024

[hibernate-search] Part 1: Remove custom ElasticSearch integration #6209

Merged
