Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hibernate search] Introduce a common interface for DTO objects and data beans #6032

Closed

Conversation

matthias-ronge
Copy link
Collaborator

Issue #5760 1c) -- part 1

Follow-up pull request to #5784 (immediate diff)

@matthias-ronge matthias-ronge force-pushed the 5760_1c_1 branch 3 times, most recently from cf7bb35 to bbf9432 Compare April 12, 2024 09:41
Copy link
Collaborator

@henning-gerhardt henning-gerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a code reading review only. I did not the the functionality itself nor I'm discussing the system architecture changes.

This is done in anticipation of the upcoming renaming of the X packages.
This also considers non-empty strings that consist of white space only
as empty, which, as far as I can see, is the more correct choice in all
places.
Copy link
Collaborator

@oliver-stoehr oliver-stoehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code style looks good, the amount of comments is very nice.

Can you explain the purpose of this common interface? I've looked into HibernateSearch, but not in depth, yet. My understanding was, that one of the main advantages is, that HibernateSearch can work directly with the beans and does not need any extra layer. So my hope was, that we could remove the DTO objects without a replacement.
Adding a new interface seems to me like just "shifting" the amount and complexity from our existing DTO objects to the new interface. But since I did not fully understand how HibernateSerdach works, I might be missing something essential!

@matthias-ronge
Copy link
Collaborator Author

Yes, the DTOs shall be deleted, and many more files will be deleted later, but to reduce the pull requests’ size, I didn’t delete them yet. There will be deleted in a separate pull request.
Yes, it is possible to delete the interface completely then, which would mean to move the Javadoc comments to the beans and remove the interface. This is a decision of our own taste as developers. First, introducing the interface was helpful for me to see and get the beans and DTOs really work equally (they did for 96%, but there were inconsitencies). So if there will be a majority for removing the interface later, I can do that. But it should happen after the other removals have happened.

@henning-gerhardt
Copy link
Collaborator

I don't think that is a good idea to access the Hibernate data directly in the UI - from a security view. There was many exploits in the used UI frameworks, bad written UI code and other mistakes which made it possible to access the database through the used ORM like Hibernate and UI to access (read / write / delete) the database data. From this point I would not recommend the way to expose the Hibernate objects to the UI and use a wrapping mechanism (like used POJO's / DTO's right now) to made the database data available in the UI. So removing them is a bad idea from my view of security to web applications.

@matthias-ronge
Copy link
Collaborator Author

You were right about what you say, but unluckily, this happens anyway in Kitodo.Production, and the DTO objects here have a different meaning. They were the database objects, when loaded from index, for filling the tables. Removing the index objects when deleting ist straight foreward.

@solth solth requested a review from stweil July 18, 2024 08:23
Copy link
Member

@stweil stweil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ich schreibe deutsch, weil vermutlich an der Diskussion keine englischsprachigen TN beteiligt sind und weil ich glaube, dass es so für alle einfacher ist.

Mit 22 Commits ist dieser PR ziemlich umfangreich und nicht so einfach zu prüfen. Mein erster Eindruck ist, dass etliche Commits zusammengefasst werden sollten.

Eine Grundanforderung ist, dass jeder einzelne Commit funktionierenden Code als Ergebnis haben muss. Wird dies nicht eingehalten, so macht das beispielsweise ein git bisect unnötig schwer. Ein Commit mit Änderungen und ein weiterer Commit, der das nachträglich korrigiert, sollten also zusammengefasst werden.

Die gewünschte Funktionalität – Umstellung auf Hibernatesearch – ist auf mehrere Pull Requests verteilt. Das ist in Ordnung. Aber ich würde gerne sehen, was letztlich das Endergebnis sein wird. Dafür benötige ich einen Branch oder einen Draft Pull Request, der zeigt, wie dieses Endergebnis aussieht und mit dem ich lokal die komplette neue Funktionalität ausprobieren kann. Ist das bereits möglich, und ich habe nur übersehen, wie es geht?

@matthias-ronge
Copy link
Collaborator Author

@stweil Generell stimme ich deiner Meinung zu. Allerdings ist diese Entwicklung zu groß, um Commits nur in dem Moment durchzuführen, in dem der Code funktioniert. Zum Review schlage ich vor, dass du die Commits ignorierst und nur den resultierenden Code gegenliest. Vielleicht ist jeder der Pull Requests für die Hibernate-Search-Entwicklung, so wie du es denkst, „ein Commit“.

Es sollte möglich sein, den Branch auszuchecken und auszuführen. Jeder Pull Request löst eine bestimmte Aufgabe, dieser hier führt „nur“ die Schnittstellenschicht ein. Es sollten also beim Ausführen des Codes keine funktionalen Unterschiede zu erkennen sein.

@solth
Copy link
Member

solth commented Aug 1, 2024

I understand @henning-gerhardt's concern but agree with @matthias-ronge that the DTOs in Kitodo.Production never really offered any additional security. If I understand the concept of DTOs correctly they are actually not really intended as a security factor anyway, but rather to keep a low data profile when transfering objects between a client and a server in a network is expensive. That is how they are described on Wikipedia and by their original designer Martin Fowler here, at least.

Wikipedia, for example, states that "This pattern is often incorrectly used outside of remote interfaces." Martin Fowler writes "DTOs are called Data Transfer Objects because their whole purpose is to shift data in expensive remote calls." and further "Not just do you not need them in a local context, they are actually harmful both because a coarse-grained API is more difficult to use and because you have to do all the work moving data from your domain or data source layer into the DTOs." (the article linked above is old, but a good read and contains more interesting remarks about the concept of data transfer objects!) I therefor think that DTOs were not correctly used in Kitodo.Production to begin with, so getting rid of them is a good first step to reduce unnecessary complexity, IMHO.

I also agree with @oliver-stoehr that replacing each DTO with a corresponding new interface should be avoided and just shifts the code overhead to a differently named class. Instead, we should take this opportunity to create a cleaner implementation and take advantage of HibernateSearch's ability to directly manage index and database objects at the same time. Since @matthias-ronge already mentionend that removing the interfaces is an option for a follow-up pull request I can accept introducing them temporarily, even though I am not convinced it is necessary or the best approach. I think it might complicate the implementation of the next pull requests in the HibernateSearch project, but in the end that is up to Matthias.

One little remark concerning these individual pull requests against the hibernate-search branch: for these the reviews should focus on the larger concept of the development and not on code style details like method length, missing java docs etc. These more formal criteria should be checked when the final pull request from the hibernate-search branch back against the master branch is opened.

Copy link
Member

@solth solth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a hard time seeing the point of these new data interfaces. After inspecting the files I am not convinced they provide any new value, but just unnecessarily increase the complexity. Until the DTO classes are removed, the data interfaces might be implemented by two classes, each, but in my opinion the introduction of the interfaces is redundant and not a good idea.

Interfaces in Java should be used to give developers a choice which implementation of the interface they choose - for example java.util.ArrayList or java.util.LinkedList as specific implmentations of the interface java.util.List - when passing parameters to a method that specifies an interface as a parameter.

In this case (Kitodo) however, once the DTOs are removed, a developer will always want to instantiate for example just a Process, not a ProcessInterface. The individual entities in Kitido.Production, like Projects, Templates, Processes, Tasks etc. are clearly defined. There won't be alternative implementations for these concepts, so in my opinion it doesn't make sense to introduce these more general interfaces here which would suggest alternative implementations are desirable.

Since these preliminary reviews are not supposed to check the code style or implementation details, but rather to determine the legitimacy of fundamental concept, I think it has to be a valid outcome of the review to reject the proposed solution, if the reviewer believes it goes in the wrong direction.

I would be willing to approve these changes under the clear condition that they are just a temporary solution and will be removed together with the DTO classes in a follow-up pull request.

I would be very interested to read what the other developers think about this approach. @henning-gerhardt , @thomaslow , @Erikmitk , @oliver-stoehr , @BartChris , what is your opinion on this?

@henning-gerhardt
Copy link
Collaborator

I agree with the @solth opinion regarding to introducing of the new interfaces. I see even no benefits or advantages to introduce them at this point as I did not see any different implementations of this classes in the future as they only used internal / inside the core application itself. If I missed the point than I need at least an explanation why they are really needed.

@thomaslow
Copy link
Collaborator

thomaslow commented Aug 5, 2024

I quickly skipped over the proposed changes. I agree that introducing interface classes that have a 1:1 relationship to their only subclasses is not really necessary. Usually, I like interfaces. In this case, however, the contract between the business logic and the database of Kitodo.Production is already achieved through the ORM model with hibernate beans. In my opinion, there are no additional interface classes necessary.

In the case of DTO classes, I partially agree with @henning-gerhardt. There definitely should be DTO classes. However, the existing DTO classes are not relevant anymore, since they were used between ElasticSearch and the business logic. I believe, this part (choosing the fields that are being indexed and converted to JSON) can now be controlled via Hibernate-Search. Based on the @Field(index=Index.YES/NO, store=Store.YES/NO) annotation, Hibernate-Search will most certainly generate a similar JSON document as before, which provides the same security as before, meaning sensitive information (like passwords) are not leaked to the search engine. Therefore, the existing DTO classes may be removed.

Instead, someday, in a galaxy far far away..., there should be DTO classes between the business logic and the user interface, such that it would not be possible anymore to trigger numerous ORM-based SQL queries from the user interface and potentially build a REST API that can be used to move Kitodo towards the current generation of web technologies. However, this has nothing to do with the migration to Hibernate-Search and should not be part of this pull request.

@Erikmitk
Copy link
Member

Erikmitk commented Aug 7, 2024

I don't have anything to add. I agree with the comments made by @henning-gerhardt, @thomaslow and @solth that additional interfaces make no sense in this context. :)

@matthias-ronge
Copy link
Collaborator Author

There are two groups of classes, database objects and index objects. (Here, I have the impression that we think the same thing, they are called DTOs but the name is not the concept behind DTO, so here it is just called that.)

The introduction of the interface is aimed at standardizing the "almost"—but not everywhere exactly—equal classes of objects from the database and objects from the index so that they become interchangeable, and is documentation of what the methods must return. This was also necessary in order to be able to achieve the alignment.

It is a basic tool of refactoring to first encapsulate the section to be newly implemented in an interface, then you can use it based on the interface and reimplement it. This is what happened here.

After the work has been done successfully, there is the option of keeping the interface or removing it. This is undecided here, I can be happy with both.

@solth
Copy link
Member

solth commented Aug 22, 2024

@matthias-ronge I understand what the intention of these new data interfaces is, but they are unecessary in the long run, as discussed above. When the DTO classes are removed, the interfaces are obsolete and should be removed. Since all reviewers seem to agree on this, I would say this is very much decided.

This means they can only be temporary addition. Therefore I think first opening a large pull request introducing a temporary construct and later opening more pull requests that remove this construct again is not very helpful and just multiplies the workload for you to maintain the individual branches and PRs and for the reviewers to perform uncessary reviews.

We did agree on splitting all changes of the hibernate search implementation into multiple pull requests to reduce the size of each individual PR, but the individual pull requests should all just introduce parts that are required for the final implementation and not contain large refactorings of the architecture that are later reverted.

Additionally, I don't think it makes sense to concentrate on fixing all tests for intermediate development states. Builds for pull requests against the hibernate-search branch do not have to pass all checks, in my opinon. Those checks are only necessary to pass once the final pull request against the master is openend.

@matthias-ronge
Copy link
Collaborator Author

In the early days of 3.x development, we placed a lot of emphasis on mapping all functionality using interfaces. When we were removing the legacy UGH library, I was even explicitly asked to represent the entire interface of the library as a graphic, which took me eight hours.

I observe that the strategy for dealing with interfaces is changing. I know that there are reasons for and against them, and it doesn't bother me if we discard them in the end. However, at this point in development, they were necessary because: The beans and the DTO objects were largely congruent, but they differed in some details.

  • the DTO objects returned strings or integers in some places, or wanted them as setters, where the beans return database objects
  • the DTO objects returned true or false (present or null/empty) in some places, where the beans contain database objects
  • the DTO objects returned information that was stored in the database in a different way.

With the introduction of the interfaces, I smoothed out these points and made the objects 100 percent interchangeable, which is necessary for the existing functionality to continue to work. This involved a lot of research to map the functionality properly. Because of the extensive changes, it was only possible for me to manage the brain load in this way; it's not just technical.

I don't think it makes sense to concentrate on fixing all tests for intermediate development states.

I took a pragmatic approach here and analyzed tests that failed. In some cases, these led me to points where the functionality behaved differently than expected or described. So they did what they were supposed to and I was able to correct errors. However, there are also tests that are no longer useful in the change if they tested functionality that was implemented but never used outside of the tests. I deleted these tests. (Less here, but also in the following pull requests.) Thirdly, I temporarily @Ignored a number of tests because they cannot work without an available index because they require index fields that are not currently available. However, these should work again at the end of development.

In principle, I don't think it's wrong if the tests that should work also do pass. If we should deviate from this, I would need to know because I assume that it is a condition for the review. However, I think that this could lead to a pile of unseen errors that then have to be resolved later. I think that this will give us a trustworthy result in the end, which is probably what everyone wants.

@matthias-ronge
Copy link
Collaborator Author

By recent agreement with Release Management, the provision of the code for the entire first part of the development has now been made into one joint pull request #6209. The individual parts pull requests will be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants