As integration technology, I should use a serialization format for exchanges between analytics/applications over Kafka #248

Open
olivierlemee opened this issue Aug 22, 2024 · 0 comments
Assignees: olivierlemee
Labels: line:mvf (action or item managed via the MVF production line dedicated to prototype delivery), priority:low (low priority for treatment), type:feature (required behavior/compliance expected from a system or technology: performance, architecture, privacy)

@olivierlemee (Collaborator) commented:
        - [ ] Evaluate and decide on the Kafka serialization technology to use for the facts and data exchanged on the DIS:
           - [ ] Apache Avro serializer/deserializer: https://avro.apache.org/ Avro schema files generated from Java classes via IDL files (migrating POJOs to IDL allows managing the generated Avro schema for POJO mapping and Java class auto-generation): https://www.instaclustr.com/blog/exploring-karapace-part-2/
              - AVRO ADVANTAGES
                 - Dynamic typing: unlike Protobuf, Avro does not require code generation, which gives more flexibility and easier integration with dynamic languages like Python or Ruby.
                 - Self-describing messages: serialized Avro data embeds schema information, so it can be decoded even if the reader has no access to the original schema. As the example shows, an Avro message must always be prefixed with information identifying the schema used to encode it, or the decoder will either fail or produce invalid data. Adding default values to the schema is very important so that a field can be removed later (see the schema sketch after this list).
                    - Versus Protobuf: despite Avro's slightly smaller encoded data size, Protobuf message definitions can be updated in a compatible way without prefixing the encoded data with a schema identifier, which makes Protobuf a better choice where object versions must be managed automatically and dynamically by the deserializer. Avro is easier to debug than Protobuf, whose wire format is less human-readable. BUT: Avro has slower serialization/deserialization performance due to its dynamic typing and embedded schema information.
                 - AVRO DRAWBACK: verbosity of the schema definition in JSON
                 - JAVA
                    - IDL file (domain object specification) > Avro schema file (versioned) > auto-generated Java class
                    - The Avro file is usable by producers/consumers for POJO mapping (e.g., exchange of serialized data over Kafka, Redis, or the filesystem)
                    - Backward-compatibility test under Maven for the schema and newly generated classes (see the Maven sketch after this list): https://docs.confluent.io/platform/current/schema-registry/develop/maven-plugin.html#schema-registry-test-compatibility
                    - Producer example with the schema version automatically added (see the producer sketch after this list): https://github.com/confluentinc/examples/blob/7.4.1-post/clients/avro/src/main/java/io/confluent/examples/clients/basicavro/ProducerExample.java
                    - Consumer example with the schema version automatically read (see the consumer sketch after this list): https://github.com/confluentinc/examples/blob/7.4.1-post/clients/avro/src/main/java/io/confluent/examples/clients/basicavro/ConsumerExample.java
                 - NodeJS
                    - JS encode/decode with the [avro-js module](https://www.npmjs.com/package/avro-js) (see the sketch after this list): https://blog.basyskom.com/2021/what-is-apache-avro-compared-to-protobuf
                 - Karapace (Kafka REST proxy and schema registry in a Docker instance)
                    - Karapace schema registry: https://www.instaclustr.com/blog/exploring-karapace-part-3/ supporting Avro, JSON Schema, and Protobuf, with a REST interface for schema management
                    - GitHub project: https://github.com/Aiven-Open/karapace
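To make the schema-evolution point above concrete, here is a minimal sketch of an Avro schema (`.avsc`, JSON) for a hypothetical `Payment` record loosely modeled on the linked Confluent examples. The `region` field and its default value are assumptions for illustration; having a default is precisely what allows the field to be removed later without breaking old readers.

```json
{
  "namespace": "io.confluent.examples.clients.basicavro",
  "type": "record",
  "name": "Payment",
  "doc": "Hypothetical domain object used by the sketches below.",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "region", "type": "string", "default": "EU"}
  ]
}
```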
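A minimal producer sketch along the lines of the linked ProducerExample, assuming the `Payment` class was generated from the schema above and that a schema registry runs at `http://localhost:8081` (broker address, topic name, and registry URL are assumptions). `KafkaAvroSerializer` registers the schema and prefixes each message with its registry id, which is the "schema version automatically added" behavior referenced above.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import io.confluent.examples.clients.basicavro.Payment; // assumed: generated from the .avsc sketch above

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
        props.put("schema.registry.url", "http://localhost:8081");             // assumed registry (e.g. Karapace)

        // newBuilder() fills 'region' with its schema default when not set explicitly.
        Payment payment = Payment.newBuilder().setId("t42").setAmount(99.99).build();

        try (KafkaProducer<String, Payment> producer = new KafkaProducer<>(props)) {
            // The serializer registers the schema (if new) and prepends its registry id to the bytes.
            producer.send(new ProducerRecord<>("transactions", payment.getId().toString(), payment));
            producer.flush();
        }
    }
}
```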
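And the matching consumer sketch (same assumptions): `KafkaAvroDeserializer` reads the schema-id prefix from each message and fetches the writer schema from the registry, i.e. the "schema version automatically read" behavior above; `specific.avro.reader=true` maps records back onto the generated `Payment` class.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import io.confluent.examples.clients.basicavro.Payment; // assumed: generated from the .avsc sketch above

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payment-analytics");        // assumed group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class);
        props.put("schema.registry.url", "http://localhost:8081");             // assumed registry
        props.put("specific.avro.reader", "true"); // deserialize into the generated Payment class

        try (KafkaConsumer<String, Payment> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("transactions"));
            while (true) {
                // The deserializer resolves each record's schema id against the registry.
                ConsumerRecords<String, Payment> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, Payment> record : records) {
                    System.out.printf("id=%s amount=%.2f%n",
                            record.value().getId(), record.value().getAmount());
                }
            }
        }
    }
}
```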
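For the backward-compatibility check, the linked Confluent docs describe a `kafka-schema-registry-maven-plugin` with a `test-compatibility` goal. A sketch of its configuration follows; the registry URL, subject name, and schema path are assumptions matching the sketches above.

```xml
<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>7.4.1</version>
  <configuration>
    <schemaRegistryUrls>
      <param>http://localhost:8081</param> <!-- assumed registry URL -->
    </schemaRegistryUrls>
    <subjects>
      <!-- subject name and schema path are assumptions for this sketch -->
      <transactions-value>src/main/avro/Payment.avsc</transactions-value>
    </subjects>
  </configuration>
  <goals>
    <goal>test-compatibility</goal>
  </goals>
</plugin>
```

With this in the POM, `mvn schema-registry:test-compatibility` checks the local schema against the versions already registered for the subject, so an incompatible change fails the build before anything is produced to Kafka.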
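On the NodeJS side, a minimal encode/decode sketch with the avro-js module referenced above (`npm install avro-js`). Note that `toBuffer`/`fromBuffer` produce plain Avro binary, not the registry-framed format used by the Java serializers above, so the schema-id prefix would have to be handled separately.

```js
const avro = require('avro-js');

// Same hypothetical Payment schema as on the Java side.
const type = avro.parse({
  namespace: 'io.confluent.examples.clients.basicavro',
  type: 'record',
  name: 'Payment',
  fields: [
    { name: 'id', type: 'string' },
    { name: 'amount', type: 'double' },
    { name: 'region', type: 'string', default: 'EU' }
  ]
});

// Encode to plain Avro binary and decode it back.
const buf = type.toBuffer({ id: 't42', amount: 99.99, region: 'EU' });
const decoded = type.fromBuffer(buf);
console.log(decoded.id, decoded.amount, decoded.region); // t42 99.99 EU
```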