As integration technology, I should use a serialization format for exchanges between analytics/applications over Kafka #248

Open
olivierlemee opened this issue Aug 22, 2024 · 0 comments
Assignees: olivierlemee
Labels: line:mvf (action or item managed via the MVF production line dedicated to prototype delivery), priority:low (low priority for treatment), type:feature (required behavior/compliance expected from a system or technology: performance, architecture, privacy)

@olivierlemee (Collaborator) commented:
        - [ ] Evaluate and decide on the Kafka serialization technology to use for the facts and data exchanged on the DIS:
           - [ ] Apache Avro serializer/deserializer: https://avro.apache.org/ Avro schema files generated from Java classes via IDL files (migrating POJOs to IDL allows managing the generated Avro schema for POJO mapping and Java class auto-generation): https://www.instaclustr.com/blog/exploring-karapace-part-2/
              - AVRO ADVANTAGES
                 - Dynamic typing: unlike Protobuf, Avro does not require code generation, which gives more flexibility and easier integration with dynamic languages like Python or Ruby.
                 - Self-describing messages: serialized Avro data embeds schema information, so it can be decoded even if the reader has no access to the original schema. As the example shows, an Avro message must always be prefixed with information identifying the schema used to encode it, or the decoder will either fail or produce invalid data. Adding default values to the schema is very important so that a field can be removed later (see the schema sketch after this list).
                    - Versus Protobuf: despite Avro's slightly smaller encoded data size, Protobuf message definitions can be updated in a compatible way without prefixing the encoded data with a schema identifier, which makes Protobuf a better choice where object versions must be managed automatically and dynamically by the deserializer. Avro is easier to debug than Protobuf, whose wire format is less human-readable. BUT: Avro has slower serialization/deserialization performance due to its dynamic typing and embedded schema information.
                 - AVRO DRAWBACK: verbosity of the schema definition in JSON
                 - JAVA
                    - IDL file (domain object specification) > Avro schema file (versioned) > auto-generated Java class
                    - The Avro file is usable by producers/consumers for POJO mapping (e.g., exchange of serialized data over Kafka, Redis, or the filesystem)
                    - Backward-compatibility test under Maven for the schema and newly generated classes (see the Maven sketch after this list): https://docs.confluent.io/platform/current/schema-registry/develop/maven-plugin.html#schema-registry-test-compatibility
                    - Producer example with the schema version automatically added (see the producer sketch after this list): https://github.com/confluentinc/examples/blob/7.4.1-post/clients/avro/src/main/java/io/confluent/examples/clients/basicavro/ProducerExample.java
                    - Consumer example with the schema version automatically read (see the consumer sketch after this list): https://github.com/confluentinc/examples/blob/7.4.1-post/clients/avro/src/main/java/io/confluent/examples/clients/basicavro/ConsumerExample.java
                 - NodeJS
                    - JS encode/decode with the [avro-js module](https://www.npmjs.com/package/avro-js) (see the sketch after this list): https://blog.basyskom.com/2021/what-is-apache-avro-compared-to-protobuf
                 - Karapace (Kafka REST proxy and schema registry in a Docker instance)
                    - Karapace schema registry: https://www.instaclustr.com/blog/exploring-karapace-part-3/ supporting Avro, JSON Schema, and Protobuf, with a REST interface for schema management
                    - GitHub project: https://github.com/Aiven-Open/karapace
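To make the schema-evolution point above concrete, here is a minimal sketch of an Avro schema (`.avsc`, JSON) for a hypothetical `Payment` record loosely modeled on the linked Confluent examples. The `region` field and its default value are assumptions for illustration; having a default is precisely what allows the field to be removed later without breaking old readers.

```json
{
  "namespace": "io.confluent.examples.clients.basicavro",
  "type": "record",
  "name": "Payment",
  "doc": "Hypothetical domain object used by the sketches below.",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "region", "type": "string", "default": "EU"}
  ]
}
```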
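A minimal producer sketch along the lines of the linked ProducerExample, assuming the `Payment` class was generated from the schema above and that a schema registry runs at `http://localhost:8081` (broker address, topic name, and registry URL are assumptions). `KafkaAvroSerializer` registers the schema and prefixes each message with its registry id, which is the "schema version automatically added" behavior referenced above.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import io.confluent.examples.clients.basicavro.Payment; // assumed: generated from the .avsc sketch above

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
        props.put("schema.registry.url", "http://localhost:8081");             // assumed registry (e.g. Karapace)

        // newBuilder() fills 'region' with its schema default when not set explicitly.
        Payment payment = Payment.newBuilder().setId("t42").setAmount(99.99).build();

        try (KafkaProducer<String, Payment> producer = new KafkaProducer<>(props)) {
            // The serializer registers the schema (if new) and prepends its registry id to the bytes.
            producer.send(new ProducerRecord<>("transactions", payment.getId().toString(), payment));
            producer.flush();
        }
    }
}
```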
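And the matching consumer sketch (same assumptions): `KafkaAvroDeserializer` reads the schema-id prefix from each message and fetches the writer schema from the registry, i.e. the "schema version automatically read" behavior above; `specific.avro.reader=true` maps records back onto the generated `Payment` class.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import io.confluent.examples.clients.basicavro.Payment; // assumed: generated from the .avsc sketch above

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payment-analytics");        // assumed group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class);
        props.put("schema.registry.url", "http://localhost:8081");             // assumed registry
        props.put("specific.avro.reader", "true"); // deserialize into the generated Payment class

        try (KafkaConsumer<String, Payment> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("transactions"));
            while (true) {
                // The deserializer resolves each record's schema id against the registry.
                ConsumerRecords<String, Payment> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, Payment> record : records) {
                    System.out.printf("id=%s amount=%.2f%n",
                            record.value().getId(), record.value().getAmount());
                }
            }
        }
    }
}
```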
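For the backward-compatibility check, the linked Confluent docs describe a `kafka-schema-registry-maven-plugin` with a `test-compatibility` goal. A sketch of its configuration follows; the registry URL, subject name, and schema path are assumptions matching the sketches above.

```xml
<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>7.4.1</version>
  <configuration>
    <schemaRegistryUrls>
      <param>http://localhost:8081</param> <!-- assumed registry URL -->
    </schemaRegistryUrls>
    <subjects>
      <!-- subject name and schema path are assumptions for this sketch -->
      <transactions-value>src/main/avro/Payment.avsc</transactions-value>
    </subjects>
  </configuration>
  <goals>
    <goal>test-compatibility</goal>
  </goals>
</plugin>
```

With this in the POM, `mvn schema-registry:test-compatibility` checks the local schema against the versions already registered for the subject, so an incompatible change fails the build before anything is produced to Kafka.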
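On the NodeJS side, a minimal encode/decode sketch with the avro-js module referenced above (`npm install avro-js`). Note that `toBuffer`/`fromBuffer` produce plain Avro binary, not the registry-framed format used by the Java serializers above, so the schema-id prefix would have to be handled separately.

```js
const avro = require('avro-js');

// Same hypothetical Payment schema as on the Java side.
const type = avro.parse({
  namespace: 'io.confluent.examples.clients.basicavro',
  type: 'record',
  name: 'Payment',
  fields: [
    { name: 'id', type: 'string' },
    { name: 'amount', type: 'double' },
    { name: 'region', type: 'string', default: 'EU' }
  ]
});

// Encode to plain Avro binary and decode it back.
const buf = type.toBuffer({ id: 't42', amount: 99.99, region: 'EU' });
const decoded = type.fromBuffer(buf);
console.log(decoded.id, decoded.amount, decoded.region); // t42 99.99 EU
```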