Event schema integration connector - add to connector catalog / release docs #690

Open
20 tasks done
davidradl opened this issue Mar 18, 2022 · 11 comments

@davidradl (Member) commented Mar 18, 2022

Name

egeria-connector-integration-event-schema

Owner

davidradl

Deliverable

Provides an integration connector that extracts event schemata from a schema registry (including the Confluent schema registry). The connector will be a polling connector: it will look in Egeria for new topics and, if a topic is present in the Confluent registry, bring the associated schema elements into Egeria.
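As a rough illustration of the polling behaviour described above, the sketch below remembers which topics have already been handled and, on each poll, returns the registry subjects to import for topics that are new. It is a minimal sketch under stated assumptions: `EventSchemaPoller` and the `subjectForTopic` hook are hypothetical names, and the topic and subject lists are passed in as plain strings rather than fetched through Egeria's integration context or the registry's REST API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of one polling cycle: new Egeria topics matched against registry subjects. */
class EventSchemaPoller {
    // Topics whose schemas have already been brought into Egeria.
    private final Set<String> processedTopics = new HashSet<>();
    // Hypothetical hook mapping a topic name to the subject expected to hold its schema.
    private final Function<String, String> subjectForTopic;

    EventSchemaPoller(Function<String, String> subjectForTopic) {
        this.subjectForTopic = subjectForTopic;
    }

    /** Returns the subjects to import on this poll: new topics whose subject exists in the registry. */
    List<String> poll(Collection<String> egeriaTopics, Set<String> registrySubjects) {
        List<String> toImport = new ArrayList<>();
        for (String topic : egeriaTopics) {
            if (processedTopics.contains(topic)) {
                continue; // already handled on an earlier poll
            }
            String subject = subjectForTopic.apply(topic);
            if (registrySubjects.contains(subject)) {
                toImport.add(subject);
                processedTopics.add(topic);
            }
            // A topic with no registered schema yet stays unprocessed, so a later poll retries it.
        }
        return toImport;
    }
}
```

With `new EventSchemaPoller(t -> t + "-value")`, a topic whose schema is registered only after the first poll is still picked up on a later cycle, and a topic is never imported twice.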

Build, test and CI-CD process

Answering the questions in order:

  • as usual
  • Gradle
  • Java
  • yes
  • same as core Egeria
  • same as core Egeria
  • I don't know what this means?

Dependencies

Core Egeria and whatever libraries are required to connect to the schema registry

Justification

There is no natural place for event schema content in the existing repositories.

Assumptions

Yes, all true.

Additional Information

Testing may require Karapace, an open source schema registry with the same API as the Confluent Schema Registry.
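Because Karapace serves the same REST API as the Confluent Schema Registry, the connector's read calls can be pointed at either. The sketch below only builds the requests, using the JDK's `java.net.http` client classes and the registry API's `GET /subjects`, `GET /subjects/{subject}/versions/latest` and `GET /schemas/ids/{id}` endpoints. The class name `RegistryRequests` and the example base URL (a local registry on port 8081) are assumptions for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds read-only Schema Registry API requests (Confluent or Karapace). */
class RegistryRequests {
    private final String baseUrl; // e.g. "http://localhost:8081" for an assumed local test registry

    RegistryRequests(String baseUrl) {
        // Drop a trailing slash so paths can be appended uniformly.
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    private static String encode(String pathSegment) {
        // URLEncoder does form encoding, so turn '+' back into a path-safe "%20".
        return URLEncoder.encode(pathSegment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    private HttpRequest get(String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Accept", "application/vnd.schemaregistry.v1+json")
                .GET()
                .build();
    }

    /** All subject names known to the registry. */
    HttpRequest listSubjects() {
        return get("/subjects");
    }

    /** The latest registered schema version for one subject. */
    HttpRequest latestVersion(String subject) {
        return get("/subjects/" + encode(subject) + "/versions/latest");
    }

    /** A schema looked up by its registry-assigned ID. */
    HttpRequest schemaById(int id) {
        return get("/schemas/ids/" + id);
    }
}
```

A test against a running Karapace instance could then send a request with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and parse the JSON body.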

Work Plan

Before creating the repo

  • review overall request & get clarifications
  • get approval on developer/TSC call from maintainers

Creating the repo

  • Create the repo under the odpi organization (default gitignore, license, readme)
  • Setup branch protection rules
  • Set pull request options (allow merge, squash, rebase, suggest updating, allow automerge, do not delete head)
  • Update security settings in repo (policy, advisories, alerts)
  • set up permissions

First steps

  • Initial code-drop (author)

Getting CI/CD started & refining settings

  • Add initial build script for PR (including Gradle wrapper if required)
  • Add initial build script for merge

Further Refinement

  • Add a link in the Egeria docs to the new repo, describing its purpose etc.
  • Add required credentials for publishing to container repos, maven central etc
  • Add artifact signing if needed
  • Add dependabot config
  • Add CodeQL
  • Add to LFXSecurity
  • Add to LFAnalytics
  • Add check for stale defects
  • Add standard issue tags

Release

  • Add release pipeline

@planetf1 (Member)

I agree with the proposal. It's simpler to have targeted repositories for each connector IMO.

Suggest we request formal approval at the TSC meeting 2022-03-23

@planetf1 (Member)

New repo has been created from template https://github.com/odpi/egeria-template-newrepo - this initializes files, but not project properties - will continue tomorrow.

planetf1 transferred this issue from odpi/egeria Mar 24, 2022

@planetf1 (Member)

No additional tags added -- they are inherited from top level odpi. We have too many on egeria, and need to review/consolidate before we push out to all new repos.

@planetf1 (Member)

@davidradl A few final tasks to do - but perhaps you could take a look.
I've used a template this time with some dummy code - just enough to get a CI/CD job running & be able to configure the appropriate checks and settings.

@planetf1 (Member)

Note: please leave open. Still trying to get LFAnalytics sorted with the LF.

Two more tasks

  • Add to connector catalog
  • update template in egeria with above

@dwolfson (Member) commented May 6, 2022

Here are some notes I took as I went through some of the Kafka documentation - please feel free to add/modify/correct..

Confluent Schema Registry Thoughts

Date: 2022-05-06 09:28

Schema Registry Purpose

The purpose of the schema registry is to share message schemas between producers and consumers, so that messages can be mutually understood and schemas can evolve in controlled ways. Schema Registry is available under the Confluent Community License, but some features require an Enterprise license. It is available within the Confluent Control Center tooling (also community licensed) and via RESTful calls.

Design concepts - Schema Registry Schema Linking for Confluent Platform Developers | Confluent Documentation

  • Producers and consumers can register their schemas automatically
  • Each registered schema has a globally unique ID (within a cluster)
  • Kafka is the storage backend of the schema registry (within a cluster)
  • A topic contains messages and each message is a key-value pair
    • either the key, value or both can be serialized as Avro, JSON or Protobuf
    • Kafka topic name can be independent of the schema name
    • A subject is the scope within which a schema can evolve
    • There are three different subject name strategies
    • The default subject name strategy derives the subject name from the topic name
  • A registry can contain multiple contexts that act as namespaces to support activities such as lifecycle management between dev, test, staging, prod for example.
  • Schema Linking is now available: it supports different scopes within a registry and helps keep schemas in sync with other clusters in the deployment (see "Schema Linking for Confluent Platform Developers" in the Confluent Documentation). It works by creating exporters.
  • Confluent Replicator can migrate schemas from one schema registry to another and automatically rename subjects on the target registry

Subject naming strategy - Formats, Serializers, and Deserializers | Confluent Documentation

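  • Topic Name Strategy - default - subject derived purely from the topic name - all messages in the topic must have the same schema
  • Record Name Strategy - subject derived from the fully qualified record name - multiple schemas in a topic. A topic can have multiple subjects.
  • Topic Record Name Strategy - subject derived from both the topic name and the record name - multiple schemas in a topic, with subjects still scoped to that topic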
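To make the three built-in strategies concrete, the sketch below computes the subject a serializer would register under for each one: `<topic>-key` or `<topic>-value` for the topic name strategy, the fully qualified record name for the record name strategy, and `<topic>-<record name>` for the topic record name strategy. In Confluent's serializers these are the classes `TopicNameStrategy`, `RecordNameStrategy` and `TopicRecordNameStrategy` in `io.confluent.kafka.serializers.subject`; the enum here is only a hypothetical stand-in that mirrors their naming rules.

```java
/** Hypothetical mirror of the subject names produced by Confluent's three built-in strategies. */
enum SubjectNameStrategy {
    TOPIC_NAME, RECORD_NAME, TOPIC_RECORD_NAME;

    /**
     * @param topic      Kafka topic name
     * @param isKey      true for the message key, false for the message value
     * @param recordName fully qualified record name, e.g. an Avro namespace plus record name
     */
    String subjectFor(String topic, boolean isKey, String recordName) {
        switch (this) {
            case TOPIC_NAME:
                // Only this strategy distinguishes key and value by suffix.
                return topic + (isKey ? "-key" : "-value");
            case RECORD_NAME:
                // Independent of the topic: the same record shares one subject across topics.
                return recordName;
            case TOPIC_RECORD_NAME:
                return topic + "-" + recordName;
            default:
                throw new IllegalStateException("unknown strategy " + this);
        }
    }
}
```

One consequence for the connector: under the record name strategy the subject does not contain the topic name at all, so the subjects for a topic cannot be found from the topic name alone.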

Issues being addressed by Schema Registry

  • Track (and to a limited degree manage) schemas used within a Kafka cluster
  • Pass schema ID within the message rather than passing the entire schema (at least for Avro and Protobuf) - smaller messages
  • Some amount of schema evolution
  • Support for contexts (namespaces) within the registry to support lifecycle
  • Some schema runtime validation (within constraints)
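The "pass the schema ID rather than the entire schema" point refers to Confluent's wire format: a serialized key or value starts with a magic byte (0), followed by the schema ID as a 4-byte big-endian integer, followed by the serialized data (for Protobuf, message indexes come before the data). The sketch below reads that header; `WireFormat` is a hypothetical helper name, not a Confluent class.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper that reads the schema ID from a Confluent wire-format header. */
final class WireFormat {
    static final byte MAGIC_BYTE = 0x0;

    private WireFormat() {}

    static int schemaId(byte[] message) {
        if (message == null || message.length < 5) {
            throw new IllegalArgumentException("too short for the 5-byte wire-format header");
        }
        ByteBuffer buffer = ByteBuffer.wrap(message); // ByteBuffer is big-endian by default
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("unknown magic byte");
        }
        return buffer.getInt(); // bytes 1-4 hold the schema ID; the payload follows
    }
}
```

The ID only has meaning relative to the registry that issued it, which is why the limitation below about IDs being unique within a single cluster matters.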

Limitations of Schema Registry

  • Schema IDs are only unique within a single cluster
  • Many of the features of the tooling only support the default subject naming strategy
  • Seems to be of limited value if message schemas are JSON

Egeria integration considerations

  • Topology matters - schema IDs are unique only within a cluster; replication and more sophisticated use of contexts require Enterprise features such as Replicator and Schema Linking.
  • Lifecycle approach matters - contexts, deployments..
  • Schema migration is limited
  • Subject naming strategy really is driven by the message patterns you want to implement
  • There does not seem to be an architected approach for Topic naming - perhaps Egeria can help here
  • What else?
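One way to handle the topology point is to never key imported schemas by registry ID alone. The sketch below pairs the ID with the registry it came from, so the same ID from two clusters yields two distinct keys. `RegistrySchemaKey` and its qualified-name format are hypothetical; the issue does not define how the connector names schemas in Egeria.

```java
import java.util.Objects;

/** Hypothetical key for an imported schema: registry schema IDs are unique only per cluster. */
final class RegistrySchemaKey {
    private final String registryUrl; // identifies the registry (and so the cluster) that issued the ID
    private final int schemaId;

    RegistrySchemaKey(String registryUrl, int schemaId) {
        this.registryUrl = Objects.requireNonNull(registryUrl, "registryUrl");
        this.schemaId = schemaId;
    }

    /** Hypothetical qualified-name format combining registry and ID. */
    String qualifiedName() {
        return "SchemaRegistry::" + registryUrl + "::schema::" + schemaId;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof RegistrySchemaKey)) return false;
        RegistrySchemaKey that = (RegistrySchemaKey) other;
        return schemaId == that.schemaId && registryUrl.equals(that.registryUrl);
    }

    @Override
    public int hashCode() {
        return Objects.hash(registryUrl, schemaId);
    }
}
```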

@davidradl (Member, Author)

Thanks @dwolfson. I thought I would add here the idea we are considering: the Strimzi connector could pick up the subject naming strategy from the CRD, and Egeria could store it in the metadata associated with the Kafka topic. When the Event schema integration connector runs, it gets all the relevant topics from Egeria and can then use the subject naming strategy stored in Egeria to look for the subjects that correspond to each topic. Exactly what would be stored in Egeria to represent the subject naming strategy has not been agreed yet; maybe the subject naming strategy name is enough.
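If only the strategy name were stored with the topic, the lookup could look roughly like the sketch below: given a topic, its stored strategy name and the subject list from the registry's `GET /subjects` call, it returns the matching subjects. `TopicSubjectMatcher` is hypothetical, the strategy names are taken from Confluent's class names, and the sketch assumes one strategy per topic (Confluent serializers actually configure key and value strategies separately). It also shows where the name alone falls short: record-name subjects cannot be found from the topic, and a prefix match for topic-record-name subjects is ambiguous when topic names share a prefix.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical lookup of registry subjects for a topic, given a stored strategy name. */
final class TopicSubjectMatcher {
    private TopicSubjectMatcher() {}

    static List<String> subjectsForTopic(String topic, String strategyName, Collection<String> registrySubjects) {
        String keySubject = topic + "-key";
        String valueSubject = topic + "-value";
        List<String> matches = new ArrayList<>();
        switch (strategyName) {
            case "TopicNameStrategy":
                for (String s : registrySubjects) {
                    if (s.equals(keySubject) || s.equals(valueSubject)) matches.add(s);
                }
                break;
            case "TopicRecordNameStrategy":
                for (String s : registrySubjects) {
                    // Ambiguous when another topic shares the prefix (e.g. "orders" and "orders-eu").
                    if (s.startsWith(topic + "-") && !s.equals(keySubject) && !s.equals(valueSubject)) {
                        matches.add(s);
                    }
                }
                break;
            case "RecordNameStrategy":
                // Subjects are record names only; nothing can be derived from the topic name.
                break;
            default:
                throw new IllegalArgumentException("unknown subject name strategy: " + strategyName);
        }
        return matches;
    }
}
```

This suggests that, for strategies other than the default, Egeria may need to hold more than the strategy name (for example the record names) for the connector to find subjects reliably.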

github-actions bot commented Feb 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

@planetf1 (Member) commented Feb 7, 2023

only minor docs remaining

planetf1 transferred this issue from odpi/egeria-connector-integration-event-schema Feb 7, 2023

@planetf1 (Member) commented Mar 28, 2023

Still needs adding to connector catalog / release docs

planetf1 removed their assignment Mar 28, 2023
planetf1 changed the title from "[REPOSITORY] Egeria Event schema integration connector" to "Event schema integration connector - add to connector catalog / release docs" Mar 28, 2023

@planetf1 (Member)

@juergenhemelt Can you handle this? This is just some missing docs -- documenting the connector in the connector catalog & documenting the release process.
