Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DG-1925: added code to create kafka topic 'TAG_PROP_EVENTS' and push events to… #3791

Open
wants to merge 8 commits into
base: beta
Choose a base branch
from

Conversation

abhijeet-atlan
Copy link

@abhijeet-atlan abhijeet-atlan commented Nov 27, 2024

Change description

Description here
DG1925
Design plan ref :Kafka event emitted by Planner

This code change is only for add propogation and for other operations will be similar, once this is reviewed I will make similar changes to all and push.

…initialization in try-catch blocks

- Added exception handling for `ApplicationProperties.get()` and `KafkaNotification` initialization.
- Ensures `AtlasException` is caught and rethrown as `RuntimeException` for better runtime error propagation.
- Moved initialization of  and  () into the constructor.
- Added  block to handle  during initialization.
- Ensures any exceptions are wrapped and propagated as  for consistency.
…pper' constructor with exception handling

- Moved initialization of 'configuration' and 'KafkaNotification' ('kfknotif') into the constructor.
- Added 'try-catch' block to handle 'AtlasException' during initialization.
- Ensures any exceptions are wrapped and propagated as 'RuntimeException' for consistency.
…n Apache Atlas

This commit introduces comprehensive enhancements to Apache Atlas's Kafka integration, focusing on dynamic topic management, partition-specific messaging, and configuration improvements to support scalability and efficient data distribution across Kafka topics.

Detailed Changes:

1. **Dynamic Topic Creation**:
   - Introduced a new method `createTopics` in the `KafkaUtils` class. This method facilitates the creation of Kafka topics with configurable numbers of partitions and replication factors, derived from a list of topic details. This allows for more granular control over topic configurations directly from the application layer.
   - Added debug logging to provide detailed traceability for topic creation operations, enhancing monitoring and troubleshooting capabilities.

2. **Configuration Management Enhancements**:
   - Modified `atlas_config.py` to automatically include the new `TAG_PROP_EVENTS` topic in the list of topics initialized at startup, ensuring that this topic is available for event propagation without manual configuration.
   - Introduced a new configuration key `NOTIFICATION_PROPAGATION_TOPIC_NAME` in `AtlasConfiguration.java`, standardizing the topic name across the codebase and reducing the risk of hard-coded string errors.

3. **Partition-Specific Messaging Capabilities**:
   - Extended the `KafkaNotification` and `AbstractNotification` classes to include methods that support sending messages to specific partitions. This feature is critical for directing messages to particular segments of a topic, thereby optimizing the workload distribution and message consumption based on topic partitioning.
   - Implemented partition calculation in `EntityGraphMapper` using Guava's consistent hashing algorithm. This calculation uses the SHA-256 hash of a parent task's GUID to determine the partition, ensuring that related messages are co-located in the same partition for improved processing efficiency.

4. **Dependency Updates**:
   - Added the Google Guava library to the project dependencies to utilize its robust hashing functions, which are essential for implementing consistent hashing for partition determination.

5. **Error Handling and Logging Improvements**:
   - Enhanced error handling in the topic creation process to re-throw exceptions, allowing calling methods to handle these exceptions according to the application's error management policies.
   - Improved logging statements to include detailed error messages and contextual information, aiding in faster diagnosis and resolution of issues related to Kafka messaging.

These enhancements aim to bolster Apache Atlas's capabilities in handling high-throughput Kafka topics like `TAG_PROP_EVENTS` by introducing advanced topic management features and partition-specific messaging. This aligns with the needs of large-scale, distributed environments where efficient data handling and scalability are paramount.
…ration enhancements. Details are below:

	1.	Introduced a constant TAG_PROP_EVENTS_PARTITION_COUNT for Kafka partition value management.
	2.	Updated AtlasTopicCreator to use configurable partition counts for topics dynamically.
	3.	Refactored EntityGraphMapper to use the new constant for partition count initialization.
@jnkrmg
Copy link

jnkrmg commented Dec 2, 2024

3 pending things for this PR:

  1. Separate the work done related to sending messages to Kafka and raise a separate PR
  2. Update the assetsCountPropagated field in the Task inside the background execution thread
  3. Add this change to actions other than ADD_PROPAGATION

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants