Closed
…ere in a specific position (apache#14389)
… order (apache#14391) * order index types so servers can process them in a more deterministic order * refine fwd_index_handler logs a bit for easy debug
…rser.DEFAULT_IDENTIFIER_MAX_LENGTH) to 1024. (apache#14363)
…4340) * Added support to perform task validations for plug-in tasks * trigger build2
Fix NPE and execution when --help was specified. Remove code that was supposed to generate help message.
* [timeseries] Fix Server Selection Bug + Enforce Timeout * pass requestId and brokerId to servers * undo the 2 server quickstart change
…maConformingTransformer (apache#14351) * Remove emitting null value fields during data transformation. * Fix lint issues. * Revise based on comments
Bumps software.amazon.awssdk:bom from 2.29.40 to 2.29.41. --- updated-dependencies: - dependency-name: software.amazon.awssdk:bom dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…afety issue in the JDBC client (apache#14723) Use DateTimeFormatter instead of SimpleDateFormat to resolve thread safety issue in the JDBC client
…14728) Bumps [com.puppycrawl.tools:checkstyle](https://github.com/checkstyle/checkstyle) from 10.21.0 to 10.21.1. - [Release notes](https://github.com/checkstyle/checkstyle/releases) - [Commits](checkstyle/checkstyle@checkstyle-10.21.0...checkstyle-10.21.1) --- updated-dependencies: - dependency-name: com.puppycrawl.tools:checkstyle dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps software.amazon.awssdk:bom from 2.29.41 to 2.29.43. --- updated-dependencies: - dependency-name: software.amazon.awssdk:bom dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1. Changing FSM 2. Changing the 3 steps performed during the commit protocol to update ZK and Ideal state
1. Changes in the commit protocol to start segment commit before the build 2. Changes in the BaseTableDataManager to ensure that the locally built segment is replaced by a downloaded one only when the CRC is present in the ZK Metadata 3. Changes in the download segment method to allow waited download in case of pauseless consumption
…segment commit end metadata call Refactoring code for readability
… ingestion by moving it out of streamConfigMap
…auseless ingestion in RealtimeSegmentValidationManager
…d by RealtimeSegmentValidationManager to fix commit protocol failures
…g commit protocol
9aman pushed a commit that referenced this pull request on Feb 19, 2025
Pauseless Ingestion Failure Resolution
PR for happy path: apache#14741
Please refer to the above PR for more details before reviewing this PR that covers resolution for failure scenarios.
Summary
This PR provides ways to resolve the failure scenarios that can occur during pauseless ingestion. The detailed list of failure scenarios can be found here: link, along with the failure handling strategies: link
The following sequence diagrams summarize the failure scenarios and their resolution.


Failure Scenarios & Resolution Approaches
Failures encountered during the commit protocol can be categorized into two types: recoverable and unrecoverable failures.
Recoverable failures are those in which at least one of the servers retains the segment on disk.
Unrecoverable failures occur when none of the servers have the segment on disk.
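The distinction above can be sketched in a few lines of Java. This is only an illustration of the rule as stated here, not code from this PR: `FailureType`, `CommitFailureClassifier`, and the per-server map are hypothetical names introduced for the example.

```java
import java.util.Map;

// Hypothetical types for illustration only; these are not Pinot classes.
enum FailureType { RECOVERABLE, UNRECOVERABLE }

final class CommitFailureClassifier {
  private CommitFailureClassifier() {}

  /**
   * Classifies a commit failure from a per-server view of the segment.
   *
   * @param segmentOnDiskByServer server instance id -> whether that server still has the segment on disk
   */
  static FailureType classify(Map<String, Boolean> segmentOnDiskByServer) {
    // Recoverable as soon as at least one server retains the segment on disk.
    boolean anyServerHasSegment =
        segmentOnDiskByServer.values().stream().anyMatch(Boolean::booleanValue);
    return anyServerHasSegment ? FailureType.RECOVERABLE : FailureType.UNRECOVERABLE;
  }
}
```

An empty map is treated as unrecoverable here, since no server is known to hold the segment; that choice is an assumption of the sketch.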
Recoverable Failures
Recoverable failures will be addressed through RealtimeSegmentValidationManager. This approach will handle scenarios such as upload failures and incomplete commit protocol executions.
The controller or a server can fail between any of the steps of the commit protocol listed below:
Request Type: COMMIT_START
Request Type: COMMIT_END_METADATA
4. Update Segment ZK metadata for the committing segment (seg__0__0):
- Change status to DONE.
- Update the deep store URL.
- Any additional metadata.
The RealtimeSegmentValidationManager determines which step of the commit protocol failed and how it can be fixed. This is very similar to how commit protocol failures were handled previously, with some minor changes.
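As a rough illustration of that kind of check, the sketch below inspects only the two fields that step 4 is described as updating (status and deep store URL) and picks a repair. It is a simplified model under stated assumptions, not the actual RealtimeSegmentValidationManager logic: `RepairAction`, `CommittingSegmentCheck`, and the idea of passing the status as a plain string are all hypothetical.

```java
// Hypothetical names; a simplified model, not Pinot's RealtimeSegmentValidationManager.
enum RepairAction { NONE, COMPLETE_COMMIT_END_METADATA, RETRY_DEEP_STORE_UPLOAD }

final class CommittingSegmentCheck {
  private CommittingSegmentCheck() {}

  /**
   * Decides what to repair for a committing segment.
   *
   * @param status       segment status as recorded in ZK metadata (assumed to be a plain string here)
   * @param deepStoreUrl deep store URL recorded in ZK metadata, or null/empty if none was written
   */
  static RepairAction decide(String status, String deepStoreUrl) {
    // Step 4 sets the status to DONE; anything else means the end-of-commit metadata update did not finish.
    if (!"DONE".equals(status)) {
      return RepairAction.COMPLETE_COMMIT_END_METADATA;
    }
    // Status is DONE but no deep store URL was recorded: treat it as an upload failure to retry.
    if (deepStoreUrl == null || deepStoreUrl.isEmpty()) {
      return RepairAction.RETRY_DEEP_STORE_UPLOAD;
    }
    return RepairAction.NONE;
  }
}
```

The real manager presumably also has to account for the earlier steps of the protocol, which this sketch deliberately leaves out.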
Unrecoverable Failures
These failures require re-ingesting the segment from upstream, then building it, uploading it, and updating its ZK metadata.
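The ordering of that repair can be sketched as a small step runner. This is a minimal sketch that assumes each step reports success or failure and that a later step must not run if an earlier one failed; `RecoveryStep` and `UnrecoverableSegmentRepair` are hypothetical names, and the actual work of each step is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical step names mirroring the sequence described above; not Pinot classes.
enum RecoveryStep { REINGEST_FROM_UPSTREAM, BUILD_SEGMENT, UPLOAD_SEGMENT, UPDATE_ZK_METADATA }

final class UnrecoverableSegmentRepair {
  private UnrecoverableSegmentRepair() {}

  /**
   * Runs the steps in declaration order and stops at the first one that fails.
   *
   * @param executor performs a step and returns true on success
   * @return the steps that completed successfully, in order
   */
  static List<RecoveryStep> run(Predicate<RecoveryStep> executor) {
    List<RecoveryStep> completed = new ArrayList<>();
    for (RecoveryStep step : RecoveryStep.values()) {
      if (!executor.test(step)) {
        break; // later steps depend on this one, so do not continue
      }
      completed.add(step);
    }
    return completed;
  }
}
```

Returning the completed steps lets a caller see where the repair stopped and decide whether to retry from that point.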