Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server replica allowed to overwrite deep store segment after segment was already committed #14786

Open
dang-stripe opened this issue Jan 9, 2025 · 1 comment

Comments

@dang-stripe
Copy link
Contributor

dang-stripe commented Jan 9, 2025

For a 3 replica table, we hit a case where:

2 replicas started segment commit - 1 server was successful and uploaded to deep store. Segment moves to ONLINE state.

[2025-01-02 06:04:22.684927] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-29:270] HOLDING:Picking winner time=6011 size=2
[2025-01-02 06:04:22.684947] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-29:270] HOLDING:Committer notified winner instance=Server_62 offset=353103174

3rd replica server (Server_95) was recently replaced w/ a new server and starting up. It started segment commit after the previous one was done. The controller allowed it to proceed w/ deep store upload, overwriting the previous segment copy.

Controller then failed when trying to update the ideal state since the segment wasn't in the CONSUMING state. This repeated several times until it gave up and then downloaded the copy in deep store.

[2025-01-02 06:15:33.342940] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-79:320] Processing segmentCommitStart(Server_95, 353106085)
[2025-01-02 06:15:33.343018] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-79:320] Processing segmentCommit(Server_95, 353106085)
[2025-01-02 06:15:33.343038] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-79:320] HOLDING:HOLD for instance=Server_95 offset=353106085
[2025-01-02 06:15:36.362908] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-127:368] Processing segmentConsumed(Server_95, 353106085)
[2025-01-02 06:15:36.362927] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-127:368] HOLDING:Picking winner time=103851 size=1
[2025-01-02 06:15:36.362946] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-127:368] HOLDING:Committer notified winner instance=Server_95 offset=353106085
[2025-01-02 06:15:36.362962] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-127:368] HOLDING:COMMIT for instance=Server_95 offset=353106085 buldTimeSec=126
[2025-01-02 06:15:36.366488] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-48:289] Processing segmentCommitStart(Server_95, 353106085)
[2025-01-02 06:15:36.366507] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-48:289] COMMITTER_NOTIFIED:Uploading for instance=Server_95 offset=353106085
[2025-01-02 06:15:52.366909] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-59:300] Processing segmentCommitEnd(Server_95, 353106085)
[2025-01-02 06:15:52.366952] INFO [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-59:300] Committing segment table1__327753__3444__20250102T0204Z at offset 353106085 winner Server_95
[2025-01-02 06:16:19.121540] ERROR [SegmentCompletionFSM_table1__327753__3444__20250102T0204Z] [grizzly-http-server-59:300] Caught exception while committing segment metadata for segment: table1__327753__3444__20250102T0204Z
[2025-01-02 06:16:19.121585] java.lang.IllegalStateException: Failed to find instance in CONSUMING state in IdealState for segment: table1__327753__3444__20250102T0204Z
[2025-01-02 06:16:19.121602] 	at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:604) ~[pinot-all-1.2.0-2024-11-20-d3d547f256-SNAPSHOT-jar-with-dependencies.jar:1.2.0-2024-11-20-d3d547f256-SNAPSHOT-d3d547f2564eae4922c1ecb0280d6960f74d6902]
[2025-01-02 06:16:19.121626] 	at org.apache.pinot.controller.helix.core.realtime.PinotLLCRealtimeSegmentManager.commitSegmentMetadataInternal(PinotLLCRealtimeSegmentManager.java:523) ~[pinot-all-1.2.0-2024-11-20-d3d547f256-SNAPSHOT-jar-with-dependencies.jar:1.2.0-2024-11-20-d3d547f256-SNAPSHOT-d3d547f2564eae4922c1ecb0280d6960f74d6902]
...
commit + upload + fail repeats multiple times
...
[2025-01-02 06:26:03.847443] WARN [RealtimeSegmentDataManager_table1__327753__3444__20250102T0204Z] [HelixTaskExecutor-message_handle_thread_32:28] Current offset 353106085 ahead of the offset in zk 353103174. Downloading to replace
[2025-01-02 06:26:06.923869] INFO [table1_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_32:249] Downloaded and replaced CONSUMING segment: table1__327753__3444__20250102T0204Z

What's concerning here is that the offsets are different (353103174 vs 353106085). It seems like the controller should not have allowed the 3rd replica to proceed w/ the upload and commit.

We're using the realtime.segment.serverUploadToDeepStore feature to allow servers to upload segments directly are part of segment commit, but it doesn't seem like it's specific to that feature.

This separately caused a CRC mismatch failure for upsert compaction since Minion compacted the first version of the segment and updated the metadata, but the 3rd replica overwrote the S3 copy.

[2025-01-09 00:39:58.795983] ERROR [TaskFactoryRegistry] [TaskStateModelFactory-task_thread-33:17] Caught exception while executing task: Task_UpsertCompactionTask_0fbf766c-942e-4654-a938-7362bf6433de_1736382915514_1
[2025-01-09 00:39:58.796017] java.lang.IllegalStateException: Crc mismatched between ZK and deepstore copy of segment: table1__327753__3444__20250102T0204Z. Expected crc from ZK: 2394021143, crc from deepstore: 1876071145
[2025-01-09 00:39:58.796047] 	at org.apache.pinot.plugin.minion.tasks.upsertcompaction.UpsertCompactionTaskExecutor.convert(UpsertCompactionTaskExecutor.java:70) 
@dang-stripe
Copy link
Contributor Author

cc @Jackie-Jiang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant