Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Race Condition in removeReadyTimers() to Prevent Cleared Timers from Firing #143

Merged

Conversation

FuRyanf
Copy link

@FuRyanf FuRyanf commented Feb 18, 2025

Issue Summary

A race condition in removeReadyTimers() allows timers that were cleared in the same batch to still fire. This occurs because the method loads timers into eventTimeBuffer in batches, and if an earlier timer clears a later timer, the later timer is still added to readyTimers. However, since state.get(timerKey, timeDomain) returns null (or 0) after deletion, it fails to be properly removed, leading to it firing incorrectly when it should have been cleared.

Root Cause

  1. removeReadyTimers() removes timers from eventTimeBuffer and adds them to readyTimers.
  2. Timers are deleted from persistent state using state.deletePersisted(keyedTimerData).
  3. If an earlier timer in the same batch clears a later timer, the later timer is already removed from state, making state.get(timerKey, timeDomain) return null.
  4. This causes the later timer to still exist in readyTimers, leading to it firing when it should have been cleared.

Proposed Solution

To address this race condition, we have consolidated the removal and firing logic into a single method, fireReadyTimers(), ensuring that timers are verified in persistent state before being fired. The updated approach works as follows:

  • Deferred Deletion Until Firing: Timers are first collected from the eventTimeBuffer without immediately deleting them from persistent state.
  • State Verification Before Firing: Each timer is checked against state.get(timerKey, timeDomain). If it still exists in persistent state and its stored timestamp matches, it is deleted and fired.
  • Consolidation of Logic: The previous removeReadyTimers() function was eliminated, and all related calls (e.g., in DoFnOp, GroupByKeyOp, and SplittableParDoProcessKeyedElementsOp) were updated to use fireReadyTimers().
  • Batch Processing Remains Consistent: If a timer was adjusted (i.e., rescheduled), the stored timestamp will no longer match, and the timer will not be fired immediately. Instead, it will be processed in the next batch when its new timestamp becomes due.
  • Automatic Timer Buffer Reloading: When the in-memory timer buffer is emptied, timers are reloaded from persistent state to ensure that no due timers are missed.

Code Changes:

  • Introduced fireReadyTimers() in SamzaTimerInternalsFactory, replacing removeReadyTimers().
  • Updated DoFnOp, GroupByKeyOp, and SplittableParDoProcessKeyedElementsOp to invoke fireReadyTimers() instead of handling timers separately.
  • Added a unit test in SamzaTimerInternalsFactoryTest to verify that fireReadyTimers() correctly processes and fires timers while ensuring that cleared timers are not fired.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

@xinyuiscool
Copy link

Please do a ./gradlew :runners:samza:validatesRunner so we can run through all beam timer test cases. Thanks.

@github-actions github-actions bot added the samza label Feb 26, 2025
@FuRyanf FuRyanf force-pushed the FuRyanf/cleared-timers-firing-bug branch from 8432257 to 0d194c0 Compare February 26, 2025 02:25
@github-actions github-actions bot added the build label Feb 26, 2025
@FuRyanf FuRyanf requested a review from xinyuiscool February 26, 2025 15:26
@FuRyanf FuRyanf force-pushed the FuRyanf/cleared-timers-firing-bug branch from 0d194c0 to e364c9c Compare February 26, 2025 16:16
@FuRyanf FuRyanf force-pushed the FuRyanf/cleared-timers-firing-bug branch 2 times, most recently from d5649cf to 8407aa7 Compare February 26, 2025 18:27
@FuRyanf FuRyanf force-pushed the FuRyanf/cleared-timers-firing-bug branch from 8407aa7 to d858e1f Compare February 26, 2025 18:36
Copy link

@xinyuiscool xinyuiscool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for the update. One minor comment.

@FuRyanf FuRyanf force-pushed the FuRyanf/cleared-timers-firing-bug branch from 5b0b671 to 0717427 Compare March 1, 2025 01:55
@xinyuiscool xinyuiscool merged commit bb44789 into linkedin:li_trunk Mar 1, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants