fix(slots): prevent dead worker re-entering pool on double error listener #29518

Open
10done wants to merge 10 commits into calcom:main from 10done:fix/lifecycleerror
Conversation

@10done

10done commented Jun 7, 2026

What does this PR do?

When a worker errored mid-task, both the persistent lifecycle on("error")
and the task-specific once("error") fired. The task listener was pushing
the terminated worker back into availableWorkers after handleWorkerFailure
had already cleaned it up, inflating the pool with a dead worker reference.

Fix: errorListener now removes its sibling messageListener and does not
push the worker back — pool cleanup is owned solely by handleWorkerFailure.
Add unit tests covering all five corruption scenarios.
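
To make the failure mode above concrete, here is a minimal sketch of the double-listener interaction. It is not the service's code: `FakeWorker` and `simulateCrash` are hypothetical names, an `EventEmitter` stands in for a real worker thread, and the persistent listener is assumed to do the same job as handleWorkerFailure.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for a worker thread (the real service uses worker_threads).
class FakeWorker extends EventEmitter {}

// Simulates one worker erroring mid-task. `fixed` toggles the task listener's behavior.
function simulateCrash(fixed: boolean): { pool: number; available: number } {
  const workerPool: FakeWorker[] = [];
  const availableWorkers: FakeWorker[] = [];
  const worker = new FakeWorker();
  workerPool.push(worker); // busy with a task, so not in availableWorkers

  // Persistent lifecycle listener: assumed to play the role of handleWorkerFailure.
  worker.on("error", () => {
    const i = workerPool.indexOf(worker);
    if (i !== -1) workerPool.splice(i, 1);
    const j = availableWorkers.indexOf(worker);
    if (j !== -1) availableWorkers.splice(j, 1);
  });

  // Task-specific listeners, registered once per task.
  const messageListener = (): void => {
    availableWorkers.push(worker);
  };
  const errorListener = (): void => {
    if (fixed) {
      worker.off("message", messageListener); // drop the sibling, do not requeue
    } else {
      availableWorkers.push(worker); // pre-fix: dead worker re-enters the pool
    }
  };
  worker.once("message", messageListener);
  worker.once("error", errorListener);

  // A single "error" event triggers both error listeners, in registration order.
  worker.emit("error", new Error("worker crashed"));
  return { pool: workerPool.length, available: availableWorkers.length };
}

const buggyResult = simulateCrash(false);
const fixedResult = simulateCrash(true);
console.log(buggyResult, fixedResult);
```

In the pre-fix run the worker has already left `workerPool` when the task listener pushes it into `availableWorkers`, so the two arrays disagree. In the fixed run both stay empty.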

Visual Demo (For contributors especially)

Image Demo (if applicable):

Before the fix
[Image: Screenshot 2026-06-08 at 3 09 31 AM]

After the fix
[Image: Screenshot 2026-06-08 at 3 09 47 AM]

Mandatory Tasks (DO NOT REMOVE)

  • I have self-reviewed the code (A decent size PR without self-review might be rejected).
  • I have updated the developer docs if this PR makes changes that would require a documentation change. If N/A, write N/A here and check the checkbox.
  • I confirm automated tests are in place that prove my fix is effective or that my feature works.

How should this be tested?

cd apps/api/v2
corepack yarn jest slots-worker.service.bug --no-coverage --verbose

No additional environment variables required. The unit tests mock all worker thread infrastructure — no real worker processes are spawned.

The tests are fully self-contained unit tests with no database, no API keys, and no external dependencies.

@github-actions

github-actions Bot commented Jun 7, 2026

Contributor

Welcome to Cal.diy, @10done! Thanks for opening this pull request.

A few things to keep in mind:

  • This is Cal.diy, not Cal.com. Cal.diy is a community-driven, fully open-source fork of Cal.com licensed under MIT. Your changes here will be part of Cal.diy — they will not be deployed to the Cal.com production app.
  • Please review our Contributing Guidelines if you haven't already.
  • Make sure your PR title follows the Conventional Commits format.

A maintainer will review your PR soon. Thanks for contributing!

@github-actions github-actions Bot added the 🐛 bug Something isn't working label Jun 7, 2026
@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

The pull request refactors the task-specific worker event listeners in SlotsWorkerService_2024_04_15 to prevent double-firing with persistent lifecycle handlers. The messageListener now removes its sibling errorListener before returning the worker to the pool and processing the next task. The errorListener removes the sibling messageListener and rejects the task without returning the worker, delegating pool cleanup to persistent lifecycle handlers. The postMessage error path also removes both listeners and calls processNextTask() for continuation. A comprehensive Jest test suite validates listener cleanup, pool synchronization, and healthy worker replacement after crashes.
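
The listener coordination described in this walkthrough can be sketched roughly as follows. This is an illustration under assumptions, not the code in SlotsWorkerService_2024_04_15: `runTask`, `FakeWorker`, and the module-level `availableWorkers` array are hypothetical, and `postMessage` is a no-op so the events can be emitted by hand.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical worker stand-in; postMessage does nothing so events can be driven manually.
class FakeWorker extends EventEmitter {
  postMessage(_payload: unknown): void {}
}

const availableWorkers: FakeWorker[] = [];

// Runs one task on a worker and settles when the worker answers or errors.
function runTask(worker: FakeWorker, payload: unknown): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const messageListener = (result: unknown): void => {
      worker.off("error", errorListener); // remove sibling before the worker is reused
      availableWorkers.push(worker); // healthy worker goes back to the pool
      resolve(result);
    };
    const errorListener = (err: Error): void => {
      worker.off("message", messageListener); // remove sibling so it cannot fire later
      reject(err); // no push: pool cleanup is left to the lifecycle handler
    };
    worker.once("message", messageListener);
    worker.once("error", errorListener);
    worker.postMessage(payload);
  });
}

// Success path: the worker returns to the pool with no task listeners attached.
const healthyWorker = new FakeWorker();
const okTask = runTask(healthyWorker, { job: 1 });
healthyWorker.emit("message", "done");

// Error path: the task rejects and the worker is not requeued.
const crashedWorker = new FakeWorker();
const failedTask = runTask(crashedWorker, { job: 2 });
failedTask.catch(() => undefined); // handled here to avoid an unhandled rejection
crashedWorker.emit("error", new Error("worker crashed"));
```

Each listener removes its sibling because `once` only unregisters the listener that actually fired. Without the explicit `off`, the other listener would stay attached to a worker that may be reused for a later task.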

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main fix: preventing a dead worker from re-entering the pool due to a double error listener conflict. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the bug (double error handler firing), the fix (errorListener removing sibling messageListener and not pushing worker back), and providing test instructions. |
| Linked Issues check | ✅ Passed | The PR successfully addresses all objectives from issue #29454: prevents double-handling of error events [#29454], ensures failed workers are removed cleanly without returning to availableWorkers [#29454], consolidates cleanup responsibility to handleWorkerFailure [#29454], and adds reproducible unit tests [#29454]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing the double error listener conflict: the service implementation targets error handler coordination and listener cleanup, and test additions focus exclusively on the five corruption scenarios referenced in issue #29454. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🧹 Nitpick comments (1)
apps/api/v2/src/modules/slots/slots-2024-04-15/services/slots-worker.service.bug.spec.ts (1)

6-9: ⚡ Quick win

Trim the comments down to intent.

Most of these banners narrate the mock/test mechanics instead of the failure mode the spec is pinning down. Keeping only the comments that explain why a case exists will make the file much easier to scan.
As per coding guidelines, "Only add code comments that explain why, not what" and "Never add comments that simply restate what the code does."

Also applies to: 23-26, 65-67, 95-98, 121-125

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@apps/api/v2/src/modules/slots/slots-2024-04-15/services/slots-worker.service.bug.spec.ts`
around lines 6 - 9, Replace the verbose banner comment "Shape of the mocked
Worker instances created inside the jest.mock factory." with a single one-line
intent comment that explains why the mock shape is declared (e.g., "Define the
mocked Worker interface used by the jest.mock factory so tests share a single
type"), and likewise shorten the other large banner-style comments in this spec
to one-line intent comments that state why each test or helper exists (not how
it works) — apply this change to the other multi-line banner blocks near the
mocked Worker and test-case sections so comments explain intent only.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@apps/api/v2/src/modules/slots/slots-2024-04-15/services/slots-worker.service.ts`:
- Around line 205-206: The code registers messageListener and errorListener via
worker.once(...) before calling worker.postMessage(...) and fails to remove them
if postMessage throws or the subsequent catch path requeues the worker; update
the catch blocks around worker.postMessage(...) to call worker.off("message",
messageListener) and worker.off("error", errorListener) (or
worker.removeListener) before pushing the worker back into availableWorkers or
reusing it so stale handlers don't remain; apply the same removal in the other
catch block that also requeues the worker.
- Around line 181-183: handleWorkerFailure currently mutates this.workerPool
with splice(this.workerPool.indexOf(failedWorker), 1) and can run twice for the
same worker, and processNextTask attaches once("message")/once("error") before
calling worker.postMessage(...) so a synchronous postMessage throw leaves
dangling listeners; fix by making failure handling idempotent and cleaning up
listeners: in handleWorkerFailure (and the worker.on("error")/worker.on("exit")
callers) first check that the worker is still in this.workerPool (index !== -1)
or set and check a per-worker flag (e.g., worker.__failed) before splicing, and
when processNextTask sets up once("message")/once("error"), wrap the postMessage
call so that on synchronous exception you remove those once listeners and return
the worker to availableWorkers (or mark it failed) to avoid handlers firing
later.
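
The two inline findings above can be illustrated with a small sketch. It makes several assumptions: `Pool`, `dispatch`, and `ThrowingWorker` are hypothetical names, an `EventEmitter` stands in for a worker thread, and the recovery choice on a synchronous `postMessage` failure (requeue the worker) is only one of the two options the review mentions.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical worker whose postMessage always throws synchronously (simulated failure).
class ThrowingWorker extends EventEmitter {
  postMessage(_payload: unknown): void {
    throw new Error("simulated postMessage failure");
  }
}

class Pool {
  workerPool: EventEmitter[] = [];
  availableWorkers: EventEmitter[] = [];

  // Idempotent failure handling: a second call for the same worker is a no-op.
  // Without the index check, splice(-1, 1) would remove the last, unrelated worker.
  handleWorkerFailure(worker: EventEmitter): boolean {
    const index = this.workerPool.indexOf(worker);
    if (index === -1) return false;
    this.workerPool.splice(index, 1);
    const availableIndex = this.availableWorkers.indexOf(worker);
    if (availableIndex !== -1) this.availableWorkers.splice(availableIndex, 1);
    return true;
  }

  // Registers task listeners, then posts; cleans both up if posting throws.
  dispatch(worker: ThrowingWorker, payload: unknown): boolean {
    const messageListener = (): void => {
      worker.off("error", errorListener);
      this.availableWorkers.push(worker);
    };
    const errorListener = (): void => {
      worker.off("message", messageListener);
    };
    worker.once("message", messageListener);
    worker.once("error", errorListener);
    try {
      worker.postMessage(payload);
      return true;
    } catch {
      // Remove stale handlers before the worker is reused.
      worker.off("message", messageListener);
      worker.off("error", errorListener);
      this.availableWorkers.push(worker);
      return false;
    }
  }
}

const pool = new Pool();
const flakyWorker = new ThrowingWorker();
const bystander = new EventEmitter();
pool.workerPool.push(flakyWorker, bystander);

const posted = pool.dispatch(flakyWorker, { job: 1 });
const firstCleanup = pool.handleWorkerFailure(flakyWorker);
const secondCleanup = pool.handleWorkerFailure(flakyWorker);
```

The second `handleWorkerFailure` call returns `false` and leaves `bystander` in place, which is the property the review asks for when both the "error" and "exit" handlers report the same worker.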

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7f943eaf-91d7-48be-896a-82fbffd8b03c

📥 Commits

Reviewing files that changed from the base of the PR and between d1ad4ea and f33220c.

📒 Files selected for processing (2)
  • apps/api/v2/src/modules/slots/slots-2024-04-15/services/slots-worker.service.bug.spec.ts
  • apps/api/v2/src/modules/slots/slots-2024-04-15/services/slots-worker.service.ts

Comment thread apps/api/v2/src/modules/slots/slots-2024-04-15/services/slots-worker.service.ts Outdated
Comment thread apps/api/v2/src/modules/slots/slots-2024-04-15/services/slots-worker.service.ts Outdated
@kartik-212004

kartik-212004 commented Jun 8, 2026

Contributor

slots-worker.service.bug.spec.ts is not a standard way to name a file

@10done

10done commented Jun 8, 2026

Author

slots-worker.service.bug.spec.ts is not a standard way to name a file

Apologies. Will make the changes.

@10done

10done commented Jun 8, 2026

Author

slots-worker.service.bug.spec.ts is not a standard way to name a file

@kartik-212004 I have made the changes.

@kartik-212004

Contributor

slots-worker.service.bug.spec.ts is not a standard way to name a file

@kartik-212004 I have made the changes.

please check the above comments as well

@10done

10done commented Jun 8, 2026

Author

slots-worker.service.bug.spec.ts is not a standard way to name a file

@kartik-212004 I have made the changes.

please check the above comments as well

Which other comments exactly? Can you please point to them?

@kartik-212004

Contributor

slots-worker.service.bug.spec.ts is not a standard way to name a file

@kartik-212004 I have made the changes.

please check the above comments as well

Which other comments exactly? Can you please point to them?

https://github.com/calcom/cal.diy/pull/29518/changes#r3370822336
https://github.com/calcom/cal.diy/pull/29518/changes#r3370857024

@10done

10done commented Jun 8, 2026

Author

slots-worker.service.bug.spec.ts is not a standard way to name a file

@kartik-212004 I have made the changes.

please check the above comments as well

Which other comments exactly? Can you please point to them?

https://github.com/calcom/cal.diy/pull/29518/changes#r3370822336 https://github.com/calcom/cal.diy/pull/29518/changes#r3370857024

I can't see the inline comments in Files changed (sidebar shows 0 of 2 — possibly outdated).
image

@kartik-212004

Contributor

that's weird, it should have been visible.
image

@10done

10done commented Jun 8, 2026

Author

@kartik-212004 I have addressed the comments.

P.S.: Those comments were in the pending state and needed to be submitted.
Thank you for the review.

@10done

10done commented Jun 9, 2026

Author

@kartik-212004 Can you please review whenever you have time? Thank you.

@10done

10done commented Jun 12, 2026

Author

@bandhan-majumder Can I have a review, please? Thank you.

@CLAassistant

CLAassistant commented Jun 14, 2026


CLA assistant check
All committers have signed the CLA.

Labels

🐛 bug Something isn't working size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MEDIUM: Task-specific error listener conflicts with lifecycle error listener, corrupting worker pool state

3 participants