
@majiayu000

Summary

  • Fix Python interpreter crash when calling ray.kill() on an actor created in a previous Ray session
  • After ray.shutdown() and ray.init(), killing an actor from the old session would cause a CHECK failure

Changes

  • Modify OnActorKilled() to use GetActorHandleIfExists() instead of GetActorHandle()
  • When the actor handle doesn't exist, log a warning and return early instead of crashing
  • Add a C++ unit test for the fix
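A minimal, self-contained sketch of the lookup pattern described in the Changes list. It assumes a simplified `ActorManager` that stores handles in a `std::unordered_map` keyed by a string ID. The real Ray class uses `ActorID`, `RAY_CHECK`, and `RAY_LOG`; here `std::abort()` and `std::cerr` stand in for them, and `MarkActorKilledOrOutOfScope` is reduced to setting a hypothetical `killed` flag. The names and bodies are illustrative, not Ray's actual implementation.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins for Ray's types.
using ActorID = std::string;

struct ActorHandle {
  bool killed = false;  // hypothetical flag set when the actor is killed
};

class ActorManager {
 public:
  void AddHandle(const ActorID &actor_id) {
    actor_handles_[actor_id] = std::make_shared<ActorHandle>();
  }

  // Strict lookup: aborts if the handle is missing (stands in for a RAY_CHECK).
  std::shared_ptr<ActorHandle> GetActorHandle(const ActorID &actor_id) const {
    auto it = actor_handles_.find(actor_id);
    if (it == actor_handles_.end()) {
      std::cerr << "Check failed: Cannot find an actor handle\n";
      std::abort();
    }
    return it->second;
  }

  // Lenient lookup: returns nullptr instead of aborting.
  std::shared_ptr<ActorHandle> GetActorHandleIfExists(const ActorID &actor_id) const {
    auto it = actor_handles_.find(actor_id);
    return it == actor_handles_.end() ? nullptr : it->second;
  }

  void OnActorKilled(const ActorID &actor_id) {
    auto actor_handle = GetActorHandleIfExists(actor_id);
    if (actor_handle == nullptr) {
      // For example, the actor was created before ray.shutdown()/ray.init().
      std::cerr << "Warning: no handle for actor " << actor_id << "; ignoring.\n";
      return;
    }
    MarkActorKilledOrOutOfScope(actor_handle);
  }

 private:
  void MarkActorKilledOrOutOfScope(const std::shared_ptr<ActorHandle> &handle) {
    handle->killed = true;
  }

  std::unordered_map<ActorID, std::shared_ptr<ActorHandle>> actor_handles_;
};
```

In this sketch, calling `OnActorKilled` with an ID that was never registered (the situation an actor from a previous session ends up in) logs a warning and returns, whereas calling `GetActorHandle` with the same ID would abort the process.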

Test plan

  • Added TestOnActorKilledWithNonExistentHandle, which verifies that OnActorKilled() does not crash when the actor handle doesn't exist

Fixes #59340

majiayu000 requested a review from a team as a code owner on December 18, 2025, 11:33

gemini-code-assist (bot) left a comment

Code Review

This pull request effectively resolves a crash that occurs when ray.kill() is called on an actor from a previous session. The fix, which involves using GetActorHandleIfExists instead of GetActorHandle to gracefully handle non-existent actor handles, is correct and well-implemented. The addition of the TestOnActorKilledWithNonExistentHandle unit test ensures that this scenario is covered and prevents future regressions. The changes are clear and address the reported issue properly. I have one suggestion for improving const-correctness in a related function.


  void ActorManager::OnActorKilled(const ActorID &actor_id) {
-   MarkActorKilledOrOutOfScope(GetActorHandle(actor_id));
+   auto actor_handle = GetActorHandleIfExists(actor_id);

Priority: medium

The use of GetActorHandleIfExists is correct to fix the crash. On a related note, I noticed that GetActorHandleIfExists is not const, unlike its counterpart GetActorHandle. Since it only performs a read operation, making it const would improve const-correctness.

This is a minor suggestion for code quality that could be addressed here or in a follow-up. For context, the change would look like this:

// In actor_manager.h
std::shared_ptr<ActorHandle> GetActorHandleIfExists(const ActorID &actor_id) const;

// In actor_manager.cc
std::shared_ptr<ActorHandle> ActorManager::GetActorHandleIfExists(
    const ActorID &actor_id) const { /* ... */ }
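A sketch of what the const-qualified lookup could look like, assuming (this is an assumption, not confirmed by the review) that the handle map is guarded by a mutex. A const member function can only lock such a mutex if the mutex is declared `mutable`. The class below uses `std::mutex` and `std::unordered_map` as hypothetical stand-ins for whatever synchronization and container the real `ActorManager` uses.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins for Ray's types.
using ActorID = std::string;
struct ActorHandle {};

class ActorManager {
 public:
  void AddHandle(const ActorID &actor_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    actor_handles_[actor_id] = std::make_shared<ActorHandle>();
  }

  // const: the lookup only reads the map. Locking compiles because mutex_ is mutable.
  std::shared_ptr<ActorHandle> GetActorHandleIfExists(const ActorID &actor_id) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = actor_handles_.find(actor_id);
    return it == actor_handles_.end() ? nullptr : it->second;
  }

 private:
  // mutable so that const member functions can still acquire the lock.
  mutable std::mutex mutex_;
  std::unordered_map<ActorID, std::shared_ptr<ActorHandle>> actor_handles_;
};
```

With the method marked const, callers that only hold a `const ActorManager &` can perform the lookup too, which is the const-correctness benefit the review points out.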

ray-gardener (bot) added the labels core (Issues that should be addressed in Ray Core) and community-contribution (Contributed by the community) on Dec 18, 2025
When a user creates an actor, then calls ray.shutdown() and ray.init(),
and then tries to ray.kill() the actor from the old session, the
Python interpreter would crash with:

Check failed: it != actor_handles_.end() Cannot find an actor handle

This is because OnActorKilled() uses GetActorHandle(), which asserts
that the actor handle exists. After a session restart, however, the
actor handle no longer exists in the new session's actor manager.

This fix modifies OnActorKilled() to use GetActorHandleIfExists()
instead, and gracefully handle the case where the actor handle doesn't
exist by logging a warning and returning early.

Fixes ray-project#59340

Signed-off-by: lif <[email protected]>
majiayu000 force-pushed the fix/59340-ray-kill-crash branch from 3d1d465 to 39d4a4b on December 18, 2025, 13:33
@codope

codope commented Dec 19, 2025

@majiayu000 There is already a PR before yours (#59425). Thanks for the contribution, but I will have to close this one in its favor. Please review the other PR if you get a chance.

codope closed this on Dec 19, 2025
majiayu000 deleted the fix/59340-ray-kill-crash branch on December 19, 2025, 11:05

Linked issue: [Core] ray.kill an actor from another session will crash python interpreter
