[data] Failed to submit task to actor after ray.shutdown() and re-ray.init() in data pipeline for an existing cluster #54841

@praateekmahajan

What happened + What you expected to happen

If we have an existing Ray cluster (e.g. set up via the CLI with `ray start --num-cpus 64`) and we run two Ray Data pipelines back to back - each pipeline calling ray.init(), processing a dataset with map_batches, and then calling ray.shutdown() - the second pipeline fails with an error indicating that Ray cannot submit a task to an actor because it "can't find actor ... it might be dead or it's from a different cluster."

The use case is that we want a cluster to be instantiated independently of our pipelines, and each of our pipelines might be executed one after the other, possibly with different environment variables; for example, the second pipeline specifies runtime_env={"env_vars": {"XYZ": "VALUE"}} while the first doesn't. A sketch of that intended usage follows.
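For illustration, a minimal sketch of the intended back-to-back usage, assuming the cluster has already been started with `ray start`; the XYZ/"VALUE" environment variable and the identity UDF are placeholders, not part of any real configuration:

import ray

def identity(batch):
    # Pass batches through unchanged; the pipeline contents don't matter here.
    return batch

# First pipeline: default runtime environment.
ray.init(ignore_reinit_error=True)
ray.data.range(1000).map_batches(identity).take_all()
ray.shutdown()

# Second pipeline: same cluster, but with extra environment variables.
ray.init(ignore_reinit_error=True, runtime_env={"env_vars": {"XYZ": "VALUE"}})
ray.data.range(1000).map_batches(identity).take_all()
ray.shutdown()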

Versions / Dependencies

Ray 2.48.0 (commit 03491225d59a1ffde99c3628969ccf456be13efd)

Reproduction script

import os

import ray

# Point RAY_ADDRESS at a cluster previously started with `ray start`
os.environ["RAY_ADDRESS"] = "127.0.0.1:8265"


class F:
    # Identity callable class; class-based UDFs run as actors in map_batches.
    def __call__(self, x):
        return x

# First pipeline
ray.init(ignore_reinit_error=True)
ray.data.range(1000).map_batches(F, concurrency=(1, 4), num_cpus=1,).take_all()
ray.shutdown()

# Second pipeline
ray.init(ignore_reinit_error=True)
ray.data.range(1000).map_batches(F, concurrency=(1, 4), num_cpus=1,).take_all()
ray.shutdown()

Error

2025-07-22 21:34:22,645	ERROR exceptions.py:73 -- Exception occurred in Ray Data or Ray Core internal code. If you continue to see this error, please open an issue on the Ray project GitHub page with the full stack trace below: https://github.com/ray-project/ray/issues/new/choose
2025-07-22 21:34:22,645	ERROR exceptions.py:81 -- Full stack trace:
Traceback (most recent call last):
  File "/conda_env/lib/python3.12/site-packages/ray/data/exceptions.py", line 49, in handle_trace
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/data/_internal/plan.py", line 439, in execute_to_iterator
    bundle_iter = execute_to_legacy_bundle_iterator(executor, self)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/data/_internal/execution/legacy_compat.py", line 55, in execute_to_legacy_bundle_iterator
    bundle_iter = executor.execute(dag, initial_stats=stats)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/data/_internal/execution/streaming_executor.py", line 159, in execute
    StatsManager.register_dataset_to_stats_actor(
  File "/conda_env/lib/python3.12/site-packages/ray/data/_internal/stats.py", line 790, in register_dataset_to_stats_actor
    self._stats_actor().register_dataset.remote(
  File "/conda_env/lib/python3.12/site-packages/ray/actor.py", line 676, in remote
    return self._remote(args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/util/tracing/tracing_helper.py", line 422, in _start_span
    return method(self, args, kwargs, *_args, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/actor.py", line 856, in _remote
    obj_ref = invocation(args, kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/actor.py", line 836, in invocation
    return dst_actor._actor_method_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/actor.py", line 2059, in _actor_method_call
    object_refs = worker.core_worker.submit_actor_task(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python/ray/_raylet.pyx", line 3992, in ray._raylet.CoreWorker.submit_actor_task
  File "python/ray/_raylet.pyx", line 4053, in ray._raylet.CoreWorker.submit_actor_task
Exception: Failed to submit task to actor ActorID(4264d583dfdde28efb09616901000000) due to b"Can't find actor 4264d583dfdde28efb09616901000000. It might be dead or it's from a different cluster"

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Labels

P1 - Issue that should be fixed within a few weeks
bug - Something that is supposed to be working; but isn't
data - Ray Data-related issues
stability
