Description
What happened + What you expected to happen
If we have an existing Ray cluster (e.g., set up via the CLI with ray start --num-cpus 64) and we run two Ray Data pipelines back to back - each pipeline calling ray.init(), processing a dataset with map_batches, and then calling ray.shutdown() - the second pipeline fails with an error indicating that Ray cannot submit a task to an actor because it "can't find actor ... it might be dead or it's from a different cluster."
The use case is that the cluster is instantiated independently of our pipelines, and the pipelines may be executed one after the other, possibly with different environment variables. For example, the second pipeline might specify runtime_env={"env_vars": {"XYZ": "VALUE"}} while the first does not; a sketch of that second pipeline is shown below.
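For illustration only, the second pipeline in this setup might look like the following sketch. The XYZ environment variable, the toy dataset, and the map class F are placeholders standing in for our real pipeline:

import ray

# Second pipeline: attach to the same pre-started cluster, but with
# additional environment variables passed via runtime_env.
ray.init(
    ignore_reinit_error=True,
    runtime_env={"env_vars": {"XYZ": "VALUE"}},
)

class F:
    def __call__(self, batch):
        return batch

ray.data.range(1000).map_batches(F, concurrency=(1, 4), num_cpus=1).take_all()
ray.shutdown()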
Versions / Dependencies
Ray 2.48.0 (commit 03491225d59a1ffde99c3628969ccf456be13efd)
Reproduction script
import os
import ray

# A Ray cluster has already been started with `ray start`.
os.environ["RAY_ADDRESS"] = "127.0.0.1:8265"

class F:
    def __call__(self, x):
        return x

# First pipeline
ray.init(ignore_reinit_error=True)
ray.data.range(1000).map_batches(F, concurrency=(1, 4), num_cpus=1).take_all()
ray.shutdown()

# Second pipeline
ray.init(ignore_reinit_error=True)
ray.data.range(1000).map_batches(F, concurrency=(1, 4), num_cpus=1).take_all()
ray.shutdown()
Error
2025-07-22 21:34:22,645 ERROR exceptions.py:73 -- Exception occurred in Ray Data or Ray Core internal code. If you continue to see this error, please open an issue on the Ray project GitHub page with the full stack trace below: https://github.com/ray-project/ray/issues/new/choose
2025-07-22 21:34:22,645 ERROR exceptions.py:81 -- Full stack trace:
Traceback (most recent call last):
  File "/conda_env/lib/python3.12/site-packages/ray/data/exceptions.py", line 49, in handle_trace
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/data/_internal/plan.py", line 439, in execute_to_iterator
    bundle_iter = execute_to_legacy_bundle_iterator(executor, self)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/data/_internal/execution/legacy_compat.py", line 55, in execute_to_legacy_bundle_iterator
    bundle_iter = executor.execute(dag, initial_stats=stats)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/data/_internal/execution/streaming_executor.py", line 159, in execute
    StatsManager.register_dataset_to_stats_actor(
  File "/conda_env/lib/python3.12/site-packages/ray/data/_internal/stats.py", line 790, in register_dataset_to_stats_actor
    self._stats_actor().register_dataset.remote(
  File "/conda_env/lib/python3.12/site-packages/ray/actor.py", line 676, in remote
    return self._remote(args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/util/tracing/tracing_helper.py", line 422, in _start_span
    return method(self, args, kwargs, *_args, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/actor.py", line 856, in _remote
    obj_ref = invocation(args, kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/actor.py", line 836, in invocation
    return dst_actor._actor_method_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env/lib/python3.12/site-packages/ray/actor.py", line 2059, in _actor_method_call
    object_refs = worker.core_worker.submit_actor_task(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python/ray/_raylet.pyx", line 3992, in ray._raylet.CoreWorker.submit_actor_task
  File "python/ray/_raylet.pyx", line 4053, in ray._raylet.CoreWorker.submit_actor_task
Exception: Failed to submit task to actor ActorID(4264d583dfdde28efb09616901000000) due to b"Can't find actor 4264d583dfdde28efb09616901000000. It might be dead or it's from a different cluster"
Issue Severity
Medium: It is a significant difficulty but I can work around it.
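A minimal sketch of one possible workaround, assuming the failure stems from a handle to the Ray Data stats actor being reused across ray.shutdown(): keep a single Ray session alive and run both pipelines in it, shutting down only at the end.

import ray

# Workaround sketch (assumption, not a confirmed fix): one Ray session for
# both pipelines, so no actor handle from a previous session is reused.
ray.init(ignore_reinit_error=True)

class F:
    def __call__(self, batch):
        return batch

# First pipeline
ray.data.range(1000).map_batches(F, concurrency=(1, 4), num_cpus=1).take_all()

# Second pipeline runs in the same session instead of re-initializing.
ray.data.range(1000).map_batches(F, concurrency=(1, 4), num_cpus=1).take_all()

ray.shutdown()

This does not cover our real use case, since it forces both pipelines to share a single runtime_env.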