BUG: Should catch the decref error when the session has already been destroyed #399

Open
ChengjieLi28 opened this issue Apr 21, 2023 · 0 comments
Labels
bug Something isn't working gpu

Comments

@ChengjieLi28
Contributor

Note that the issue tracker is NOT the place for general support. For
discussions about development, questions about usage, or any general questions,
contact us on https://discuss.xorbits.io/.
Reproduce:

  1. Init a cluster:
import xorbits

xorbits.init(cuda_devices=[0])
  2. Run a task in another session, then destroy the session explicitly:
import xorbits

xorbits.init('<endpoint above>')

import xorbits.numpy as np

np.random.rand(10000, 10000).to_gpu()

xorbits.shutdown()
  3. Exit the process in which you ran the task above. The cluster (tornado) then logs an error:
2023-04-21 07:58:04,500 xorbits._mars.services.web.core 111327 ERROR    ActorNotExist when handling request with LifecycleWebAPIHandler.decref_tileables
Traceback (most recent call last):
  File "/home/lichengjie/workspace/xorbits/python/xorbits/_mars/services/web/core.py", line 69, in wrapped
    res = await func(self, *args, **kwargs)
  File "/home/lichengjie/workspace/xorbits/python/xorbits/_mars/services/lifecycle/api/web.py", line 39, in decref_tileables
    await oscar_api.decref_tileables(tileable_keys, counts=counts)
  File "/home/lichengjie/workspace/xorbits/python/xorbits/_mars/services/lifecycle/api/oscar.py", line 108, in decref_tileables
    return await self._lifecycle_tracker_ref.decref_tileables(tileable_keys)
  File "xoscar/core.pyx", line 251, in xoscar.core.LocalActorRef.__getattr__
xoscar.errors.ActorNotExist: Actor b'2IC9l6dhZaChiD31uVp7EYKq_lifecycle_tracker' does not exist
2023-04-21 07:58:04,500 tornado.access 111327 ERROR    500 POST /api/session/2IC9l6dhZaChiD31uVp7EYKq/lifecycle?action=decref_tileables (127.0.0.1) 1.18ms

This error should be caught instead of being raised, because the session was destroyed on purpose and the traceback may confuse users.
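One possible shape for the fix, as a minimal sketch only: treat a missing lifecycle tracker as "nothing left to release" and skip the decref quietly. Everything below is an assumption made for illustration. The `ActorNotExist` class is a local stand-in for `xoscar.errors.ActorNotExist` so the snippet runs without xoscar, and `DestroyedSessionTracker` and `safe_decref_tileables` are hypothetical names that do not exist in xorbits. Whether the catch belongs in the web handler, the oscar API, or the client is left open.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class ActorNotExist(Exception):
    """Local stand-in for xoscar.errors.ActorNotExist (assumption for this sketch)."""


class DestroyedSessionTracker:
    """Hypothetical tracker ref whose actor was removed together with its session."""

    async def decref_tileables(self, tileable_keys):
        raise ActorNotExist("Actor b'<session_id>_lifecycle_tracker' does not exist")


async def safe_decref_tileables(tracker_ref, tileable_keys):
    """Decref tileables, tolerating a tracker that no longer exists.

    Returns True if the decref ran, False if the tracker was already gone.
    """
    try:
        await tracker_ref.decref_tileables(tileable_keys)
        return True
    except ActorNotExist:
        # The session was destroyed explicitly, so there is nothing to release.
        logger.debug(
            "Lifecycle tracker is gone; skipping decref of %d tileable(s)",
            len(tileable_keys),
        )
        return False


# Simulate the late decref sent when the client process exits after shutdown.
result = asyncio.run(
    safe_decref_tileables(DestroyedSessionTracker(), ["key-1", "key-2"])
)
```

Logging at debug level keeps the event visible for troubleshooting without producing an ERROR traceback and a 500 response for a session that was shut down deliberately.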

@XprobeBot XprobeBot added bug Something isn't working gpu labels Apr 21, 2023
@XprobeBot XprobeBot added this to the v0.3.0 milestone Apr 21, 2023
@XprobeBot XprobeBot modified the milestones: v0.3.0, v0.3.1 May 6, 2023
@XprobeBot XprobeBot modified the milestones: v0.3.1, v0.3.2 May 17, 2023
@XprobeBot XprobeBot modified the milestones: v0.3.2, v0.4.1 Jul 1, 2023
@XprobeBot XprobeBot modified the milestones: v0.4.1, v0.4.3 Jul 11, 2023
@XprobeBot XprobeBot modified the milestones: v0.4.3, v0.5.1 Jul 28, 2023
@XprobeBot XprobeBot modified the milestones: v0.5.1, v0.5.2 Aug 14, 2023
@XprobeBot XprobeBot modified the milestones: v0.5.2, v0.6.0, v0.6.1 Sep 8, 2023
@XprobeBot XprobeBot modified the milestones: v0.6.1, v0.6.2, v0.6.3 Sep 15, 2023
@XprobeBot XprobeBot modified the milestones: v0.6.3, v0.7.0 Sep 25, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.0, v0.7.1 Oct 23, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.1, v0.7.2 Nov 21, 2023
@XprobeBot XprobeBot removed this from the v0.7.2 milestone Jan 5, 2024
@XprobeBot XprobeBot added this to the v0.7.3 milestone Jan 5, 2024
@XprobeBot XprobeBot modified the milestones: v0.7.3, v0.7.4 Aug 22, 2024
@luweizheng luweizheng removed this from the v0.7.4 milestone Dec 16, 2024