Inconsistent serialization behavior between method of imported class and class created in jupyter notebook? #672

Closed
c3joshzhang opened this issue Jul 31, 2024 · 2 comments

@c3joshzhang

It seems that the __dict__ of a method is ignored during serialization when the class is imported from another module, but the __dict__ is included when the class is defined inside the Jupyter notebook.
Here is the code to reproduce this:

# my_pkg/__init__.py
# my_pkg/decorator.py
from functools import wraps

VALUES = []

class D:
    def __call__(self, func):
        @wraps(func)
        def wrapped(*args, **kwargs):
            return func(*args, **kwargs)
        wrapped._attached = VALUES
        return wrapped

dec = D()

# my_pkg/test_obj.py
from .decorator import dec

class Obj:
    
    @dec
    def f(self):
        return ...

# jupyter notebook
from my_pkg.test_obj import Obj
from my_pkg.decorator import VALUES
VALUES.extend([1, 2, 3])

import dill
dill.detect.trace(True)

# use imported class method
dill.dumps(Obj().f)

# restart interpreter
import dill
# copied from dumped binary
s = b'\x80\x04\x95[\x00\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\n_load_type\x94\x93\x94\x8c\nMethodType\x94\x85\x94R\x94\x8c\x0fmy_pkg.test_obj\x94\x8c\x05Obj.f\x94\x93\x94h\x06\x8c\x03Obj\x94\x93\x94)\x81\x94\x86\x94R\x94.'
dill.loads(s)._attached # becomes [], instead of [1, 2, 3]

# use method of class directly created inside notebook
import dill
from my_pkg.decorator import dec

class NewObj:
    
    @dec
    def f(self):
        return ...

new_obj = NewObj()
print(hex(id(new_obj.f.__dict__)))

dill.dumps(new_obj.f)

# restart interpreter
import dill
s = b'\x80\x04\x95#\x05\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\n_load_type\x94\x93\x94\x8c\nMethodType\x94\x85\x94R\x94h\x00\x8c\x10_create_function\x94\x93\x94(h\x00\x8c\x0c_create_code\x94\x93\x94(K\x00K\x00K\x00K\x02K\x04K\x1fC\x0e\x88\x00|\x00i\x00|\x01\xa4\x01\x8e\x01S\x00\x94N\x85\x94)\x8c\x04args\x94\x8c\x06kwargs\x94\x86\x94\x8c-/home/c3/jupyter_root_dir/my_pkg/decorator.py\x94\x8c\x07wrapped\x94K\x07C\x02\x00\x02\x94\x8c\x04func\x94\x85\x94)t\x94R\x94}\x94\x8c\x08__name__\x94\x8c\x08__main__\x94s\x8c\x01f\x94Nh\x00\x8c\x0c_create_cell\x94\x93\x94N\x85\x94R\x94\x85\x94t\x94R\x94}\x94(\x8c\x0b__wrapped__\x94h\x07(h\t(K\x01K\x00K\x00K\x01K\x01KCC\x04d\x01S\x00\x94Nh\x00\x8c\n_eval_repr\x94\x93\x94\x8c\x08Ellipsis\x94\x85\x94R\x94\x86\x94)\x8c\x04self\x94\x85\x94\x8c!/tmp/ipykernel_3822/2005245315.py\x94h\x19K\x06C\x02\x00\x02\x94))t\x94R\x94c__builtin__\n__main__\nh\x19NNt\x94R\x94}\x94}\x94(\x8c\x0f__annotations__\x94}\x94\x8c\x0c__qualname__\x94\x8c\x08NewObj.f\x94u\x86\x94b\x8c\t_attached\x94]\x94(K\x01K\x02K\x03eu}\x94(h4h5h6h7u\x86\x94b\x8c\x08builtins\x94\x8c\x07getattr\x94\x93\x94\x8c\x04dill\x94\x8c\x05_dill\x94\x93\x94\x8c\x08_setattr\x94h=\x8c\x07setattr\x94\x93\x94\x87\x94R\x94h\x1d\x8c\rcell_contents\x94h1\x87\x94R0h\x16(h\x17\x8c\x10my_pkg.decorator\x94\x8c\x07__doc__\x94N\x8c\x0b__package__\x94\x8c\x06my_pkg\x94\x8c\n__loader__\x94\x8c\x1a_frozen_importlib_external\x94\x8c\x10SourceFileLoader\x94\x93\x94)\x81\x94}\x94(\x8c\x04name\x94hJ\x8c\x04path\x94\x8c-/home/c3/jupyter_root_dir/my_pkg/decorator.py\x94ub\x8c\x08__spec__\x94\x8c\x11_frozen_importlib\x94\x8c\nModuleSpec\x94\x93\x94)\x81\x94}\x94(hThJ\x8c\x06loader\x94hR\x8c\x06origin\x94hV\x8c\x0cloader_state\x94N\x8c\x1asubmodule_search_locations\x94N\x8c\r_set_fileattr\x94\x88\x8c\x07_cached\x94\x8cE/home/c3/jupyter_root_dir/my_pkg/__pycache__/decorator.cpython-39.pyc\x94\x8c\r_initializing\x94\x89ub\x8c\x08__file__\x94hV\x8c\n__cached__\x94hc\x8c\x0c__builtins__\x94cbuiltins\n__dict__\n\x8c\x05wraps\x94\x8c\tfunctools\x94hh\x93\x94\x8c\x06VALUES\x94h:\x8c\x01D\x94hJhl\x93\x94\x8c\x03dec\x94hm)\x81\x94u0h\x00\x8c\x0c_create_type\x94\x93\x94(h\x02\x8c\x04type\x94\x85\x94R\x94\x8c\x06NewObj\x94h\x02\x8c\x06object\x94\x85\x94R\x94\x85\x94}\x94(\x8c\n__module__\x94h\x18h\x19h hKN\x8c\r__slotnames__\x94]\x94ut\x94R\x94hEh\x7fh6hu\x87\x94R0)\x81\x94\x86\x94R\x94.'
dill.loads(s)._attached # is actually [1, 2, 3]

When I turn on dill.detect.trace, this is the trace for serializing the method of the imported class:

┬ Me1: <bound method Obj.f of <my_pkg.test_obj.Obj object at 0x797040b5f400>>
├┬ T1: <class 'method'>
│├┬ F2: <function _load_type at 0x797040bb1700>
││└ # F2 [28 B]
│└ # T1 [45 B]
├┬ F2: <function Obj.f at 0x797040b4ee50>
│└ # F2 [28 B]
├┬ T4: <class 'my_pkg.test_obj.Obj'>
│└ # T4 [10 B]
└ # Me1 [90 B]
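
The short trace above suggests the function is stored as a reference rather than by value. As a rough check (a sketch, not a statement about dill's internals), the standard-library pickletools module can list the strings inside the 90-byte payload from the reproduction without importing dill or my_pkg. The variable names below are illustrative only.

```python
import pickletools

# Payload copied verbatim from the reproduction above (imported-class case).
payload = b'\x80\x04\x95[\x00\x00\x00\x00\x00\x00\x00\x8c\ndill._dill\x94\x8c\n_load_type\x94\x93\x94\x8c\nMethodType\x94\x85\x94R\x94\x8c\x0fmy_pkg.test_obj\x94\x8c\x05Obj.f\x94\x93\x94h\x06\x8c\x03Obj\x94\x93\x94)\x81\x94\x86\x94R\x94.'

# genops only parses opcodes; it never executes or imports anything.
ops = list(pickletools.genops(payload))
strings = [arg for op, arg, _pos in ops if op.name == "SHORT_BINUNICODE"]
opcode_names = {op.name for op, _arg, _pos in ops}

print(strings)
# ['dill._dill', '_load_type', 'MethodType', 'my_pkg.test_obj', 'Obj.f', 'Obj']
print("_attached" in strings)          # False: no attribute data is stored
print("STACK_GLOBAL" in opcode_names)  # True: names are looked up at load time
```

Since the payload only names my_pkg.test_obj and Obj.f, loading it re-imports the module and picks up whatever _attached holds in the new interpreter, which would explain the empty list.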

And this is the trace for the class created inside the Jupyter notebook. Here the __dict__ of new_obj.f (the dict object at 0x7970414417c0) appears in the trace, unlike in the trace for the imported class's method.

┬ Me1: <bound method NewObj.f of <__main__.NewObj object at 0x797040b80940>>
├┬ T1: <class 'method'>
│├┬ F2: <function _load_type at 0x797040bb1700>
││└ # F2 [28 B]
│└ # T1 [45 B]
├┬ F1: <function NewObj.f at 0x797040b663a0>
│├┬ F2: <function _create_function at 0x797040bb1790>
││└ # F2 [23 B]
│├┬ Co: <code object wrapped at 0x797040b4fa80, file "/home/c3/jupyter_root_dir/my_pkg/decorator.py", line 7>
││├┬ F2: <function _create_code at 0x797040bb1820>
│││└ # F2 [19 B]
││└ # Co [150 B]
│├┬ D2: <dict object at 0x797040b819c0>
││└ # D2 [25 B]
│├┬ Ce2: <cell at 0x797040b80970: function object at 0x797040b66550>
││├┬ F2: <function _create_cell at 0x797040bb5040>
│││└ # F2 [19 B]
││└ # Ce2 [24 B]
│├┬ D2: <dict object at 0x7970414417c0>
││├┬ F1: <function NewObj.f at 0x797040b66550>
│││├┬ Co: <code object f at 0x797040b60b30, file "/tmp/ipykernel_3913/2005245315.py", line 6>
││││├┬ Si: Ellipsis
│││││├┬ F2: <function _eval_repr at 0x797040bb54c0>
││││││└ # F2 [17 B]
│││││└ # Si [32 B]
││││└ # Co [118 B]
│││├┬ D1: <dict object at 0x7970503da980>
││││└ # D1 [22 B]
│││├┬ D2: <dict object at 0x797041431cc0>
││││└ # D2 [2 B]
│││├┬ D2: <dict object at 0x797040b62980>
││││├┬ D2: <dict object at 0x797040b59600>
│││││└ # D2 [2 B]
││││└ # D2 [50 B]
│││└ # F1 [206 B]
││└ # D2 [246 B]
│├┬ D2: <dict object at 0x797040bc6080>
││└ # D2 [12 B]
│├┬ M2: <module 'dill._dill' from '/opt/conda/envs/py-data-plot-client-ipython/lib/python3.9/site-packages/dill/_dill.py'>
││└ # M2 [17 B]
│├┬ T4: <class '_frozen_importlib_external.SourceFileLoader'>
││└ # T4 [50 B]
│├┬ D2: <dict object at 0x797040b50640>
││└ # D2 [68 B]
│├┬ T4: <class '_frozen_importlib.ModuleSpec'>
││└ # T4 [35 B]
│├┬ D2: <dict object at 0x797040b50700>
││└ # D2 [192 B]
│├┬ D4: <dict object at 0x797057409e40>
││└ # D4 [19 B]
│├┬ F2: <function wraps at 0x797056273dc0>
││└ # F2 [16 B]
│├┬ T4: <class 'my_pkg.decorator.D'>
││└ # T4 [6 B]
│└ # F1 [1 MiB]
├┬ T2: <class '__main__.NewObj'>
│├┬ F2: <function _create_type at 0x797040bb1670>
││└ # F2 [19 B]
│├┬ T1: <class 'type'>
││└ # T1 [13 B]
│├┬ T1: <class 'object'>
││└ # T1 [15 B]
│├┬ D2: <dict object at 0x797040b5cac0>
││└ # D2 [44 B]
│└ # T2 [119 B]
└ # Me1 [1 MiB]

Should this be consistent, with __dict__ always included?

@c3joshzhang
Author

🥲 After more investigation, it turns out that functools.wraps changes the serialization behavior; once I remove functools.wraps, it works as expected.
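
One plausible reason, offered as a sketch rather than a confirmed diagnosis: functools.wraps copies __module__ and __qualname__ from the decorated function onto the wrapper, so the wrapper appears to live at an importable name. Without wraps, the wrapper keeps a "<locals>" qualified name that cannot be looked up by name. The decorators with_wraps and without_wraps below are hypothetical stand-ins for the D class in the reproduction.

```python
from functools import wraps


def with_wraps(func):
    # Same shape as the decorator in the reproduction.
    @wraps(func)
    def wrapped(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapped


def without_wraps(func):
    # Identical, except functools.wraps is not applied.
    def wrapped(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapped


def f():
    return ...


a = with_wraps(f)
b = without_wraps(f)

print(a.__qualname__ == f.__qualname__)  # True: the wrapper borrows f's name
print(a.__module__ == f.__module__)      # True: and f's module
print(a.__wrapped__ is f)                # True: wraps also records the original
print(b.__qualname__)                    # ends with "without_wraps.<locals>.wrapped"
```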

@mmckerns
Member

mmckerns commented Aug 1, 2024

Thanks for reporting. I should mention a few things to you: (1) when functions are serialized, they will include globals as a dependency... and you can change the serialization behavior of globals with dill.settings; (2) the serialization of objects that depend on globals is handled differently depending on whether they are in __main__ or not... and a jupyter notebook cell is technically not __main__, it's a local namespace that, behind the scenes, passes objects defined in a cell up to __main__... so there should be differences between sessions in a notebook and those not in a notebook.
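
To make the __main__ distinction concrete, here is a rough approximation of the kind of check that decides whether a function can be stored by reference. The helper looks_importable_by_name is hypothetical and only loosely modeled on that idea; dill's real logic has more cases and may differ between versions.

```python
import functools
import importlib


def looks_importable_by_name(func):
    """Hypothetical helper: can `func` be found again via module + qualname?"""
    module_name = getattr(func, "__module__", None)
    if module_name in (None, "__main__"):
        # Objects living in __main__ are treated as not importable.
        return False
    try:
        found = importlib.import_module(module_name)
        for part in func.__qualname__.split("."):
            found = getattr(found, part)
    except (ImportError, AttributeError, ValueError):
        return False
    return found is func


def make_closure():
    def inner():
        return None
    return inner


def pretend_notebook_function():
    return None


# Simulate a function defined in a notebook cell.
pretend_notebook_function.__module__ = "__main__"

print(looks_importable_by_name(functools.wraps))            # True
print(looks_importable_by_name(make_closure()))             # False
print(looks_importable_by_name(pretend_notebook_function))  # False
```

Under this reading, a function that passes the check can be written as a short name reference, while one that fails has to be serialized by value, including its __dict__.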

@mmckerns mmckerns added this to the dill-0.3.9 milestone Aug 1, 2024