Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_object_state for objects with __getnewargs__ (or ..._ex__), state unnecessary #215

Open
albertz opened this issue Nov 5, 2024 · 10 comments · May be fixed by #217
Open

get_object_state for objects with __getnewargs__ (or ..._ex__), state unnecessary #215

albertz opened this issue Nov 5, 2024 · 10 comments · May be fixed by #217
Assignees

Comments

@albertz
Copy link
Member

albertz commented Nov 5, 2024

In get_object_state, we already do this:

    if hasattr(obj, "__getnewargs_ex__"):
        args = obj.__getnewargs_ex__()
    elif hasattr(obj, "__getnewargs__"):
        args = obj.__getnewargs__()
    else:
        args = None

    ...

    if args is None:
        return state
    else:
        return args, state

Why do we do this? The state is unnecessary when we have args, or not? When __getnewargs__ or __getnewargs_ex__ is implemented, this should cover the complete state of the object.

What cases do we currently have that actually use __getnewargs__/__getnewargs_ex__? I think this is quite rare, or maybe even non-existent?

Note, I currently have a case, where I implemented __getnewargs_ex__ for some own class, and I was confused that Sisyphus apparently anyway checks the other state.

I implemented __getnewargs_ex__ in a way that should be quite stable w.r.t. the hash, however, the internal object state might change in the future. So that is why I actually would want that Sisyphus only considers the __getnewargs_ex__ output when the object has this function.

The main question is if we have any hashes which would break now when we change this. The code in get_object_state would basically change to this:

    if hasattr(obj, "__getnewargs_ex__"):
        return obj.__getnewargs_ex__()
    elif hasattr(obj, "__getnewargs__"):
        return obj.__getnewargs__()

    ...

    return state
@michelwi
Copy link
Contributor

michelwi commented Nov 5, 2024

The main question is if we have any hashes which would break now when we change this.

I can run our hashtest pipeline on a branch that includes these changes and see if anything breaks

@albertz albertz linked a pull request Nov 5, 2024 that will close this issue
@albertz
Copy link
Member Author

albertz commented Nov 5, 2024

The main question is if we have any hashes which would break now when we change this.

I can run our hashtest pipeline on a branch that includes these changes and see if anything breaks

Here you go: #217

(I tested this on a smaller setup of mine, and there no hash changes.)

@michelwi
Copy link
Contributor

michelwi commented Nov 5, 2024

The pipeline failed with changed hashes of some AlignmentJob. I will try to investigate what caused this later. Maybe the CommonRasrParameters?

@michelwi
Copy link
Contributor

michelwi commented Nov 5, 2024

the difference in this setup is

old:

sis_hash_helper(np.float64(70.12))
b'(float64, (tuple, (tuple, (float, 70.12)), (NoneType)))'

new:

sis_hash_helper(np.float64(70.12))
b'(float64, (tuple, (float, 70.12)))'

Funnily using any other dtype fails in both cases...

albertz added a commit that referenced this issue Nov 5, 2024
@albertz
Copy link
Member Author

albertz commented Nov 5, 2024

Hm that's annoying. I changed it slightly, so this here should pass now. Can you try again? But there might be other problems...

Edit Ah but I see that there is a test failing anyway... Not sure what to do about that.

@michelwi
Copy link
Contributor

michelwi commented Nov 5, 2024

Can you try again? But there might be other problems...

The next failing test is with a Job that takes in custom classes.. one of which has __sis_state__ implemented.

@albertz
Copy link
Member Author

albertz commented Nov 5, 2024

Yea ok, so from that, and also the failing test case, I think we can conclude, there are existing cases where such change breaks the hash.

Not sure what we can do about it. The workaround for my custom class is that I don't use __getnewargs__, but instead I define __getstate__ and __setstate__. It's somewhat ugly and annoying to do this, just because of this weird behavior of Sisyphus.

@albertz
Copy link
Member Author

albertz commented Nov 5, 2024

To be a bit more specific, the custom class I am referring to is returnn.util.math.PiecewiseLinear, which looked like this (code):

class PiecewiseLinear:
    """
    Piecewise linear function.
    """

    def __init__(self, values: Dict[Union[int, float], Union[int, float]]):
        self._sorted_items = sorted(values.items())
        self._sorted_keys = numpy.array([x for x, _ in self._sorted_items])
        self._sorted_values = numpy.array([y for _, y in self._sorted_items])

    def __call__(self, x: Union[int, float]) -> Union[int, float]:
        assert x is not None
        steps = self._sorted_keys
        values = self._sorted_values
        return numpy.interp(x, steps, values)

I considered the _... attribs to be internal implementation detail, which might change in the future. So I implemented:

def __getnewargs_ex__(self):
    return (dict(self._sorted_items),), {}

Note, I implemented __getnewargs_ex__, because I intended to extend the class by some optional args, which would be kwargs only, and I can leave the kwargs empty, i.e. it would keep the same hash, even after any such extensions.

This was specifically my goal: I want to be able to extend the class, but I want that it keeps the same hash for the default original behavior.

More specifically, I was adding *, kw_name: Optional[str] = None, ignore_other_kwargs: bool = False to the args, and also corresponding attribs. And I started to use instances of this class directly in my Sisyphus ReturnnConfig.

So my workaround now was to implement it like this:

    def __getstate__(self):
        # Note: I was implementing __getnewargs_ex__, but we cannot use this because of this Sisyphus bug:
        # https://github.com/rwth-i6/sisyphus/issues/215
        kwargs = {"values": dict(self._sorted_items)}
        if self._kw_name is not None:
            kwargs["kw_name"] = self._kw_name
        if self._ignore_other_kwargs:
            kwargs["ignore_other_kwargs"] = True
        return kwargs

    def __setstate__(self, state):
        self.__init__(**state)

This already includes those extensions now, but also allows for further extensions of the class without breaking the hash.

@michelwi
Copy link
Contributor

michelwi commented Nov 5, 2024

If the sole goal here is to handle the hash of Sisyphus nicely, then you could also just implement a custom __sis_state__. But I would understand that having Sisyphus related code in returnn might be undesirable.

Compromise implementation suggestion: in your recipes define a derived class of PiecewiseLinear that adds the implementation..?

@albertz
Copy link
Member Author

albertz commented Nov 5, 2024

It's not the sole goal. I also want that any pickled data (or other serializations of such object) also stays compatible for future RETURNN versions. This is just as important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants