-
-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow discrimination based on a key that is not part of the model #135
Comments
Hey @mishamsk First of all, congratulations on your project publication! I'll consider using it when I need to do some parsing stuff. When I was developing discriminator feature, I thought about adding an attribute to for variant in iter_all_subclasses(__main__.Node):
try:
variants_map[variant.__dict__['__type']] = variant
except KeyError:
continue By introducing a tag generator function we can change it. For example, if we have for variant in iter_all_subclasses(__main__.Node):
try:
variants_map[variant_tag_getter(variant)] = variant
except KeyError:
continue I'm bad at naming but I guess that The question is, do we need or will we need functionality of automatic adding tags to all variants on serialization, so we could avoid using What do you think? Maybe I'm missing something and you have your own vision of solving this issue. |
@Fatal1ty first thanks for the wishes! it took me way to much time;-) rgd your design suggestion, I have a couple of thoughts. First, rgd my specific use case. I think it's a distinct usage pattern from a "model a messaging protocol and parse". Instead it's more of a "pickle" use case. I am controlling both serialization and deserialization, so do not care about validations or trying to decode data originating from some external system. I have a hunch that only this kind of usage pattern where one can expect a literal class name in serialized data. None of the public specs, except the ones used for interoperability, would ever dictate to store a literal class name in serialized data, because class name is obviously implementation specific. So if you think that mashumaro is commonly used for such things, it may be worth implementing explicitly, instead of a "generic" approach. But you'd know better if this is the case. My usage pattern may be rare. Rgd your proposal for the universal solution. I'd definitely say that whatever it will be, it's going to be the harder part to grasp for users. I had to read mashumaro's code to understand where this "getter" stuff will pop-up. I am not sure if you need a different argument for the Discriminator though. Why not just inspect the I do think that using |
Assuming we are using JSON serialization format, we need two things:
Currently we can configure only first item through Discriminator "field" attribute. The second item is hardcoded: mashumaro/mashumaro/core/meta/types/unpack.py Lines 347 to 350 in c0a4cf8
Semantically this function (I called it def variant_tagger_fn(cls):
return cls.__dict__[discriminator.field] In your case you don't have class level attributes containing unique literals, so you could use the following function instead to get dynamic tag values: def variant_tagger_fn(cls):
return cls.__name__ # or something more practical
msgspec has its |
Oh, sorry, I wasn't thinking straight. Yes this makes total sense. Did you consider keeping only the current field attribute, but allowing it to be a tuple? I.e.:
I may be biased, but the word sample code - slotted dataclasses do not have from dataclasses import dataclass
from typing import ClassVar
@dataclass(slots=True)
class Test:
x: ClassVar[int] = 0
y: int = 0
print("__dict__" in dir(Test)) # False
print("__dict__" in dir(Test())) # False |
@mishamsk Sorry for delay, I was busy at work.
It will work but I don't see any point to make one argument to accept different value types instead of using two arguments. It will also be complicated to explain the use of this parameter in the documentation. Let's make it simple.
I agree that it doesn't sound very well but I didn't manage to find a better word. I do think this complexity has roots in using discriminator for both subclass deserialization and for discriminated union deserialization. If it was only for subclass deserilizaiton then I would called it
I grabbed my head and almost believed it, but it made me go to check. Fortunately for us, your assumption is only partially correct. The truth is slotted dataclass instances don't have print("__dict__" in dir(Test)) # False
print(Test.__dict__) # {'__module__': '__main__', '__annotations__': {'x': typing.ClassVar[int], ... You might be curious why I even rely on |
nothing to apologize for! appreciate additional insights. Sorry for the false alarm, didn’t know that dir had completely different behavior based on what’s being inspected. Regarding the naming, as I’ve said, I do not have a strong opinion. You’ve build & maintained this library all along, so you’ll know better what works for the community. The only thing I can add, is that I’ve been trying to find other libs that supported auto-subclass (de)serialization, and couldn’t find any. Seems odd to me, that there is nothing in-between pickle and strictly typed validation/serialization (pydantic, msgspec, cattrs etc). So having such a feature in mashumaro would further set it apart! Also, this change seems relatively simple. Probably even simpler than #136. I am on vacation, so am away from coding for the sake of the family, but I may get some time to draft a PR when I am back if you haven’t done it yourself by then. |
It was easy to implement :) |
@Fatal1ty nice! you are definitely moving faster than I do;-) |
Is your feature request related to a problem? Please describe.
I'am using mashumaro as (de)serialization backend of my AST library
What this means is that I (de)serialize huge (millions of nodes) nested dataclasses with fields usually having a general type that may have tens and sometimes hunderds subclasses. E.g. borrowing from the example in my own docs:
Expr
can (or rather has in real life) hundreds of subclasses.Thus it is impractical to let mashumaro try discriminating the actual type by iteratively trying every subclass during deserialization. Also, it will be incorrect in some cases because two subclasses may have the same structure, but I need exactly the same type as it was written during serialization.
I am already handling this via my own extension: see code
My custom implementation adds
__type
key during serialization which is read during deserialization and then dropped. If it is not available, it falls back to the default logic.However, this way I can't use some of the other mashumaro features, such as annotations in _serialize/_deserialize, and I have to resort to "dirty" class state to pass additional options (what has been discussed in #100)
Describe the solution you'd like
I suggest adding an option to the discriminator feature to allow using keys that are not part of the data class itself.
Describe alternatives you've considered
See above my current implementation
The text was updated successfully, but these errors were encountered: