-
-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise the inner exception when parsing nested dataclasses #136
Comments
This is a very good example of why this issue should be addressed. To be honest, I've come across this problem myself.
I really like this helper. Good job!
I don't see any another solution other that changing the generated sequence unpacker code to something like this: def sequence_unpacker(seq):
result = []
for idx, el in enumerate(seq):
try:
result.append(int(el))
except Exception:
raise SomeException(seq, pos=idx) But it will cost us a speed reduction because in the current implementation we are getting
I think we should start from a config option and then see how it goes. If we manage to keep sequence debugability without relying on stack trace and avoid notable impact on performance, we can make this behavior default.
The concept is definitely right and this is a very important issue. If you need any help, let me know. |
@Fatal1ty read the code, indeed adding an index seems like:
for (b), maybe this pattern would be less degrading: def sequence_unpacker(seq):
result = []
try:
for idx, el in enumerate(seq):
result.append(int(el))
except Exception:
raise SomeException(seq, pos=idx) but still a real impact. I'll draft something on one of the upcoming weekends |
I've been enjoying using plugins for JSON Schema lately and I just thought why wouldn't I add serialization / deserialization plugins. On the rights of nonsense I sketched a POC: class SequencePlugin(BasePlugin):
def sequence_unpacker(self, spec: ValueSpec, item_unpacker: Expression):
lines = CodeLines()
self.meta.ensure_object_imported(SequenceItemInvalidError)
name = f"_unpack_sequence_{random_hex()}"
with lines.indent(f"def {name}(seq):"):
lines.append("result = []")
with lines.indent("try:"):
with lines.indent("for idx, item in enumerate(seq):"):
lines.append(f"result.append({item_unpacker})")
lines.append("return result")
with lines.indent("except Exception:"):
lines.append("raise SequenceItemInvalidError(seq, idx, item)")
lines.append(f"setattr({spec.cls_attrs_name}, '{name}', {name})")
exec(lines.as_text(), spec.builder.globals, spec.builder.__dict__)
return f"{spec.cls_attrs_name}.{name}"
def get_unpacker(self, spec: ValueSpec) -> Optional[Expression]:
if ensure_generic_collection_subclass(spec, Sequence):
args = get_args(spec.type)
item_unpacker = spec.unpacker_registry.get(
spec.copy(
type=args[0],
expression="item",
could_be_none=True,
field_ctx=spec.field_ctx.copy(metadata={}),
)
)
name = self.sequence_unpacker(spec, item_unpacker)
return f"{name}({spec.expression})"
decoder = BasicDecoder(Sequence[int], plugins=[SequencePlugin()])
x = decoder.decode([1, 'a', 3])
|
Is your feature request related to a problem? Please describe.
First, this has been discussed in #83 and dismissed. But I'd like to raise the question and propose a solution yet again.
For the context, which is the same as in #135 - I am working with deeply nested structures, millions of them. For some context - some of the serialized trees that I have are 300MB in msgpack! (that's >1GB in memory after deserialization). And they have ONE root.
So when something breaks during deserialization, the current basic functionality is unusable. Even if I could have inspected the 300MB worth of text, finding what caused an error is impossible. The outer context is not helpful, on the contrary it is detrimental. A typical example - I've changed some model and trying to code a migration from the old one.
They way I've fixed it is by using an unwrapped helper: code and doc
It is almost ideal, but there is one thing that can't be done outside of mashumaro - namely, I can't get the index of child if it is part of a sequence. Only the field name. So for model like this (omitting data class and base class for brevity):
if I say pass a string that is not an integer in a leaf Child node 10 levels deep, my helper will tell me the path
children.children.[8 times more].attr
with mashumaro's inner exception. But I won't know the indexes along the way.Describe the solution you'd like
At least for the
InvalidFieldValue
exception add path capturing, including the index of item in a sequence.And then either:
Describe alternatives you've considered
See above
Additional context
I can try my hand at this if the concept seems right. Didn't want to do a PR before we discuss this
The text was updated successfully, but these errors were encountered: