Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enum-like classes cant be used as dictionary keys #960

Open
bruno-f-cruz opened this issue Jun 3, 2024 · 10 comments
Open

Enum-like classes cant be used as dictionary keys #960

bruno-f-cruz opened this issue Jun 3, 2024 · 10 comments
Assignees
Labels
bug Something isn't working

Comments

@bruno-f-cruz
Copy link
Collaborator

Not sure if there is a different intended use...

To reproduce:

from aind_data_schema_models.platforms import Platform
from pydantic import BaseModel


class Foo(BaseModel):
    platform: dict[Platform.ONE_OF, str]


foo = Foo(platform={Platform.BEHAVIOR: "foo"})
print(foo)

bar = foo.model_dump()
@bruno-f-cruz
Copy link
Collaborator Author

Tracking pydantic/pydantic#9453

@bruno-f-cruz
Copy link
Collaborator Author

This is actually more serious that I initially gave it credit for. This pattern completely prevents users from using the model_dump method:

import examples
from examples.fip_ophys_rig import rig
from aind_data_schema_models.modalities import Modality
rig.modalities = set([Modality.BEHAVIOR])
rig.model_dump()

raises the following error:

raceback (most recent call last):
  File ".\aind-data-schema\deserialization_order.py", line 27, in <module>     
    rig.model_dump()
  File ".\aind-data-schema\.venv\Lib\site-packages\pydantic\main.py", line 314, in model_dump
    return self.__pydantic_serializer__.to_python(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unhashable type: 'dict'

@jtyoung84
Copy link
Collaborator

It seems like an issue with the pydantic serializer. Even this causes issues:

from typing import Set
from pydantic import BaseModel, ConfigDict
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    value: str

# This works
set([Cell(value="a"), Cell(value="b")])

class WeightedSeeds(BaseModel):
    seeds: Set[Cell]

foo = WeightedSeeds(seeds=set([Cell(value="a"), Cell(value="b")]))

# This raises unhashable type errors
foo.model_dump()

@bruno-f-cruz
Copy link
Collaborator Author

bruno-f-cruz commented Jun 7, 2024

Possible solutions

1- Simplify everything and go back to native Enum. This is the preferred solution as it is rather isomorphic across programming languages and json.

2- write our own de(serializers) for a base class to be inherited from all complex Enum types. This strategy has a few problems:

  • We would need to serialize the class a a string (likely using the name property`) and deserialize by looking into the allowed types and finding the correct class to deserialize to. This means that all types must be discriminated unions, with the discriminator being the same as the property being serialized.
  • One must ensure that all properties of all objects that inherit from this base class are Literal and read-only (i.e. provided at construction type). Otherwise any sort of custom deserialization will inevitably be lossy;

3- Keep a service somewhere that takes a string and returns these complex objects. This is potentially very interesting as it would allow the schema to be decoupled from the specifics of this database. For instance, the schema would only have to keep track of all possible manufacturers without having to keep track of abbreviations, numeric IDs and other properties that might otherwise change with time.

@bruno-f-cruz
Copy link
Collaborator Author

@saskiad @dyf looping you in too. I think this is a rather annoying regression bug (since these started off as an Enum). Especially when other AIND tools like the watchdog could benefit greatly from this change.

I will still look into the second solution but it just doesn't feel right. We should really just make all these Enums. It would also solve the weird pattern of typing the properties with Type.ONE_OF but using a different instance (Type.Value).

@jtyoung84
Copy link
Collaborator

jtyoung84 commented Jun 8, 2024

Enums would be nice. I can't recall anymore exactly what the issue was that made me switch, but enums of objects were creating headaches somewhere. I do remember now that when we upgraded to v2, my original enum metadata class broke. I went with this solution. I'm open to suggestions for improvements since this has always been troublesome.

I also remember now too, that most of the docs I was reading were using tagged unions for this kind of situation, but not quite the same situation.

@bruno-f-cruz
Copy link
Collaborator Author

bruno-f-cruz commented Jun 8, 2024

@jtyoung84 It's late and I might be missing something very obvious, but what about this?

from aind_data_schema_models.modalities import Modality

from pydantic import BaseModel, Field
from enum import Enum
from typing import Dict, Set


class Modalities(Enum):
    BEHAVIOR = Modality.BEHAVIOR
    OPHYS = Modality.POPHYS
    ICEPHYS = Modality.ICEPHYS


class SubSetOfModalities(Enum):
    BEHAVIOR = Modality.BEHAVIOR
    OPHYS = Modality.POPHYS


class Foo(BaseModel):
    modality: Dict[Modalities, str] = Field(default={Modalities.BEHAVIOR: "foo"})
    stuff: Set[Modalities] = Field(default={Modalities.BEHAVIOR, Modalities.OPHYS})
    another_subset: SubSetOfModalities = Field(default=SubSetOfModalities.BEHAVIOR, validate_default=True)


foo = Foo()
print(foo)
print(foo.model_dump())

This is just a proof of concept, we could improve a bit the subtyping I think. I think it would be so much cleaner for users too!

To be clear, I still think Enum of strings would be far easier to maintain.

@bruno-f-cruz
Copy link
Collaborator Author

I have been playing with this a bit more, I think we can make this change be fully backward compatible with users code. The trick would be to define the subsets of Modalities as Literals (This is the python standard in most libraries anyway, so i think it is actually a good idea...)

The previous example would thus become:

from aind_data_schema_models.modalities import Modality

from pydantic import BaseModel, Field
from enum import Enum
from typing import Dict, Set, Literal


class Modalities(Enum):
    BEHAVIOR = Modality.BEHAVIOR
    OPHYS = Modality.POPHYS
    ICEPHYS = Modality.ICEPHYS


class Foo(BaseModel):
    modality: Dict[Modalities, str] = Field(default={Modalities.BEHAVIOR: "foo"})
    stuff: Set[Modalities] = Field(default={Modalities.BEHAVIOR, Modalities.OPHYS})
    another_subset: Literal[Modalities.BEHAVIOR, Modalities.OPHYS] = Field(..., validate_default=True)


foo = Foo(another_subset=Modalities.BEHAVIOR)  # type hint: "another_subset: Literal[Modalities.BEHAVIOR, Modalities.OPHYS]"
print(foo)
print(foo.model_dump())

@bruno-f-cruz
Copy link
Collaborator Author

Ignore the previous two suggestions. I dont think we want to go down this route as both the json-schema and round-trip (de)serialization breaks.

print(foo.model_validate_json(foo.model_dump_json()))

leads to:

pydantic_core._pydantic_core.ValidationError: 4 validation errors for Foo
modality.name='Behavior' abbreviation='behavior'.[key]
  Input should be Behavior(name='Behavior', abbreviation='behavior'), POphys(name='Planar optical physiology', abbreviation='ophys') or Icephys(name='Intracellular electrophysiology', abbreviation='icephys') [type=enum, input_value="name='Behavior' abbreviation='behavior'", input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/enum
stuff.0
  Input should be Behavior(name='Behavior', abbreviation='behavior'), POphys(name='Planar optical physiology', abbreviation='ophys') or Icephys(name='Intracellular electrophysiology', abbreviation='icephys') [type=enum, input_value={'name': 'Planar optical ...'abbreviation': 'ophys'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/enum
stuff.1
  Input should be Behavior(name='Behavior', abbreviation='behavior'), POphys(name='Planar optical physiology', abbreviation='ophys') or Icephys(name='Intracellular electrophysiology', abbreviation='icephys') [type=enum, input_value={'name': 'Behavior', 'abbreviation': 'behavior'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/enum
another_subset
  Input should be <Modalities.BEHAVIOR: Behavior(name='Behavior', abbreviation='behavior')> or <Modalities.OPHYS: POphys(name='Planar optical physiology', abbreviation='ophys')> [type=literal_error, input_value={'name': 'Behavior', 'abbreviation': 'behavior'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/literal_error

It seems pydantic cant automatically convert the json object instance to the corresponding python class. This could be done with an extra validator, but I still think we should go with Enum of type string to keep the solution simple.

@jtyoung84
Copy link
Collaborator

@bruno-f-cruz I'm going to assign this to @dbirman to see if this is a still an issue.

@jtyoung84 jtyoung84 assigned dbirman and unassigned jtyoung84 Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants