Refactor `ModelBuilder` and `RandomBuilder` #971

jrycw · 2024-03-22T04:45:24Z

I encountered ModelBuilder._randomize_attribute and initially found the numerous if-elif-else checks puzzling. However, further investigation revealed that certain factories need additional information from the column. I attempted to consolidate this logic into RandomBuilder._build, aiming for improved clarity. Nevertheless, the final code may not be as pristine as desired. I'm looking forward to hearing the community's thoughts on whether this refactoring improves the codebase.

codecov-commenter · 2024-03-22T04:48:43Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.82%. Comparing base (91d6238) to head (ad763c3).
Report is 4 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #971      +/-   ##
==========================================
+ Coverage   92.78%   92.82%   +0.03%     
==========================================
  Files         108      108              
  Lines        8182     8222      +40     
==========================================
+ Hits         7592     7632      +40     
  Misses        590      590


dantownsend

This is great - thanks 👍

ModelBuilder is something which started fairly simple, but as more and more edge cases were discovered it has become quite complex.

I agree that your solution is cleaner and easier to follow.

I left a few comments.

dantownsend · 2024-03-22T15:50:23Z

piccolo/testing/random_builder.py

+            random_value_callable = partial(
+                cls.next_list,
+                mapper[t.cast(Array, column).base_column.value_type],
+            )


I wonder if anything besides an Array could end up in this block.

I wonder if we can add list to mapper in a clean way?

The handling of list is quite complex, involving two aspects: the simple callable, such as RandomBuilder.next_bool, and additional logic that requires knowledge of the column to manufacture another callable. Since list relies on obtaining the type from the mapper, we cannot determine the exact type until the last moment.

You're right - arrays are always tricky edge cases.

Maybe we could add a check if isinstance(column, Array)?

See my latest comment regarding this issue.
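A minimal sketch of the pattern discussed in this thread, assuming a plain type-to-callable mapper: simple types resolve straight to a zero-argument callable, while list has to be assembled with partial once the element type is known. The free functions and callable_for below are hypothetical stand-ins, not the PR's actual RandomBuilder methods.

```python
import random
import typing as t
from functools import partial


def next_bool() -> bool:
    return random.choice([True, False])


def next_int(minimum: int = 0, maximum: int = 2147483647) -> int:
    return random.randint(minimum, maximum)


def next_list(callable_: t.Callable[[], t.Any], length: int = 3) -> t.List[t.Any]:
    return [callable_() for _ in range(length)]


# Simple types map straight to a zero-argument callable.
MAPPER: t.Dict[type, t.Callable[[], t.Any]] = {bool: next_bool, int: next_int}


def callable_for(
    value_type: type, base_type: t.Optional[type] = None
) -> t.Callable[[], t.Any]:
    """Hypothetical helper: resolve a callable, building the list case late."""
    if value_type is list:
        if base_type is None:
            raise ValueError("A list needs the element type of its column")
        # The element callable is only known once the column is inspected.
        return partial(next_list, MAPPER[base_type])
    return MAPPER[value_type]
```

Here base_type plays the role of the Array's base column value type, which is why the list entry cannot sit in the static mapper alongside the simple types.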

dantownsend · 2024-03-22T15:54:15Z

piccolo/testing/random_builder.py



 class RandomBuilder:
+    @classmethod
+    def _build(cls, column: Column) -> t.Any:


What do you think about calling this something like get_value_for_column? Do you think it should be a private method?

I initially considered treating it as a private method, but also wanted to find a way to provide a mechanism for registering a new type.

That would be nice.

jrycw · 2024-03-22T17:34:34Z

@dantownsend I completely agree with you; the task turned out to be more complex than I initially anticipated. Building the mapper required careful handling of many cases, such as using partial, dealing with the return from get, and modifying the current next-ish methods with default values so that each becomes a callable. While this PR is functional at the moment, I acknowledge that it may become challenging to maintain in the future. I am currently exploring alternative ideas and hope to provide another version soon.

dantownsend · 2024-03-22T19:04:59Z

I think there's a lot of potential in this approach. Feel free to play around with other ideas, but this is a nice start.

jrycw · 2024-03-22T19:14:15Z

I've just pushed another version. This version introduces a hook for users to register their own random type. I'll review your comments tomorrow as it's already midnight here in Asia.

jrycw · 2024-03-23T06:32:57Z

The third version may be a bit easier to understand:

- RandomBuilder now offers a public API called get_mapper, enabling users to access the default random mapper.
- Forming the type-callable pairs into a single dictionary involves several steps, which are encapsulated in the public API ModelBuilder.get_registry:
  - Initially, we import information from RandomBuilder.get_mapper, a one-time operation.
  - Next, we address situations where callables may require information from the column in ModelBuilder._get_local_mapper.
  - We then incorporate user-registered types in ModelBuilder._get_other_mapper.
  - Finally, we integrate the logic for list into reg = {**default_mapper, **cls._get_local_mapper(column), **cls._get_other_mapper(column)}, returning reg as a MappingProxyType.
- I made an effort to comprehend the relationship between list and Array, but I'm afraid I may have gotten a bit lost in the codebase. I'm uncertain about the extent to which isinstance(column, Array) can handle the edge case effectively.
- Handling enum.Enum remains a challenge.
- I made a minor adjustment in ModelBuilder._build, where defaults = defaults or {} seems clearer to me.
- The naming of variables and functions could be improved; I welcome any advice you may have on this matter.
- Providing a method for users to unregister their custom types might be handy:

    @classmethod
    def unregister_types(cls) -> None:
        cls.__OTHER_MAPPER.clear()
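The merge order described above can be sketched on its own, assuming three plain dictionaries. build_registry is a hypothetical name used only for illustration; the PR's get_registry takes a column and gathers the three mappers itself.

```python
import typing as t
from types import MappingProxyType

Mapper = t.Dict[type, t.Callable[[], t.Any]]


def build_registry(
    default_mapper: Mapper, local_mapper: Mapper, other_mapper: Mapper
) -> t.Mapping[type, t.Callable[[], t.Any]]:
    # Later dictionaries win, so the precedence is
    # default < column-specific < user-registered.
    merged = {**default_mapper, **local_mapper, **other_mapper}
    # The read-only view stops callers from mutating the merged result.
    return MappingProxyType(merged)


registry = build_registry(
    {int: lambda: 1, str: lambda: "default"},
    {str: lambda: "column-specific"},
    {int: lambda: 42},
)
```

Because the registry is rebuilt from its sources on each call, returning a read-only view makes it clear that changes must go through registration rather than through the returned mapping.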

jrycw · 2024-03-23T12:41:08Z

piccolo/testing/model_builder.py

+        precision, scale = column._meta.params.get("digits") or (4, 2)
+        local_mapper[Decimal] = partial(
+            RandomBuilder.next_decimal, precision, scale
+        )


If RandomBuilder.next_decimal incorporates default values for precision and scale (like this PR), the logic can be simplified to:

    if precision_scale := column._meta.params.get("digits"):
        local_mapper[Decimal] = partial(
            RandomBuilder.next_decimal, *precision_scale
        )
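To illustrate why default arguments remove the need for the `or (4, 2)` fallback, here is a self-contained sketch. next_decimal below is a hypothetical stand-in with assumed defaults of precision=4 and scale=2, and decimal_callable takes a plain params dictionary in place of column._meta.params.

```python
import decimal
import random
import typing as t
from functools import partial


def next_decimal(precision: int = 4, scale: int = 2) -> decimal.Decimal:
    # Hypothetical stand-in: `precision` digits in total, `scale` of them
    # after the decimal point.
    unscaled = random.randint(0, 10**precision - 1)
    return decimal.Decimal(unscaled).scaleb(-scale)


def decimal_callable(params: t.Dict[str, t.Any]) -> t.Callable[[], decimal.Decimal]:
    # Only bind arguments when the column actually specifies digits;
    # otherwise the function's own defaults apply.
    if precision_scale := params.get("digits"):
        return partial(next_decimal, *precision_scale)
    return next_decimal
```

With this shape the default lives in one place, the signature of the generator, rather than being repeated wherever the column parameters are read.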

dantownsend · 2024-03-23T22:15:02Z

piccolo/testing/model_builder.py

-                cls.__DEFAULT_MAPPER[base_type]() for _ in range(length)
-            ]
-        elif column._meta.choices:
+        reg = cls.get_registry(column)


This can probably go into the else block, as we don't use it if the column has choices.

That's a valid observation. It seems that the presence of reg is a result of the requirements in the previous version, where it was needed for multiple elif-else blocks.

I'm not sure if this is a good idea.

    @classmethod
    def _randomize_attribute(cls, column: Column) -> t.Any:
        reg = cls.get_registry(column)
        random_value = reg.get(enum.Enum, reg[column.value_type])()
        if isinstance(column, (JSON, JSONB)):
            return json.dumps({"value": random_value})
        return random_value

    @classmethod
    def _get_local_mapper(cls, column: Column) -> t.Dict[t.Type, t.Callable]:
        ...
        if _choices := column._meta.choices:
            local_mapper[enum.Enum] = partial(
                RandomBuilder.next_enum, _choices)
        return local_mapper

dantownsend · 2024-03-23T22:17:33Z

piccolo/testing/model_builder.py

+        if column.value_type == list:
+            reg[list] = partial(
+                RandomBuilder.next_list,
+                reg[t.cast(Array, column).base_column.value_type],


I don't want to make things too insane, but multidimensional arrays are possible:

Array(Array(Integer()))

I wonder what would happen here in that situation?

Thank you for bringing this situation to my attention. I was curious about the behavior in our current codebase, so I conducted a quick test. It seems that the current implementation will throw a KeyError for this situation (please correct me if I'm mistaken).

This might be a bit off-topic, but I wanted to try out this behavior in the Piccolo playground, and it doesn't seem to be working. The code works fine if I just launch a terminal and enter the shell.

piccolo playground run --engine=sqlite3

    In [1]: from piccolo.table import Table

    In [2]: from piccolo.columns import Array, BigInt

    In [3]: class MyTable(Table):
       ...:     my_column = Array(Array(BigInt()))
       ...:
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    Cell In[3], line 1
    ----> 1 class MyTable(Table):
          2     my_column = Array(Array(BigInt()))

    Cell In[3], line 2, in MyTable()
          1 class MyTable(Table):
    ----> 2     my_column = Array(Array(BigInt()))

    NameError: name 'Array' is not defined

@jrycw This happens because the Array and BigInt columns are not imported into the Playground application. If you patch the local Piccolo installation and add these two columns, everything works fine. Maybe we should add all the possible columns to the imports, which would solve the problem.

@sinisaos Thank you very much for identifying the source of the issue and providing the solution. It's working now.

You're right that the current implementation of ModelBuilder doesn't handle multidimensional arrays, so I wouldn't worry about it if it's a tricky fix. Array columns now have two methods to help with this kind of thing: _get_dimensions and _get_inner_value_type.
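One possible shape for multidimensional support, as a sketch only: given the number of dimensions and a factory for the innermost value, the nested list can be built recursively. next_nested_list is a hypothetical helper that does not call Piccolo; in ModelBuilder its two inputs would presumably come from the Array helpers mentioned above, which is an assumption.

```python
import random
import typing as t


def next_nested_list(
    factory: t.Callable[[], t.Any], dimensions: int, length: int = 3
) -> t.List[t.Any]:
    """Build a list nested `dimensions` deep, with `length` items per level."""
    if dimensions < 1:
        raise ValueError("dimensions must be at least 1")
    if dimensions == 1:
        # Innermost level: call the factory for each element.
        return [factory() for _ in range(length)]
    # Outer levels: each element is itself a list one level shallower.
    return [next_nested_list(factory, dimensions - 1, length) for _ in range(length)]
```

For Array(Array(BigInt())) this would be called with dimensions=2 and an integer factory, producing a list of lists instead of raising a KeyError on the inner list type.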

dantownsend · 2024-03-23T22:28:11Z

piccolo/testing/model_builder.py

+    @classmethod
+    def register_type(cls, typ: t.Type, callable_: t.Callable) -> None:
+        cls.__OTHER_MAPPER[typ] = callable_
+
+    @classmethod
+    def unregister_type(cls, typ: t.Type) -> None:


Being able to register custom types is cool, but I wonder about the main use cases.

You can specify defaults at the moment:

await ModelBuilder.build(MyTable, defaults={MyTable.some_column: "foo"})

I'm not against being able to override how types are handled, but we just need to articulate to users when it's appropriate vs using defaults.

Thank you for bringing up this question, it prompted me to reflect on the code and its implications.

The main difference between using defaults and register_type lies in their respective purposes:

Using defaults is suitable when users generally find our provided random logic satisfactory, but they require a hardcoded value for a specific column on a one-time basis.

On the other hand, employing register_type is appropriate when users desire a custom implementation for a type, effectively overwriting our default random logic for that specific type.
Here are three distinct use cases:

(1) Types not provided by us: For instance, a generator such as next_decimal is not available in the current release, but users can implement their own logic for the type and inject it into ModelBuilder.

(2) User preference for specific logic: In scenarios where we introduce new features, such as the shiny next_decimal logic (returning decimal.Decimal if column.value_type is decimal.Decimal), users may prefer the previous implementation or have specific requirements. With register_type, they have the flexibility to override the default behavior.

(3) Unanticipated user cases: This aspect is particularly valuable for registration. For example, consider a user who initially builds a successful e-commerce platform in the UK using Piccolo. Later, they expand into Asia and encounter legal requirements necessitating the storage of customer names in local languages. If ModelBuilder does not support non-English characters, users can register their own implementations to address this issue.

A draft test for situations (1) and (2) might look like this:

    class TableWithDecimal(Table):
        numeric = Numeric()
        numeric_with_digits = Numeric(digits=(4, 2))
        decimal = Decimal()
        decimal_with_digits = Decimal(digits=(4, 2))


    class TestModelBuilder(unittest.TestCase):
        ...

        def test_registry_overwritten1(self):
            table = ModelBuilder.build_sync(TableWithDecimal)
            for key, value in table.to_dict().items():
                if key != "id":
                    self.assertIsInstance(value, decimal.Decimal)

            def fake_next_decimal(column: Column) -> float:
                """will return `float` instead of `decimal.Decimal`"""
                precision, scale = column._meta.params["digits"] or (4, 2)
                return RandomBuilder.next_float(
                    maximum=10 ** (precision - scale), scale=scale
                )

            ModelBuilder.register_type(decimal.Decimal, fake_next_decimal)

            overwritten_table = ModelBuilder.build_sync(TableWithDecimal)
            for key, value in overwritten_table.to_dict().items():
                if key != "id":
                    self.assertIsInstance(value, float)

A draft test for situation (3) might look like this:

    class TestModelBuilder(unittest.TestCase):
        ...

        def test_registry_overwritten2(self):
            choices = "一二三"  # Chinese characters

            def next_str(length: int = 3) -> str:
                # Chinese names often consist of three Chinese characters
                return "".join(random.choice(choices) for _ in range(length))

            ModelBuilder.register_type(str, next_str)

            manager1 = ModelBuilder.build_sync(Manager)
            self.assertTrue(all(char_ in choices for char_ in manager1.name))

            poster1 = ModelBuilder.build_sync(Poster)
            self.assertTrue(all(char_ in choices for char_ in poster1.content))

            ModelBuilder.unregister_type(str)

            manager2 = ModelBuilder.build_sync(Manager)
            self.assertTrue(all(char_ not in choices for char_ in manager2.name))

            poster2 = ModelBuilder.build_sync(Poster)
            self.assertTrue(all(char_ not in choices for char_ in poster2.content))

The scenario is as follows: Manager1 is a locally hired individual, while Manager2 is dispatched from the UK. Both are working on the poster using their native languages.

Finally, I realized I had overlooked the magic behavior of Python's name mangling rules. For instance:

    >>> class ModelBuilder:
    ...     __OTHER_MAPPER = {}
    ...
    >>> ModelBuilder.__OTHER_MAPPER
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: type object 'ModelBuilder' has no attribute '__OTHER_MAPPER'
    >>> ModelBuilder._ModelBuilder__OTHER_MAPPER
    {}

As a result, the previous test code might be a bit off. I need to use the following code for the setup and teardown phases for each test:

    def setUp(self) -> None:
        ModelBuilder._ModelBuilder__OTHER_MAPPER.clear()  # type: ignore

    def tearDown(self) -> None:
        ModelBuilder._ModelBuilder__OTHER_MAPPER.clear()  # type: ignore

Thanks for explaining the rationale behind it - it makes sense.

If I was to completely redesign ModelBuilder, I probably wouldn't have class methods. Instead of:

await ModelBuilder.build(MyTable)

I would have:

await ModelBuilder(some_option=True).build(MyTable)

So we can configure ModelBuilder's behaviour more easily. For registering types we could have:

    custom_builder = ModelBuilder(types={...})
    await custom_builder.build(MyTable)

We could allow the types to be passed in via the build method instead:

await ModelBuilder.build(MyTable, types={...})

If register and unregister work globally, there are pros and cons. The main pro is you only need to set it up once (e.g. in a session fixture of Pytest). But if you were to somehow run your tests in parallel, it might be problematic.
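The trade-off can be made concrete with a small sketch, assuming overrides are passed to the constructor. SketchBuilder, its types argument and value_for are hypothetical names standing in for a redesigned ModelBuilder; none of this is Piccolo's current API.

```python
import typing as t

TypeMap = t.Dict[type, t.Callable[[], t.Any]]


class SketchBuilder:
    # Hypothetical stand-in for ModelBuilder; not Piccolo's API.
    _DEFAULTS: TypeMap = {str: lambda: "random", int: lambda: 7}

    def __init__(self, types: t.Optional[TypeMap] = None) -> None:
        # Overrides live on the instance, so nothing global is mutated.
        self._types: TypeMap = {**self._DEFAULTS, **(types or {})}

    def value_for(self, value_type: type) -> t.Any:
        return self._types[value_type]()


custom = SketchBuilder(types={str: lambda: "custom"})
plain = SketchBuilder()
```

Two builders configured this way cannot interfere with each other, so tests running in parallel each see only their own overrides. The cost is that every test, or a shared fixture, has to construct its own builder.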

What do you think?

Thank you for sharing your thoughts with me. Personally, I am inclined towards the instance method approach. However, implementing this change might break the current interface. I propose a three-stage transition plan:

First stage: Utilize the concept of descriptors to distinguish between calls from the class or instance. Initially, we move the current implementation to the class branch to maintain user experience. Simultaneously, we start implementing the new concept in the instance branch, issuing an experimental warning.

Second stage: If the new concept gains appreciation from users or developers, we add a deprecation warning to the class branch.

Third stage: Remove the class branch and clean up the code to ensure all methods are instance methods by the end.

During the first two stages, we'll keep the class branch unchanged and encourage users to try out the new syntax and the new features. If we reach the third stage, users who prefer the class branch might need to adjust their habits from using await ModelBuilder.build(...) to await ModelBuilder().build() or ModelBuilder.build_sync(...) to ModelBuilder().build_sync(...).

The concept of descriptors is relatively straightforward, but it can sometimes feel too magical to grasp. I often need a refresher before coding if I haven't touched it for a long time. Fortunately, we don't need the complex __get__ and __set__ logic of a data descriptor. A simple non-data descriptor should suffice for our use case. With the help of this post, I've drafted a proof of concept as follows:

    import asyncio
    import inspect
    import typing as t
    from concurrent.futures import ThreadPoolExecutor


    def run_sync(coroutine: t.Coroutine):
        try:
            # We try this first, as in most situations this will work.
            return asyncio.run(coroutine)
        except RuntimeError:
            # An event loop already exists.
            with ThreadPoolExecutor(max_workers=1) as executor:
                future = executor.submit(asyncio.run, coroutine)
                return future.result()


    class dichotomy:
        def __init__(self, f):
            self.f = f

        def __get__(self, instance, owner):
            cls_or_inst = instance if instance is not None else owner
            # Wrap coroutine functions so the result stays awaitable.
            if inspect.iscoroutinefunction(self.f):

                async def newfunc(*args, **kwargs):
                    return await self.f(cls_or_inst, *args, **kwargs)

            else:

                def newfunc(*args, **kwargs):
                    return self.f(cls_or_inst, *args, **kwargs)

            return newfunc


    class ModelBuilder:
        def __init__(self, *args, **kwargs):
            self._types = "..."  # Some information for instance method

        @dichotomy
        async def build(self_or_cls, *args, **kwargs):
            if inspect.isclass(self_or_cls):
                print("called as a class method from build")
                cls = self_or_cls
                await cls._build()
            else:
                print("called as an instance method from build")
                self = self_or_cls
                await self._build()

        @dichotomy
        def build_sync(self_or_cls, *args, **kwargs):
            return run_sync(self_or_cls.build())

        @dichotomy
        async def _build(self_or_cls, *args, **kwargs):
            if inspect.isclass(self_or_cls):
                print("called as a class method from _build", end="\n"*2)
                cls = self_or_cls  # noqa: F841
                # Current implementation remains here.
            else:
                print("called as an instance method from _build")
                self = self_or_cls
                # Some information can be retrieved.
                print(f'{self._types=}', end="\n"*2)
                # Our new logics


    async def main():
        print('Async ModelBuilder.build: ')
        await ModelBuilder.build()
        print('Async ModelBuilder().build: ')
        await ModelBuilder().build()
        print('Sync ModelBuilder.build: ')
        ModelBuilder.build_sync()
        print('Sync ModelBuilder().build: ')
        ModelBuilder().build_sync()


    if __name__ == '__main__':
        asyncio.run(main())

    Async ModelBuilder.build:
    called as a class method from build
    called as a class method from _build

    Async ModelBuilder().build:
    called as an instance method from build
    called as an instance method from _build
    self._types='...'

    Sync ModelBuilder.build:
    called as a class method from build
    called as a class method from _build

    Sync ModelBuilder().build:
    called as an instance method from build
    called as an instance method from _build
    self._types='...'

Finally, I agree that making register and unregister work globally could make it challenging to verify test results in parallel scenarios. I might lean towards using instance methods for the registering issue again.

These are just rough ideas that came to mind. I'm open to further discussions and refinements.

Using descriptors is an interesting idea. I've used them sparingly before - as you say, they're very powerful, but can be confusing.

There's a lot of really good ideas in this PR, and I don't want to bog things down. I wonder if we could add this in a subsequent PR.

Certainly! Here are some options to consider for closing this PR:

Closing the PR without merging any changes.

Keeping the current code as is, while implementing the next_decimal functionality and updating related code.

Utilizing the latest commit of this PR while removing the option for users to register custom types.

Merging the PR with its latest commit.

Considering any other suggestions or alternatives.

I'm open to any of these choices. @dantownsend , what are your thoughts on this matter?

@jrycw Sorry for the delay on this - I haven't forgotten about it, I'm just trying to finish off a couple of PRs.

I'm interested to know your thoughts on #978, and whether you think it's a good idea.

@dantownsend no worries at all. It's great to see the project progressing on various fronts. I'll make an effort to review it and share any opinions or feedback I may have.

refactor ModelBuilder and RandomBuilder

358a7d7

Another iteration of refactor

d6df247

third refactor

ad763c3
