Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing source of QueriesDisabledError with nested serializers #50

Open
johncmacy opened this issue Aug 18, 2022 · 3 comments
Open

Tracing source of QueriesDisabledError with nested serializers #50

johncmacy opened this issue Aug 18, 2022 · 3 comments

Comments

@johncmacy
Copy link

The QueriesDisabledViewMixin is great at preventing queries during serialization. Once it's raised, however, I've had difficulty tracing the root cause - mostly when working with deeply nested serializers that have the same model serialized in multiple places.

To demonstrate, I extended your Pizza/Toppings example to include several models with multiple foreign keys to the User model:

Models
class Topping(models.Model):
    name = models.CharField(max_length=100)


class Pizza(models.Model):
    name = models.CharField(max_length=100)
    toppings = models.ManyToManyField(Topping)


class Order(models.Model):
    ordered_at = models.DateTimeField(auto_now_add=True)
    pizzas = models.ManyToManyField(Pizza, through='OrderPizza', related_name='orders')
    customer = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.PROTECT, related_name='orders_placed')
    received_by = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.PROTECT, related_name='orders_received')
    delivered_by = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.PROTECT, related_name='orders_delivered')


class OrderPizza(models.Model):
    order = models.ForeignKey(Order, on_delete=models.CASCADE, related_name='order_pizzas')
    pizza = models.ForeignKey(Pizza, on_delete=models.CASCADE, related_name='order_pizzas')
    baked_by = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.PROTECT, related_name='order_pizzas_baked')

    class SizeChoices(models.IntegerChoices):
        SMALL = 1
        MEDIUM = 2
        LARGE = 3
        FAMILY = 4

    size = models.IntegerField(choices=SizeChoices.choices)
Serializers and Viewsets
class OrderPizzaSerializer(PaperTrailMixin, serializers.ModelSerializer):
    pizza = pizza.Serializers.Summary()
    baked_by = user.Serializers.Summary()

    class Meta:
        model = OrderPizza
        fields = [
            'id',
            'pizza',
            'baked_by',
            'size',
        ]

class OrderSerializer(PaperTrailMixin, serializers.ModelSerializer):
    customer = user.Serializers.Summary()
    received_by = user.Serializers.Summary()
    delivered_by = user.Serializers.Summary()

    order_pizzas = OrderPizzaSerializer(many=True)

    class Meta:
        model = Order
        fields = [
            'id',
            'ordered_at',
            'customer',
            'received_by',
            'delivered_by',
            'order_pizzas',
        ]

class OrderViewset(QueriesDisabledViewMixin, viewsets.ModelViewSet):
    serializer_class = OrderSerializer
    queryset = Order.objects.all()

    def get_queryset(self):
        return super().get_queryset()\
            .select_related(
                'customer', 
                'received_by', 
                'delivered_by',
            )\
            .prefetch_related(
                Prefetch(
                    lookup='order_pizzas',
                    queryset=OrderPizza.objects.select_related(
                        'pizza', 
                        # 'baked_by',
                    )
                ),
            )

When I make a request to /api/v1/orders/, since I "forgot" to prefetch OrderPizza.baked_by, it raises a QueriesDisabledError. However, it just tells me that it's trying to query a User. Without digging into it, it's not readily apparent which field on which serializer is causing the issue:

QueriesDisabledError at /api/v1/orders/
SELECT "users_user"."id", "users_user"."password", "users_user"."last_login", "users_user"."is_superuser", "users_user"."username", "users_user"."first_name", "users_user"."last_name", "users_user"."email", "users_user"."is_staff", "users_user"."is_active", "users_user"."date_joined" FROM "users_user" WHERE "users_user"."id" = %s LIMIT 21

To solve this, I put together a "PaperTrailMixin" for each serializer:

PaperTrailMixin
import logging
from rest_framework.serializers import Serializer
from zen_queries import QueriesDisabledError
from collections import OrderedDict
from rest_framework.relations import PKOnlyObject
from rest_framework.fields import SkipField

INDENT_STR = '.   '

class PaperTrailMixin(Serializer):
    debug_indent = 0

    def to_representation(self, instance):
        """
        Copied from Serializer.to_representation()
        Added code between ### > ... < ###:
            Wrapped each field with try...except QueriesDisabledError
            Added logging
        """

        ret = OrderedDict()
        fields = self._readable_fields

        self.logger_prefix = f'{INDENT_STR * self.debug_indent}'
        logging.debug(f'{self.logger_prefix}{self.__class__.__name__} | {instance.__class__.__name__} | {instance}')

        for field in fields:
            ### >
            field.debug_indent = field.parent.debug_indent + 1
            if hasattr(field, 'child'): field.child.debug_indent = field.debug_indent
            field.logger_prefix = f'{INDENT_STR * field.debug_indent}{f"{field.field_name} --> "}'

            try:
            ### <

                try:
                    attribute = field.get_attribute(instance)
                except SkipField:
                    continue

                ### >
                logging.debug(f'{field.logger_prefix}{attribute.__class__.__name__} | {field.__class__.__name__} | {attribute}')
                ### <

                check_for_none = attribute.pk if isinstance(attribute, PKOnlyObject) else attribute
                if check_for_none is None:
                    ret[field.field_name] = None
                else:
                    ret[field.field_name] = field.to_representation(attribute)

            ### >
            except QueriesDisabledError as e:
                logging.debug(f'{field.logger_prefix}QueriesDisabledError')
                raise e
            ### <

        return ret

When a QueriesDisabledError is raised, it logs the following:

2022-08-18 08:26:05,065 OrderSerializer | Order | 1
2022-08-18 08:26:05,067 .   id --> int | IntegerField | 1
2022-08-18 08:26:05,067 .   ordered_at --> datetime | DateTimeField | 2022-08-17 18:33:23.727706+00:00
2022-08-18 08:26:05,068 .   customer --> User | Summary | rockybalboa
2022-08-18 08:26:05,068 .   Summary | User | rockybalboa
2022-08-18 08:26:05,068 .   .   id --> int | IntegerField | 4
2022-08-18 08:26:05,069 .   .   get_full_name --> str | ReadOnlyField | Rocky Balboa
2022-08-18 08:26:05,069 .   received_by --> User | Summary | hulkhogan
2022-08-18 08:26:05,069 .   Summary | User | hulkhogan
2022-08-18 08:26:05,070 .   .   id --> int | IntegerField | 3
2022-08-18 08:26:05,070 .   .   get_full_name --> str | ReadOnlyField | Hulk Hogan
2022-08-18 08:26:05,071 .   delivered_by --> User | Summary | maverick
2022-08-18 08:26:05,072 .   Summary | User | maverick
2022-08-18 08:26:05,072 .   .   id --> int | IntegerField | 5
2022-08-18 08:26:05,072 .   .   get_full_name --> str | ReadOnlyField | Maverick Mitchell
2022-08-18 08:26:05,073 .   order_pizzas --> RelatedManager | ListSerializer | core.OrderPizza.None
2022-08-18 08:26:05,073 .   OrderPizzaSerializer | OrderPizza | 1
2022-08-18 08:26:05,074 .   .   id --> int | IntegerField | 1
2022-08-18 08:26:05,074 .   .   pizza --> Pizza | Summary | Pepperoni
2022-08-18 08:26:05,075 .   .   Summary | Pizza | Pepperoni
2022-08-18 08:26:05,076 .   .   .   id --> int | IntegerField | 2
2022-08-18 08:26:05,076 .   .   .   name --> str | CharField | Pepperoni
2022-08-18 08:26:05,078 .   .   baked_by --> QueriesDisabledError
2022-08-18 08:26:05,078 .   order_pizzas --> QueriesDisabledError
2022-08-18 08:26:05,229 Internal Server Error: /api/v1/orders/

With this it's much easier to pinpoint where I need to add prefetches or select_related's to solve the error.

  1. Is there a mechanism for this already, that I've missed somehow?
  2. If not, would you be interested in adding something like this to the package?

Thanks!

@j4mie
Copy link
Member

j4mie commented Sep 7, 2022

Hello! Sorry for the slow reply on this and thanks for opening the issue!

This is very clever and definitely solves a problem that I've experienced (I would normally fix it by commenting out serializer fields and then gradually adding them back until the error is raised).

I'm not totally sure whether this should live inside zen-queries itself though. It's probably not something you'd want to be default behaviour because it's monkeying around with the internals of the serializer so much. It also feels maybe a bit out of scope: zen-queries tries to be quite minimal and this is quite a lot of complicated code to maintain.

I'd like to preserve it somehow though. A few ideas, in no particular order:

  1. Turn on Discussions on the repo, convert this issue to a discussion and mark it as an accepted answer.
  2. Put it in the repo wiki.
  3. Put it in the README
  4. Put it in a separate pip-installable library.

Open to other suggestions and persuasions!

Thanks again

@johncmacy
Copy link
Author

Thanks for the reply - and no worries!

I would normally fix it by commenting out serializer fields and then gradually adding them back until the error is raised

I've used this trick as well, and in my opinion this offers a significant time savings over that method, especially when you're dealing with nested serializers.

it's monkeying around with the internals of the serializer so much

Completely agree. I proposed this to demonstrate the result I'm looking for, not a solution that's ready to merge. It is a bit of a hack, since I had to copy Serializer.to_representation in order to add code inside the for field in fields loop. I was hoping better minds could come up with a more elegant solution, and one that doesn't bake in DRF's to_representation logic.

probably not something you'd want to be default behaviour

I agree, and since posting the issue I've actually added the following at the top:

def to_representation(self, instance):
    if not (settings.DEBUG and self.debug):
        return super().to_representation(instance)

When I get a QueriesDisabledError, I add debug = True to the root serializer, re-send the request, and inspect the logs. Once resolved, I remove the line and it resumes normal serialization. This way it only runs in development, and only as needed to debug a specific serializer.

I'm not totally sure whether this should live inside zen-queries itself though.

I can see where you're coming from, especially in trying to keep it as minimal as possible. However, it would be more convenient, and logical in my opinion, for it to be part of zen-queries, since developers will only ever run into QueriesDisabledErrors if they're using zen-queries (I think?). At a minimum, it would be helpful to add some documentation to the README for how to troubleshoot when dealing with nested serializers.

I was initially going to say that if it were a separate library, it could be installed as a dev dependency. However, I'm pretty sure the way it works now, the mixin has to be included in all serializers or else you don't get logging at lower levels of nesting. If we could get it to the point where it did not have to be included on every serializer, then it could remain a dev dependency, and opted-in by adding the mixin to the root serializer when the error is raised. I like that better than adding the debug = True line.

I'm open for whatever you think is best moving forward.

@johncmacy
Copy link
Author

Here's an alternate approach. This overrides __init__ to recursively wrap all descendant fields' to_representation methods with a function that logs the debug message for that field. It doesn't override DRF serialization logic. And it can be used selectively when errors are spotted, without requiring the mixin on all serializers beforehand.

Usage

Normal:

class OrderSerializer(serializers.ModelSerializer):

After spotting a QueriesDisabledError:

class OrderSerializer(PaperTrailMixin, serializers.ModelSerializer):

Problem solved, switch back to normal:

class OrderSerializer(serializers.ModelSerializer):

Here's the new mixin, about 30 lines:

PaperTrailMixin
import logging

class PaperTrailMixin:
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        def upgrade_field_and_children_to_repr(field_name, field, indent_level):
            indent = '.    ' * indent_level

            # for list serializers (many=True), override the child's to_repr instead:
            _field = field.child if hasattr(field, 'child') else field

            def wrap_to_representation(func):
                def wrapper(instance):
                    logging.debug(f'{indent}{f"{field_name} --> "}{_field.__class__.__name__} | {instance.__class__.__name__} | {instance}')

                    try:
                        func(instance)

                    except Exception as e:
                        logging.debug(f'{indent} ===== {e.__class__.__name__} - {field_name} =====')
                        raise e

                return wrapper

            _field.to_representation = wrap_to_representation(_field.to_representation)

            if hasattr(_field, 'fields'):
                for child_field_name, child_field in _field.fields.items():
                    upgrade_field_and_children_to_repr(child_field_name, child_field, indent_level + 1)

        upgrade_field_and_children_to_repr('ROOT', self, 0)

And the output is similar:

Logs
ROOT --> OrderSerializer | Order | 1
.    id --> IntegerField | int | 1
.    ordered_at --> DateTimeField | datetime | 2022-08-17 18:33:23.727706+00:00
.    customer --> Summary | User | rockybalboa
.    .    id --> IntegerField | int | 4
.    .    get_full_name --> ReadOnlyField | str | Rocky Balboa
.    received_by --> Summary | User | hulkhogan
.    .    id --> IntegerField | int | 3
.    .    get_full_name --> ReadOnlyField | str | Hulk Hogan
.    delivered_by --> Summary | User | maverick
.    .    id --> IntegerField | int | 5
.    .    get_full_name --> ReadOnlyField | str | Maverick Mitchell
.    order_pizzas --> OrderPizzaSerializer | OrderPizza | 1
.    .    id --> IntegerField | int | 1
.    .    pizza --> Summary | Pizza | Pepperoni
.    .    .    id --> IntegerField | int | 2
.    .    .    name --> CharField | str | Pepperoni
.     ===== QueriesDisabledError - order_pizzas =====
 ===== QueriesDisabledError - ROOT =====
Internal Server Error: /api/v1/orders/
Traceback (most recent call last):
  File "...\env\lib\site-packages\django\db\models\fields\related_descriptors.py", line 173, in __get__
    rel_obj = self.field.get_cached_value(instance)
  File "...\env\lib\site-packages\django\db\models\fields\mixins.py", line 15, in get_cached_value
    return instance._state.fields_cache[cache_name]
KeyError: 'baked_by'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  ...
    raise QueriesDisabledError(sql)
zen_queries.decorators.QueriesDisabledError: SELECT * FROM "users_user" ...

It's slightly less intuitive than the other approach in that it doesn't clearly log out the field that caused the error. You can pretty easily get there, if you know how to read it:

  • It occurred while serializing order_pizzas using the OrderPizzaSerializer
  • id and pizza were logged, so by comparing to the serializer fields, you can infer that it happened while trying to serialize baked_by
  • If you scroll down, the traceback indicates KeyError: baked_by

But I think it's better in every other way, so I could live with it if that's the only downside.

My vote would be to include this in zen-queries, and not require a separate package to be installed. Probably add it to rest_framework.py and rename it something better, maybe DebugDisabledQueriesMixin. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants