Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching for canonicalised JSON #62

Merged

Conversation

Stranger6667
Copy link
Contributor

Ref: #61 (comment)

A prettified version of that draft with your suggestion applied. I will evaluate its impact on the recursive schemas branch and will post it here.

@Stranger6667 Stranger6667 force-pushed the dd/canonicalised-json-cache branch from f7feab9 to 6b3f0b0 Compare August 25, 2020 13:48
@Stranger6667

This comment has been minimized.

@Zac-HD

This comment has been minimized.

@Stranger6667
Copy link
Contributor Author

Here is the performance impact on the "recursive" branch. Config:

  • max_examples=20
  • derandomize=True
  • suppress_health_check=[HealthCheck.too_slow, HealthCheck.filter_too_much, HealthCheck.data_too_large]

Schema 1. One of the simplest recursive schemas:

SCHEMA = {
    "properties": {"foo": {"$ref": "#"}},
    "additionalProperties": False,
    "type": "object",
}
Seed Current Cached Change
1 142.8 ms ± 2.0 ms 140.6 ms ± 0.9 ms -1%
2 143.4 ms ± 1.1 ms 143.1 ms ± 1.2 ms -0.2%
3 139.6 ms ± 1.0 ms 139.4 ms ± 0.9 ms -0.1%

Here, the impact is almost not visible. But there are no negative observations at least.

Schema 2. A tree-like structure with an array of child nodes:

{
    "definitions": {
        "Node": {
            "type": "object",
            "properties": {
                "children": {
                    "type": "array",
                    "items": {"$ref": "#/definitions/Node"},
                    "maxItems": 2,
                }
            },
            "required": ["children"],
            "additionalProperties": False,
        },
    },
    "$ref": "#/definitions/Node",
}
Seed Current Cached Change
1 17.174 s ± 0.422 s 9.647 s ± 0.054 s -43%
2 4.429 s ± 0.067 s 2.488 s ± 0.014 s -43%
3 5.494 s ± 0.098 s 3.103 s ± 0.014 s -43%

Here it is much more visible with the average almost 2x improvement.

Schema 3. A subset of Open API schema (which now works locally - I push changes to that branch soon):

SCHEMA = {
    "type": "object",
    "required": ["paths"],
    "properties": {"paths": {"$ref": "#/definitions/Paths"}},
    "additionalProperties": False,
    "definitions": {
        "Schema": {
            "type": "object",
            "properties": {"items": {"$ref": "#/definitions/Schema"}},
            "additionalProperties": False,
        },
        "MediaType": {
            "type": "object",
            "properties": {"schema": {"$ref": "#/definitions/Schema"}},
            "patternProperties": {"^x-": {}},
            "additionalProperties": False,
        },
        "Paths": {
            "type": "object",
            "patternProperties": {
                "^\\/": {"$ref": "#/definitions/PathItem"},
                "^x-": {},
            },
            "additionalProperties": False,
        },
        "PathItem": {
            "type": "object",
            "properties": {
                "parameters": {
                    "type": "array",
                    "items": {"$ref": "#/definitions/Parameter"},
                    "uniqueItems": True,
                },
            },
            "patternProperties": {
                "^(get|put|post|delete|options|head|patch|trace)$": {
                    "$ref": "#/definitions/Operation"
                },
                "^x-": {},
            },
            "additionalProperties": False,
        },
        "Operation": {
            "type": "object",
            "required": ["responses"],
            "properties": {
                "parameters": {
                    "type": "array",
                    "items": {"$ref": "#/definitions/Parameter"},
                    "uniqueItems": True,
                },
            },
            "additionalProperties": False,
        },
        "Parameter": {
            "type": "object",
            "properties": {
                "schema": {"$ref": "#/definitions/Schema"},
                "content": {"type": "object", "minProperties": 1, "maxProperties": 1},
            },
            "additionalProperties": False,
        },
    },
}
Seed Current Cached Change
1 2.089 s ± 0.008 s 1.482 s ± 0.041 s -29%
2 4.399 s ± 0.044 s 3.161 s ± 0.045 s -28%
3 1.732 s ± 0.009 s 1.228 s ± 0.035 s -29%

Slightly weaker improvement, but still visible.

I don't know how this change will affect more complex schemas once the support for them will be implemented, but I suspect somehow moderate improvement from almost nothing to ~50%. Will try to make more experiments

@Stranger6667

This comment has been minimized.

@Stranger6667 Stranger6667 force-pushed the dd/canonicalised-json-cache branch from 6b3f0b0 to b40f420 Compare August 25, 2020 15:05
@Stranger6667
Copy link
Contributor Author

And I will add some more measurement for non-recursive schemas, but at first glance, this change gives some minor improvements there as well.

@Stranger6667
Copy link
Contributor Author

I tried the following non-recursive schema:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "array",
    "items": [
        {"type": "number", "maximum": 10, "exclusiveMaximum": True,},
        {"type": "string", "enum": ["hello", "world"],},
        {
            "type": "array",
            "minItems": 1,
            "maxItems": 3,
            "items": [{"type": "number"}, {"type": "string"}, {"type": "boolean"},],
        },
        {
            "type": "object",
            "required": ["a", "b"],
            "minProperties": 3,
            "properties": {
                "a": {"type": ["null", "string"]},
                "b": {"type": ["null", "string"]},
                "c": {"type": ["null", "string"], "default": "abc"},
            },
            "additionalProperties": {"type": "string"},
        },
        {"not": {"type": ["null"]}},
        {
            "oneOf": [
                {"type": "number", "multipleOf": 3},
                {"type": "number", "multipleOf": 5},
            ]
        },
    ],
}

It includes a lot of different keywords, so I see it like some average schema. The results are:

No cache: 447.7 ms ± 1.3 ms
Cache: 439.5 ms ± 2.2 ms

To me, the results seem near the error threshold.

The cache statistic - CacheInfo(hits=276, misses=27, maxsize=128, currsize=27)
I could imagine a slowdown on some schemas where there will be no hits, but not sure what is the fraction of real-world use cases :(

For Schema 2 from my previous message, the statistic is:

CacheInfo(hits=34162, misses=891, maxsize=128, currsize=128)

and increasing the cache size up to 1024 gives a bit more improvements - up to -49.5% with the following cache statistic:

CacheInfo(hits=34745, misses=308, maxsize=512, currsize=308)

So, I am wondering if this change brings enough. For recursive schemas, it seems to give nice improvements, but I am not sure about the corner cases where it can make the performance worse. I probably could keep this PR open until the point when the recursive schemas support is ready, and then there will be more evidence to check - maybe there is something else that slows down the CI jobs that much.

What do you think?

@Zac-HD
Copy link
Member

Zac-HD commented Aug 26, 2020

Let's keep it open for now.

To be clear, I intend to merge this pretty much as-is unless we learn something surprisingly dramatic later on - a 50% speedup is a big deal! However, I also want to take some time next week to play around with it for myself and maybe split out the serialisation logic to a separate module for clarity.

Thanks again!

@Stranger6667 Stranger6667 force-pushed the dd/canonicalised-json-cache branch from b40f420 to 1f2c697 Compare August 27, 2020 13:06
@Stranger6667 Stranger6667 force-pushed the dd/canonicalised-json-cache branch from 1f2c697 to 46de172 Compare August 27, 2020 13:06
@Zac-HD
Copy link
Member

Zac-HD commented Sep 9, 2020

Just checking in - I've been pretty busy lately with PyCon Australia and HypoFuzz, but wanted to let you know that haven't forgotten this and it's up near the top of my open source todo list 🙂

@Stranger6667
Copy link
Contributor Author

Cool! :) Thank you for these links and for checking in; much appreciated :)

@Zac-HD Zac-HD merged commit cc9fcb1 into python-jsonschema:master Sep 10, 2020
@Stranger6667 Stranger6667 deleted the dd/canonicalised-json-cache branch September 10, 2020 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants