Large schema cache reloads cause 100% CPU spike when schema cache metadata is accessed. #3046
Is this
Sorry, forgot to mention that in the original post – no, we only have The tables on e.g.
Ah right, my bad. We only filter for extra search path schemas in the query at postgrest/src/PostgREST/SchemaCache.hs, line 952 (at f10d413) vs. the one at postgrest/src/PostgREST/SchemaCache.hs, line 784 (at f10d413).
This is because we might detect base tables in those views that are outside the search path (probably the case most of the time). IIRC we discussed a potential future optimization of querying, in the tables query, only the tables detected via the views query. Seems like you have hit that limitation right there.
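As an illustration of that idea (a sketch only, not PostgREST's actual queries), the tables query could be restricted to the exposed schemas plus just the relations that view rewrite rules depend on; the schema list here is a made-up example value:

```sql
-- Sketch: fetch tables in the exposed schemas, plus base relations that
-- views depend on, instead of every table in the database.
WITH view_dependencies AS (
  SELECT DISTINCT d.refobjid AS oid
  FROM pg_rewrite r
  JOIN pg_depend d
    ON d.classid = 'pg_rewrite'::regclass
   AND d.objid = r.oid
   AND d.refclassid = 'pg_class'::regclass
)
SELECT n.nspname AS schema, c.relname AS name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'p')
  AND (n.nspname = ANY (ARRAY['api'])                  -- exposed schemas (example)
       OR c.oid IN (SELECT oid FROM view_dependencies));
```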
@wolfgangwalther Hey - just wanted to check in to see if this is something that is likely to be worked on in an upcoming release of PostgREST? If not, how feasible do you think it would be for me, as a relatively inexperienced Haskell developer, to dive in and try to submit a PR fixing this (and if this is the route we decide to go, do you have a preference for how a fix is implemented)?
No plans from me regarding this in the near future. Too many other things on my list I'd like to do.
I would say in general the area around the schema cache should be easier to get started with than other areas - because big chunks here are actually SQL, not haskell. So if you know SQL well, that should make it easier.
I have not thought in detail about this. Reducing the size of the schema cache / improving the performance of the schema cache queries is always a good thing. I'm not exactly sure how, but one way to do this could be to merge the
Since #3213, we now log some schema cache stats:
It would be interesting to know the metrics for your case @colophonemes. If there's a large number of relations, perhaps we should exclude
@steve-chavez Here's the output from when I load using that build:
And here's some output from monitoring the container's CPU and memory usage:
ELAPSED %CPU %MEM
00:01 2.0 0.2
00:06 0.0 0.2
00:11 1.6 0.3
00:17 1.5 0.3
00:22 1.1 0.4
00:27 0.0 0.4
00:32 0.0 0.4
00:37 7.6 0.5
00:42 1.3 0.5
# 21/Feb/2024:16:24:04 +0000: Schema cache queried in 640.9 milliseconds
00:47 100.1 0.6
00:52 100.1 0.6
00:57 98.0 0.6
01:02 98.2 0.6
01:07 100.1 0.6
01:12 99.1 0.6
01:17 100.1 0.6
01:22 100.1 0.6
01:27 100.1 0.6
01:32 99.1 0.6
01:37 100.1 0.6
01:42 100.1 0.6
01:47 98.7 0.6
01:52 100.1 0.6
01:57 99.1 0.6
02:02 100.1 0.6
02:07 100.1 0.6
02:12 100.1 0.6
02:17 99.7 0.6
02:22 98.9 0.6
02:27 98.1 0.6
02:32 100.1 0.6
02:37 100.1 0.6
02:42 100.1 0.6
02:48 100.1 0.6
# 21/Feb/2024:16:24:04 +0000: Schema cache loaded 99 Relations, 230 Relationships, 16 Functions, 0 Domain Representations, 4 Media Type Handlers
02:53 3.2 0.6
02:58 0.0 0.6
@colophonemes Aha. So that means the schema cache queries are not the problem since they're fast. The problem is the schema cache parsing. This was previously discussed on #2450 (comment).
While this is right, it does not necessarily mean that "optimizing the queries" as discussed above is wrong. IIRC, we fetch a lot more data for the schema cache - and then filter out the stuff we don't need in Haskell code, right?
Yeah, I agree that it seems like it's the parsing rather than the DB query (indeed, I was able to extract and manually run the SQL queries myself and they were very fast), but I think I agree with @wolfgangwalther that if we could cut down the number of relations we're feeding into the parsing step, then parsing would go a lot faster.
Yeah, this seems like a straightforward solution; it could be a flag that's sort of the opposite of
@colophonemes Could you share your schema (DDL only) privately? (email on profile) Then I can work on improving the query or maybe the Haskell code.
That time should now be clearer on the main branch due to #3253
Unfortunately I don't think that I can do that easily. It's not super-straightforward to dump out the Timescale stuff in a way that's restorable on your end. I'll try to prepare a toy schema that results in the same behaviour.
Did some digging using the
The problem is not related to the code at postgrest/src/PostgREST/SchemaCache.hs, line 167 (at 97cd559).
By not computing the relationships and just replacing it with a
Some profiling also indicates that it's not the query but the different transformations we do in Haskell code that take a lot of time and CPU: postgrest/src/PostgREST/SchemaCache.hs, line 163 (at 97cd559).
It should be possible to do that logic on the db instead, but it will take a fair amount of work. Edit:
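As a rough illustration of what moving some of that logic into the database could look like (a sketch only, not PostgREST's actual relationship detection), plain foreign-key relationships can be read straight from pg_constraint:

```sql
-- Hedged sketch: derive table-to-table relationships from foreign keys
-- directly in SQL, instead of post-processing a larger result set in Haskell.
SELECT conrelid::regclass  AS referencing_table,
       confrelid::regclass AS referenced_table,
       conname             AS constraint_name,
       conkey,                       -- referencing column positions
       confkey                       -- referenced column positions
FROM pg_constraint
WHERE contype = 'f';                 -- foreign-key constraints only
```

Anything beyond plain foreign keys (view-based relationships, many-to-many detection) would need more of the existing Haskell logic ported over, which is presumably where the "fair amount of work" comes in.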
I expected the query itself to be fast. But if we fetch fewer tables, the relationship calculation in Haskell should be faster, right?
Environment
Description of issue
Our production instance of PostgREST exhibits the following behaviour:
- after a schema cache reload (e.g. triggered via NOTIFY pgrst reload, see the snippet below), CPU usage spikes to 100% for an extended period
- the spike appears to be triggered when schema cache metadata is accessed (e.g. via the / endpoint, but the behaviour seems to eventually happen regardless)

Here's me reproducing the issue in a local Docker container, connected to our production DB. The time it takes for the CPU spike to calm down is shorter, presumably because my laptop is beefier than the ECS instances we run our production PostgREST on.
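For reference, the reload can be triggered from SQL like this (assuming the default pgrst notification channel):

```sql
-- Ask a running PostgREST instance to reload its schema cache via
-- LISTEN/NOTIFY; "pgrst" is the default channel name.
NOTIFY pgrst, 'reload schema';
```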
We've observed this issue since at least v9.0.0 (though possibly it was present earlier), through to the present day. I've just run through the testing steps above on both v11.1.0 and v11.2.1, and the behaviour is there in both.
Possibly an interaction with TimescaleDB?
My guess about why this happens in production but not dev is that we use TimescaleDB for a number of our tables that contain time-series data. This means that the number of tables in our schema changes dynamically as more data is added to parent tables, and Timescale chunks these into child tables. The child tables are stored in a _timescaledb_internal schema, e.g. _timescaledb_internal._hyper_1_12345_chunk. Some of our larger tables have upwards of 500 of these chunks attached.

Formatting and running the tablesSqlQuery from PostgREST/SchemaCache.hs results in ~350 entries against a dev database, and nearly 13,000 entries against prod. If I add AND n.nspname !~ 'timescale' to the end of that query, the prod result drops back to around 300 entries.

So, our guess is that our production PostgREST instances are choking on this massive amount of data when processing the schema cache (looking at the network graph in the image above, it looks like it's transferring about 10MB over the wire to reload the schema).
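For context, a heavily simplified stand-in for that kind of query (not the real tablesSqlQuery) with the exclusion applied would look something like this:

```sql
-- Simplified stand-in for the tables query, with a schema filter that drops
-- TimescaleDB's internal chunk tables (e.g. _timescaledb_internal.*).
SELECT n.nspname AS schema, c.relname AS name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'p', 'v', 'm')
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND n.nspname !~ 'timescale';
```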
Possible mitigations
I guess that there are at least three possible mitigations that I can think of here:
- Filter out _timescaledb_internal and other Timescale schemas so they don't appear in the tables query (seems a bit gross, but maybe routes around needing to handle edge cases for the more general case of child tables)

Obviously all these mitigations rely on my being correct that the root cause of the issue is the large number of child tables present in the tables query.
That said, I'm wondering if there is in fact some perf bug here? (13k rows / 10MB of data is a lot relative to most schemas, but also seems like something that should be processable. I don't have any idea about how the combinatorics of adding additional tables scale here, though.)