Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encountered a subscription error: Error applying subscription data: FOREIGN KEY constraint failed #609

Closed
gorbak25 opened this issue Nov 1, 2023 · 17 comments

Comments

@gorbak25
Copy link

gorbak25 commented Nov 1, 2023

Initial clean sync got broken in electric-sql at one point. It happened when the state of 1 client accumulated a few MB's of data, previously it worked fine.
image
I've verified all foreign key constrains in the Postgres DB and the data model is consistent on the server side.

For now as a workaround we decided to disable foreign key constrains during the initial sync:

    const syncItems = async () => {
      try {
        await db.raw({
          sql: dedent`PRAGMA foreign_keys = 0`,
        });
        // Resolves when the shape subscription has been established.
        const shape = await db.DataTable.sync({
          include: {
            Column: {
              include: {
                CellValue: true,
                IntegrationInstance: { include: { ColumnReference: true } },
              },
            },
            Row: {
              include: {
                CellValue: true,
              },
            },
          },
        });
        await shape.synced;
      } finally {
        await db.raw({
          sql: dedent`PRAGMA foreign_keys = 1`,
        });
      }
    };

I've tried to isolate the bug in a separate repo but i'm unable to reproduce it besides it happening in our production app.
https://github.com/hocus-dev/electric-relation-order-bug

I think what is happening is that when the data get's big enough(https://github.com/hocus-dev/electric-relation-order-bug/blob/e8b09338c2517277b90afacc0c82ea584a39758f/prisma/schema.prisma#L35) the data gets inserted into the DB in the wrong order.

Can you help point me in the right direction with finding the root cause behind the failure?

@thruflo
Copy link
Contributor

thruflo commented Nov 1, 2023

Thanks for reporting and for the repo to reproduce. We'll take a look.

@gorbak25
Copy link
Author

gorbak25 commented Nov 1, 2023

Thanks for reporting and for the repo to reproduce. We'll take a look.

To be clear. I'm still unable to reproduce the issue outside of production but the repro repo contains part of the affected DB schema most likely to cause the error.

@thruflo
Copy link
Contributor

thruflo commented Nov 1, 2023

Right, yes, got it, thanks for the clarification.

@kevin-dp
Copy link
Contributor

kevin-dp commented Nov 2, 2023

Our TypeScript client enables foreign keys on SQLite. By default, foreign keys in SQLite are checked immediately so the order in which rows are inserted during a transaction does matter. From the docs:

Each foreign key constraint in SQLite is classified as either immediate or deferred. Foreign key constraints are immediate by default. If a statement modifies the contents of the database so that an immediate foreign key constraint is in violation at the conclusion the statement, an exception is thrown and the effects of the statement are reverted. By contrast, if a statement modifies the contents of the database such that a deferred foreign key constraint is violated, the violation is not reported immediately. Deferred foreign key constraints are not checked until the transaction tries to COMMIT. For as long as the user has an open transaction, the database is allowed to exist in a state that violates any number of deferred foreign key constraints. However, COMMIT will fail as long as foreign key constraints remain in violation.

Perhaps two things to figure out if this is the issue:

  1. Could you try to load your data in plain SQLite to see if the same problem occurs. Don't forget to enable foreign keys because they are disabled by default!
  2. If the problem also occurs in plain SQLite, try to use the defer_foreign_keys pragma before your transaction to see if that solves the problem. If it does, we found the culprit.

The defer_foreign_keys pragma will defer all foreign key constraints for the duration of the transaction. However, you don't have control over the transactions in your Electric app. Perhaps, our TypeScript client should do that before every transaction, or perhaps more efficient, we could define all SQLite foreign keys to be deferred when creating the SQLite schemas.

Since you don't have this problem on Postgres could it be that you configured Postgres to defer constraint checks?

@thruflo
Copy link
Contributor

thruflo commented Nov 2, 2023

Do you think our initial sync mechanism could somehow be sending rows in the wrong order? It may be hard for @gorbak25 to replicate the order of the rows if not using our sync mechanism.

@gorbak25
Copy link
Author

gorbak25 commented Nov 2, 2023

@thruflo in the repo I've sent the operations arrive from the electric server in the wrong order:

[proto] recv: #SatOpLog{ops: [#Insert{for: 16653, tags: [postgres_1@1698916164718], new: ["6a074995-d312-4b64-a836-ff268838e094", "ColumnReference", "d3e7369d-3789-4264-8b27-17a52ce4866f"]}, #Insert{for: 16653, tags: [postgres_1@1698916164718], new: ["51003610-8076-4059-a7ad-fb97e107ea47", "ColumnReference", "d67bd5d8-ea2b-4c4c-9761-e22306978b5b"]}, #Insert{for: 16653, tags: [postgres_1@1698916164718], new: ["45953b64-2b33-4de7-9871-5af09c49ba14", "ColumnReference", "55d1809f-8eaf-46f7-9baa-586a2467cd87"]}, #Insert{for: 16653, tags: [postgres_1@1698916164718], new: ["2c1bc411-478c-42b3-ad93-4479d134a88b", "ColumnReference", "d3e7369d-3789-4264-8b27-17a52ce4866f"]}, #Insert{for: 16653, tags: [postgres_1@1698916164718], new: ["f9cea58b-8b11-4b4f-8eef-a45624b06869", "ColumnReference", "d67bd5d8-ea2b-4c4c-9761-e22306978b5b"]}, #Insert{for: 16653, tags: [postgres_1@1698916164718], new: ["8385ed8a-3438-42a6-bb05-9a350d80ee4b", "ColumnReference", "55d1809f-8eaf-46f7-9baa-586a2467cd87"]}, #Insert{for: 16653, tags: [postgres_1@1698916164718], new: ["059f2dc1-1b19-4c9e-9c12-1bf54ff016b8", "ColumnReference", "d3e7369d-3789-4264-8b27-17a52ce4866f"]}, #Insert{for: 16653, tags: [postgres_1@1698916164718], new: ["73673e56-f02d-4762-86ae-781d08f8c73d", "ColumnReference", "d67bd5d8-ea2b-4c4c-9761-e22306978b5b"]}, #Insert{for: 16653, tags: [postgres_1@1698916164718], new: ["81067e61-4240-4d4c-9fa0-1c2ae560cb28", "ColumnReference", "55d1809f-8eaf-46f7-9baa-586a2467cd87"]}, #Insert{for: 16646, tags: [postgres_1@1698916164718], new: ["d3e7369d-3789-4264-8b27-17a52ce4866f", "ColumnReferenceSet"]}, #Insert{for: 16646, tags: [postgres_1@1698916164718], new: ["d67bd5d8-ea2b-4c4c-9761-e22306978b5b", "ColumnReferenceSet"]}, #Insert{for: 16646, tags: [postgres_1@1698916164718], new: ["55d1809f-8eaf-46f7-9baa-586a2467cd87", "ColumnReferenceSet"]}, #Insert{for: 16639, tags: [postgres_1@1698916164718], new: ["1902fdd6-9937-470e-ba21-42777e689a97", "Column", "d3e7369d-3789-4264-8b27-17a52ce4866f", "8736175c-183f-4def-a4e8-3d29c3d1a2d0"]}, #Insert{for: 16639, tags: [postgres_1@1698916164718], new: ["a91e773f-0bf7-4947-9748-fece158d34de", "Column", "d67bd5d8-ea2b-4c4c-9761-e22306978b5b", "8736175c-183f-4def-a4e8-3d29c3d1a2d0"]}, #Insert{for: 16639, tags: [postgres_1@1698916164718], new: ["97ef66b7-5bab-43c2-aa5e-383a26f1865d", "Column", "55d1809f-8eaf-46f7-9baa-586a2467cd87", "8736175c-183f-4def-a4e8-3d29c3d1a2d0"]}, #Insert{for: 16632, tags: [postgres_1@1698916164718], new: ["8736175c-183f-4def-a4e8-3d29c3d1a2d0", "DataTable"]}]}

Here ColumnReference is sent first instead of ColumnReferenceSet, this is also what I've observed on production. Does the electric client reoder those operations so foreign keys are respected?(insert in toposort order). Is this driver specific? It was working for a long time but it suddenly stopped working on production when there's too much data.

@kevin-dp We don't use deferred foreign keys in postgres. We always insert entities in an order compatible with the constrains (ColumnReferenceSet first, then ColumnReference, lastly we update Column to point to ColumnReferenceSet).

@thruflo
Copy link
Contributor

thruflo commented Nov 2, 2023

cc @icehaunter re initial sync ordering.

@davidmartos96
Copy link
Contributor

@kevin-dp Is the order supposed to matter when doing the initial sync Postgres -> Sqlite?
The client is already using PRAGMA defer_foreign_keys = ON when loading the remote entries into the local database, so until the end of the transaction, data could be in any order. Or not?

stmts.push({ sql: 'PRAGMA defer_foreign_keys = ON' })

@gorbak25
Copy link
Author

gorbak25 commented Nov 2, 2023

One important detail is that the data arrives after the timeout #605 🤔 Does the same code path trigger in that case?

@kevin-dp
Copy link
Contributor

kevin-dp commented Nov 2, 2023

Here ColumnReference is sent first instead of ColumnReferenceSet, this is also what I've observed on production.

Ok tis is very useful information. @balegas was expecting this to be the cause. So because the transaction arrive in a different order it is essentially inserting a row with a foreign key to a record that does not yet exist... We should avoid this.

Does the electric client reoder those operations so foreign keys are respected?(insert in toposort order). Is this driver specific? It was working for a long time but it suddenly stopped working on production when there's too much data.

EDIT: The Electric client does not insert rows in topological order. As pointed out by @davidmartos96 this is not needed because the client enables the defer_foreign_keys pragma. But @balegas told me that Electric currently may not respect transaction ordering when sending it to the client so that can lead to issues with FKs.

@kevin-dp
Copy link
Contributor

kevin-dp commented Nov 2, 2023

@kevin-dp Is the order supposed to matter when doing the initial sync Postgres -> Sqlite? The client is already using PRAGMA defer_foreign_keys = ON when loading the remote entries into the local database, so until the end of the transaction, data could be in any order. Or not?

stmts.push({ sql: 'PRAGMA defer_foreign_keys = ON' })

You're right, order within a transaction doesn't matter. I had missed this defer_foreign_keys call in the client when looking at it this morning. So that could not lead to issues at all.

@balegas
Copy link
Contributor

balegas commented Nov 2, 2023

One important detail is that the data arrives after the timeout #605 🤔 Does the same code path trigger in that case?

I wonder if there is some unspecified behaviour after the timeout. It doesn't look relevant for the problem at hand, except that we have very destructive conservative GC when errors occur. - @gorbak25, would you be able to test with a modified client that has a larger timeout to see if the problem goes away?

  • @kevin-dp the configurable timeout is another quick patch that we should tackle.

But @balegas told me that Electric currently may not respect transaction ordering when sending it to the client so that can lead to issues with FKs.

@icehaunter made the point that on the server, by the virtue of (right now) shapes requiring that all parent, child (and grandchildren) having to be included in a shape definition, all rows and their FKs have to be delivered in same Txn, therefore the order would be irrelevant because of deferred fk checks.

  • @gorbak25 please confirm our assumptions that you're syncing the data using a single shape and that the FK constraint failure occurs while applying the initial sync data for that shape

What we know is brittle is that on the client we may handle incoming transactions out of order, but this would require multiple transactions and not a single transaction (which would not be possible during the initial sync )

@gorbak25
Copy link
Author

gorbak25 commented Nov 10, 2023

@balegas Yes there is only 1 shape subscription. I'm sometimes still hitting this FK error when syncing a lot of small updates from the backend(and there is no timeout). I'm currently migrating from 0.6.3 to 0.7.1

@balegas
Copy link
Contributor

balegas commented Nov 15, 2023

It's different if the issue happens during the initial sync of the shape or during streaming phase. Could you confirm that you no longer see the issue in the finial sync phase, but only during streaming of 'small updates'

@gorbak25
Copy link
Author

@balegas Yesterday @hugodutka encountered this again when syncing small updates. This happened on 7.1

@balegas
Copy link
Contributor

balegas commented Nov 16, 2023

ok, that's slightly better, because during streaming phase we know where we can improve. We have an issue already for addressing that.

@kevin-dp
Copy link
Contributor

kevin-dp commented Feb 6, 2024

Closing this as addressed by #696.

@kevin-dp kevin-dp closed this as completed Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants