feat!: add support for reference deletion and enclose mutations in transactions #63

wschurman · 2020-07-11T16:56:03Z

Why

This PR adds the ability to specify relationships between entities for use during deletion. Three modes are supported:

INVALIDATE_CACHE: Invalidates the cache for all entities that reference the entity being deleted. This is most useful when the database itself expresses foreign keys and cascading deletes or set nulls and the entity framework just needs to be kept consistent with the state of the database.
CASCADE_DELETE: Deletes the entities that reference the entity being deleted. This is very similar to DBMS ON DELETE CASCADE but is done in the Entity framework instead of at the underlying level, and should not be used in combination with database-schema-expressed foreign keys and deletion behavior. This has the nice effect of keeping the cache consistent as well of course.
SET_NULL: Sets fields referencing the entity being deleted to null in any entities that reference the entity being deleted.

Next up: See if it is possible to combine this and association loader in a type safe way. I'm doubting it'll be possible to infer types of edges based on this new schema association specification, but we may at least be able to share the edge definitions for chain loading.

Edit: this PR also now creates a transaction if not already in one for all creates, updates, and deletes. This will become very useful for triggers, but is also useful for dependent deletions added in this PR.

How

The main piece of this PR is the processEntityDeletionForInboundEdgesAsync. See docblock for description. Everything else is tests.

Test Plan

Run all new tests.

quinlanj

I'm curious as to what kind of standards we'd want to have when we make our DB tables in www going forward since we are supporting the CASCADE and SET NULL behavior at the Entity level now. Should we always be preferring cascades/set nulls be done at the Entity level and migrate our existing tables to conform to this? Or are there cases where it makes sense to have these done at the DB level?

packages/entity/src/EntityMutator.ts

packages/entity/src/EntityFields.ts

ide · 2020-07-13T21:40:50Z

CASCADE_DELETE... should not be used in combination with database-schema-expressed foreign keys and deletion behavior.

Reading the code, processEntityDeletionForInboundEdgesAsync runs before the target entity is deleted. So even if Postgres declares ON DELETE CASCADE, we'll delete the child nodes before the parent and simply never make use of Postgres's trigger.

There seems to be a potential bug with INVALIDATE_CACHE: throughout the entities library, we mutate the database first (for updates and deletes) and clear the cache second. But here, INVALIDATE_CACHE causes processEntityDeletionForInboundEdgesAsync to clear the cache (for child entities) first and then mutate the database (for the parent entity) second. It seems to me the more correct behavior would be to perform the DB mutations first and then clear the cache entries afterward.

Edit: I realized that we need to run processEntityDeletionForInboundEdgesAsync before deleting the parent node, because if Postgres automatically deletes the child nodes (or nulls out their FK fields), we won't be able to query Postgres for all of the nodes that reference the parent node...

I agree with the sentiment in Quin's comment about developing some guidance around which policy to use, especially when designing a schema from scratch. I see two dimensions here: should the deletion constraint triggers be the responsibility of the entities or the database? and should a deletion trigger's behavior be to delete the child node (CASCADE DELETE), to null out the reference from the child to parent (SET NULL), or to throw an error because the scenario requires extra application code to properly delete the child node before the parent?

(An example of the latter case: we store the content of some Snacks in Google Cloud Storage. When deleting a Snack, we'd want to run some code to delete the GCS file too, or at least schedule a task to clean up GCS. Postgres can't do this automatically and it's not clear Entities should take care of this either. It's behavior we want in application code. In this case we want the default behavior of ON DELETE RESTRICT to cause the deletion of a parent node to fail if the Snack nodes haven't been deleted.)

ide

Requesting changes for some small comments and most notably, the order of operations between clearing the database and clearing the cache for the INVALIDATE_CACHE policy.

packages/entity/src/EntityMutator.ts

packages/entity/src/EntityFields.ts

packages/entity/src/EntityMutator.ts

ide · 2020-07-13T21:52:25Z

My sense is that when people create new entity relationships (either by adding a new entity type or adding a new edge), they start by modifying the Postgres schema. It's at this time they're thinking most about how deletion triggers should work. So, I think it's more natural to express ON DELETE CASCADE/SET NULL/RESTRICT (the default) in the Postgres schema and let entities take care of clearing the cache.

However, I see two flaws with this approach in contrast with having entities perform the deletion triggers:

(1) The deletion trigger policy for Postgres is dislocated from the deletion trigger policy for the cache. It is easy to declare ON DELETE CASCADE in Postgres and then forget to define an association between the entities in a follow-up PR.

(2) Generally speaking, bypassing the entities library to do a mutation is a source of problems because the entities library needs to know about all mutations for cache consistency. Postgres is bypassing the entities library when it processes ON DELETE CASCADE/SET NULL. (In this specific scenario it can be safe since the entities library knows that Postgres is going to perform those mutations...)

wschurman · 2020-07-14T18:39:58Z

There seems to be a potential bug with INVALIDATE_CACHE: throughout the entities library, we mutate the database first (for updates and deletes) and clear the cache second. But here, INVALIDATE_CACHE causes processEntityDeletionForInboundEdgesAsync to clear the cache (for child entities) first and then mutate the database (for the parent entity) second. It seems to me the more correct behavior would be to perform the DB mutations first and then clear the cache entries afterward.

Edit: I realized that we need to run processEntityDeletionForInboundEdgesAsync before deleting the parent node, because if Postgres automatically deletes the child nodes (or nulls out their FK fields), we won't be able to query Postgres for all of the nodes that reference the parent node...

Yeah this is something I debated quite a bit while writing it. I had two choices for INVALIDATE_CACHE:

Do the deletion in three steps: gather all the entities to be invalidated in one step, run the delete of the parent entity (thus deleting the underlying dependent data from the database), invalidate all the gathered entities.
Keep the code consistent between all deletion behavior modes (keep it all in the same switch). The only downside to this is that we could unnecessarily clear the cache if the deletion fails for some reason, but it won't cause any inconsistencies.
It's really hard to know what the best tradeoff is: keeping the code simple-ish or keeping invalidations consistent.

I'll try to change it and see if I can do it cleanly.

I agree with the sentiment in Quin's comment about developing some guidance around which policy to use, especially when designing a schema from scratch. I see two dimensions here: should the deletion constraint triggers be the responsibility of the entities or the database? and should a deletion trigger's behavior be to delete the child node (CASCADE DELETE), to null out the reference from the child to parent (SET NULL), or to throw an error because the scenario requires extra application code to properly delete the child node before the parent?

I'm pretty unsure on what guidance to provide as well. I think it depends on the type of database being used. For a nosql database (which theoretically entity could support) it would make more sense to do it in the entity layer. For relational DBs I really like specifying foreign keys but I know that at certain scale foreign keys become a huge pain point. The rails convention is for relational DBs to express relationships only in application code (only recently did they start to support real foreign keys).

My gut says that people will have their opinions, and the framework should just support all of them equally. Concretely for us in www my gut says keep using real foreign keys.

(An example of the latter case: we store the content of some Snacks in Google Cloud Storage. When deleting a Snack, we'd want to run some code to delete the GCS file too, or at least schedule a task to clean up GCS. Postgres can't do this automatically and it's not clear Entities should take care of this either. It's behavior we want in application code. In this case we want the default behavior of ON DELETE RESTRICT to cause the deletion of a parent node to fail if the Snack nodes haven't been deleted.)

Non-relational triggers are something that will be needed pretty soon for audit logging. This almost makes me think that we may need to rethink ordering of operations during delete and whether we need to topologically sort trigger, db deletion, and cache execution for all dependent entities.

I'll spend some time thinking about ordering and triggers.

wschurman · 2020-07-14T18:46:32Z

One interesting thing about rails: both the save and destroy methods are automatically wrapped in a transaction if one isn't provided. I'm starting to think we should do the same here so that we can do better ordering.

ide · 2020-07-15T01:59:05Z

Do the deletion in three steps: gather all the entities to be invalidated in one step, run the delete of the parent entity (thus deleting the underlying dependent data from the database), invalidate all the gathered entities.

What seems trickiest about this is that the third step is to invalidate the cache for entities that no longer exist in the database. They are kind of like phantom entities. If we made it easy to work with and invalidate these phantom entities, it might be easy to implement this three-phase deletion.

both the save and destroy methods are automatically wrapped in a transaction if one isn't provided. I'm starting to think we should do the same here so that we can do better ordering.

This sounds like a good idea.

codecov · 2020-07-15T04:17:21Z

Codecov Report

Merging #63 into master will increase coverage by 0.34%.
The diff coverage is 96.33%.

@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
+ Coverage   94.07%   94.42%   +0.34%     
==========================================
  Files          58       58              
  Lines        1384     1452      +68     
  Branches      147      159      +12     
==========================================
+ Hits         1302     1371      +69     
+ Misses         81       79       -2     
- Partials        1        2       +1

Flag	Coverage Δ
#integration	`94.42% <96.33%> (+0.34%)`	⬆️
#unittest	`94.42% <96.33%> (+0.34%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
packages/entity/src/EntityCompanion.ts	`100.00% <ø> (ø)`
packages/entity/src/EntityDatabaseAdapter.ts	`100.00% <ø> (ø)`
packages/entity/src/internal/EntityDataManager.ts	`100.00% <ø> (ø)`
packages/entity/src/EntityQueryContext.ts	`94.44% <91.66%> (-5.56%)`	⬇️
packages/entity/src/EntityMutator.ts	`97.52% <94.91%> (-2.48%)`	⬇️
...ter-knex/src/PostgresEntityQueryContextProvider.ts	`100.00% <100.00%> (ø)`
...ample/src/adapters/InMemoryQueryContextProvider.ts	`100.00% <100.00%> (+33.33%)`	⬆️
packages/entity/src/Entity.ts	`100.00% <100.00%> (+12.00%)`	⬆️
packages/entity/src/EntityAssociationLoader.ts	`100.00% <100.00%> (ø)`
packages/entity/src/EntityConfiguration.ts	`100.00% <100.00%> (ø)`
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9c92d10...f0a1219. Read the comment docs.

wschurman · 2020-07-15T04:23:09Z

Do the deletion in three steps: gather all the entities to be invalidated in one step, run the delete of the parent entity (thus deleting the underlying dependent data from the database), invalidate all the gathered entities.

What seems trickiest about this is that the third step is to invalidate the cache for entities that no longer exist in the database. They are kind of like phantom entities. If we made it easy to work with and invalidate these phantom entities, it might be easy to implement this three-phase deletion.

After trying this, I'm even more convinced that keeping the code simpler in exchange for potentially erroneous cache invalidations is the right tradeoff. Worst case it would invalidate the cache of an entity who's deletion fails and the next time it is loaded it will be loaded from the database. No inconsistencies would arise.

both the save and destroy methods are automatically wrapped in a transaction if one isn't provided. I'm starting to think we should do the same here so that we can do better ordering.

This sounds like a good idea.

Updated the PR to add this.

quinlanj

On transactions:

Agree this is a really good idea to require edge deletions be done in a transaction. At the Postgres layer, the default behaviour is RESTRICT, so it'd enforce fkey integrity by erroring out if any of the edge deletions failed and we tried to delete the main entity. But in the case where its a No-SQL DB, there's nothing at the DB layer to enforce these constraints so a transaction would protect against this.

On www schema guidance:

Thinking about this some more, it makes sense to make the policy at the DB level stricter than what is enforced at the Entity level. In the Postgres case, it would be going with the default RESTRICT policy.
In the case where there is a bug in the Entity framework:
- If the DB policy is always stricter than what is required by the Entity framework (RESTRICT), then a bad Entity deletion attempt will just result in an error at the DB level. The worst case here is that the Entities are deleted from cache but all the data is still consistent. The DB will act as a failsafe to enforce fkey integrity.
- If the DB policy is equivalent or looser than what is required by the Entity framework (ON CASCADE or a noSQL db that doesnt enforce fkey integrity), then you start getting cases of inconsistent data. FKEY integrity would get compromised in the noSQL case, and in the ON CASCADE case, the cache could be inconsistent with the DB because the DB could have done the ON CASCADE work and the Entities framework would not have known to invalidate the cache

ide · 2020-07-15T22:02:39Z

make the policy at the DB level stricter than what is enforced at the Entity level.

This sounds like good guidance. We can always change RESTRICT to CASCADE if the entities properly declare the association between two entity types.

@wschurman one thing that came to mind that might be good to test is transitively cascading deletions. Say entity A references B which references C. When C is deleted, we delete B, which should delete A. I think the current code neatly takes care of this recursively but it's something to think about. (Also, do we need some kind of cycle detection in the assocs?)

wschurman · 2020-07-15T23:32:52Z

On www schema guidance:

Thinking about this some more, it makes sense to make the policy at the DB level stricter than what is enforced at the Entity level. In the Postgres case, it would be going with the default RESTRICT policy.

In the case where there is a bug in the Entity framework:

If the DB policy is always stricter than what is required by the Entity framework (RESTRICT), then a bad Entity deletion attempt will just result in an error at the DB level. The worst case here is that the Entities are deleted from cache but all the data is still consistent. The DB will act as a failsafe to enforce fkey integrity.

If the DB policy is equivalent or looser than what is required by the Entity framework (ON CASCADE or a noSQL db that doesnt enforce fkey integrity), then you start getting cases of inconsistent data. FKEY integrity would get compromised in the noSQL case, and in the ON CASCADE case, the cache could be inconsistent with the DB because the DB could have done the ON CASCADE work and the Entities framework would not have known to invalidate the cache

Agreed. I think eventually the framework will reach a point when it's essentially guaranteed to be have full coverage and stability for referential integrity and we'll be able to recommend either (like rails/activerecord does), but until then I agree that this sounds like good advice.

one thing that came to mind that might be good to test is transitively cascading deletions. Say entity A references B which references C. When C is deleted, we delete B, which should delete A. I think the current code neatly takes care of this recursively but it's something to think about. (Also, do we need some kind of cycle detection in the assocs?)

The current code definitely handles this but I agree that we should have a test for it. As for cycle detection, that's definitely something that needs a test. I can think of two types of cycles:

A owns a B. A owns a C. B owns a C. A is deleted, and therefore B and C should be deleted.
A owns a B. B owns a different A via a different field of A. Or something like this.

I'll add some tests for these cases and a simple transitive one as well.

…-referential foreign key tests

packages/entity/src/EntityFields.ts

packages/entity/src/__tests__/EntitySelfReferentialEdges-test.ts

wschurman · 2020-07-22T03:48:01Z

Updated the PR to support cyclical references, and test in database adapter and unit tests.

Note that keeping the recursive base case (check to ensure the entity hasn't already been processed before) required slight changes to existing behavior where a database adapter would throw if it didn't delete anything, but thinking about that behavior more I think it's fine and probably correct since in a distributed system a single ID could be deleted by multiple processes at the same time. An alternative here would be to additionally check that the entity hasn't been processed before in deleteInternalAsync itself so that the delete call to the database adapter never occurs. Might be worth pairing on this briefly tomorrow to compare the two options and share ideas.

packages/entity/src/EntityMutator.ts

wschurman requested review from ide and quinlanj July 11, 2020 16:56

[www] Add support for reference deletion

151a883

wschurman force-pushed the @wschurman/references branch from 7f8b0fb to 151a883 Compare July 11, 2020 16:56

wschurman changed the title ~~[www] Add support for reference deletion~~ feat: add support for reference deletion Jul 13, 2020

Fix tsc

98b43f4

quinlanj reviewed Jul 13, 2020

View reviewed changes

packages/entity/src/EntityMutator.ts Outdated Show resolved Hide resolved

packages/entity/src/EntityFields.ts Outdated Show resolved Hide resolved

ide requested changes Jul 13, 2020

View reviewed changes

packages/entity/src/EntityMutator.ts Outdated Show resolved Hide resolved

packages/entity/src/EntityFields.ts Outdated Show resolved Hide resolved

packages/entity/src/EntityMutator.ts Outdated Show resolved Hide resolved

Attempt to improve transactions

a5a7386

wschurman requested review from quinlanj and ide July 15, 2020 04:23

wschurman changed the title ~~feat: add support for reference deletion~~ feat!: add support for reference deletion Jul 15, 2020

wschurman changed the title ~~feat!: add support for reference deletion~~ feat!: add support for reference deletion and enclose mutations in transactions Jul 15, 2020

quinlanj approved these changes Jul 15, 2020

View reviewed changes

wschurman added 2 commits July 16, 2020 09:53

Fix recursive call for invalidate cache and add longer chain and self…

155d8aa

…-referential foreign key tests

Fix tsc

78fad4b

ide requested changes Jul 18, 2020

View reviewed changes

Cycle handling

bb1c346

wschurman requested a review from ide July 22, 2020 03:48

ide approved these changes Jul 22, 2020

View reviewed changes

packages/entity/src/EntityMutator.ts Outdated Show resolved Hide resolved

Use set

f0a1219

wschurman merged commit 1224de7 into master Jul 22, 2020

wschurman deleted the @wschurman/references branch July 22, 2020 16:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat!: add support for reference deletion and enclose mutations in transactions #63

feat!: add support for reference deletion and enclose mutations in transactions #63

wschurman commented Jul 11, 2020 •

edited

Loading

quinlanj left a comment

ide commented Jul 13, 2020 •

edited

Loading

ide left a comment

ide commented Jul 13, 2020

wschurman commented Jul 14, 2020 •

edited

Loading

wschurman commented Jul 14, 2020

ide commented Jul 15, 2020

codecov bot commented Jul 15, 2020 •

edited

Loading

wschurman commented Jul 15, 2020

quinlanj left a comment

ide commented Jul 15, 2020 •

edited

Loading

wschurman commented Jul 15, 2020

wschurman commented Jul 22, 2020

feat!: add support for reference deletion and enclose mutations in transactions #63

feat!: add support for reference deletion and enclose mutations in transactions #63

Conversation

wschurman commented Jul 11, 2020 • edited Loading

Why

How

Test Plan

quinlanj left a comment

Choose a reason for hiding this comment

ide commented Jul 13, 2020 • edited Loading

ide left a comment

Choose a reason for hiding this comment

ide commented Jul 13, 2020

wschurman commented Jul 14, 2020 • edited Loading

wschurman commented Jul 14, 2020

ide commented Jul 15, 2020

codecov bot commented Jul 15, 2020 • edited Loading

Codecov Report

wschurman commented Jul 15, 2020

quinlanj left a comment

Choose a reason for hiding this comment

ide commented Jul 15, 2020 • edited Loading

wschurman commented Jul 15, 2020

wschurman commented Jul 22, 2020

wschurman commented Jul 11, 2020 •

edited

Loading

ide commented Jul 13, 2020 •

edited

Loading

wschurman commented Jul 14, 2020 •

edited

Loading

codecov bot commented Jul 15, 2020 •

edited

Loading

ide commented Jul 15, 2020 •

edited

Loading