feat!: add support for reference deletion and enclose mutations in transactions #63
I'm curious what standards we'd want for the DB tables we create in www going forward, since we now support CASCADE and SET NULL behavior at the Entity level. Should we always prefer doing cascades/set nulls at the Entity level and migrate our existing tables to conform? Or are there cases where it makes sense to do these at the DB level?
Reading the code, there seems to be a potential bug with …

Edit: I realized that we need to run …

I agree with the sentiment in Quin's comment about developing some guidance around which policy to use, especially when designing a schema from scratch. I see two dimensions here:
(1) Should the deletion constraint triggers be the responsibility of the entities or of the database?
(2) Should a deletion trigger delete the child node (CASCADE DELETE), null out the reference from the child to the parent (SET NULL), or throw an error because the scenario requires extra application code to delete the child node properly before the parent?

An example of the latter case: we store the content of some Snacks in Google Cloud Storage. When deleting a Snack, we'd want to run some code to delete the GCS file too, or at least schedule a task to clean up GCS. Postgres can't do this automatically, and it's not clear that Entities should take care of it either; it's behavior we want in application code. In this case we want the default ON DELETE RESTRICT behavior to make the deletion of a parent node fail if its Snack nodes haven't been deleted yet.
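A minimal sketch of the RESTRICT case from the Snack example, assuming an in-memory stand-in for two tables and assuming the parent of a Snack is a user: the store refuses to delete a parent that still has referencing Snacks, so application code must delete the Snacks (and schedule their GCS cleanup) first. `RestrictingStore`, `deleteUserWithSnacks`, and the `scheduleGcsCleanup` callback are hypothetical names for illustration; none of them come from the Entity library or www.

```typescript
// Hypothetical in-memory stand-in for two tables where snacks.owner_id
// references users.id with ON DELETE RESTRICT semantics.
type User = { id: string };
type Snack = { id: string; ownerId: string; gcsPath: string };

class RestrictingStore {
  users = new Map<string, User>();
  snacks = new Map<string, Snack>();

  deleteUser(id: string): void {
    // RESTRICT: refuse to delete a parent row that still has referencing rows.
    const referencing = Array.from(this.snacks.values()).filter((s) => s.ownerId === id);
    if (referencing.length > 0) {
      throw new Error(`RESTRICT: ${referencing.length} snack(s) still reference user ${id}`);
    }
    this.users.delete(id);
  }

  deleteSnack(id: string): void {
    this.snacks.delete(id);
  }
}

// Application code owns the side effect the database cannot perform:
// scheduling cleanup of each Snack's GCS object before removing the Snack row.
function deleteUserWithSnacks(
  store: RestrictingStore,
  userId: string,
  scheduleGcsCleanup: (gcsPath: string) => void, // hypothetical task scheduler
): void {
  for (const snack of Array.from(store.snacks.values())) {
    if (snack.ownerId === userId) {
      scheduleGcsCleanup(snack.gcsPath);
      store.deleteSnack(snack.id);
    }
  }
  store.deleteUser(userId); // succeeds now that no Snack references the user
}
```

Calling `store.deleteUser` directly while a Snack still points at the user throws, which is exactly the failsafe the comment asks for; only the application-level path that handles GCS first can complete the deletion.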
Requesting changes for a few small comments, most notably the order of operations between clearing the database and clearing the cache for the INVALIDATE_CACHE policy.
My sense is that when people create new entity relationships (either by adding a new entity type or by adding a new edge), they start by modifying the Postgres schema. That's when they're thinking most about how deletion triggers should work. So I think it's more natural to express …

However, I see two flaws with this approach compared with having entities perform the deletion triggers:

(1) The deletion trigger policy for Postgres is dislocated from the deletion trigger policy for the cache. It is easy to declare …

(2) Generally speaking, bypassing the entities library to do a mutation is a source of problems, because the entities library needs to know about all mutations for cache consistency. Postgres is bypassing the entities library when it processes …
Yeah, this is something I debated quite a bit while writing it. I had two choices for …
I'll try to change it and see if I can do it cleanly.
I'm pretty unsure what guidance to provide as well. I think it depends on the type of database being used. For a NoSQL database (which Entity could theoretically support), it would make more sense to do it in the Entity layer. For relational DBs I really like specifying foreign keys, but I know that at a certain scale foreign keys become a huge pain point. The Rails convention is for relational DBs to express relationships only in application code (only recently did Rails start to support real foreign keys). My gut says that people will have their own opinions, and the framework should just support all of them equally. Concretely for us in …
Non-relational triggers are something we will need pretty soon for audit logging. This almost makes me think we may need to rethink the ordering of operations during delete, and whether we need to topologically sort trigger execution, DB deletion, and cache invalidation for all dependent entities. I'll spend some time thinking about ordering and triggers.
One interesting thing about Rails: both the save and destroy methods are automatically wrapped in a transaction if one isn't provided. I'm starting to think we should do the same here so that we can control ordering better.
What seems trickiest about this is that the third step is to invalidate the cache for entities that no longer exist in the database. They are kind of like phantom entities. If we made it easy to work with and invalidate these phantom entities, this three-phase deletion might be easy to implement.
This sounds like a good idea.
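The three-phase deletion discussed above can be sketched with an in-memory database and cache. The phase breakdown is an assumption based on the ordering mentioned earlier in the thread (trigger execution, database deletion, cache invalidation): (1) collect the rows about to be deleted and run triggers on them, (2) delete those rows in one transaction, (3) invalidate the cache for the now-phantom entities using the IDs recorded in phase 1. `Db`, `EntityCache`, `deleteParentAndChildren`, and the `trigger` callback are hypothetical, and only direct children are handled so the phases stay visible.

```typescript
// Hypothetical in-memory model; not the Entity library's API.
type Row = { id: string; parentId: string | null };

class Db {
  rows = new Map<string, Row>();

  // Applies fn to a copy of the rows and keeps the copy only if fn returns
  // normally (COMMIT); if fn throws, the copy is discarded (ROLLBACK).
  transaction(fn: (rows: Map<string, Row>) => void): void {
    const working = new Map(this.rows);
    fn(working);
    this.rows = working;
  }
}

class EntityCache {
  entries = new Map<string, Row>();

  invalidate(ids: string[]): void {
    for (const id of ids) this.entries.delete(id);
  }
}

function deleteParentAndChildren(
  db: Db,
  cache: EntityCache,
  parentId: string,
  trigger: (row: Row) => void, // hypothetical per-entity deletion trigger
): string[] {
  // Phase 1: collect the rows to delete (parent + direct children) and run triggers.
  const doomed = Array.from(db.rows.values()).filter(
    (row) => row.id === parentId || row.parentId === parentId,
  );
  doomed.forEach(trigger);
  const ids = doomed.map((row) => row.id);

  // Phase 2: delete them atomically in the database.
  db.transaction((rows) => {
    for (const id of ids) rows.delete(id);
  });

  // Phase 3: the rows are gone ("phantom" entities), but their IDs were
  // captured in phase 1, so their cache entries can still be invalidated.
  cache.invalidate(ids);
  return ids;
}
```

The key design choice is capturing the IDs before the database deletion; once phase 2 commits, there is nothing left to load, so the recorded IDs are the only handle on the phantom entities.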
Codecov Report
@@ Coverage Diff @@
## master #63 +/- ##
==========================================
+ Coverage 94.07% 94.42% +0.34%
==========================================
Files 58 58
Lines 1384 1452 +68
Branches 147 159 +12
==========================================
+ Hits 1302 1371 +69
+ Misses 81 79 -2
- Partials 1 2 +1
After trying this, I'm even more convinced that keeping the code simpler in exchange for potentially erroneous cache invalidations is the right tradeoff. In the worst case, it would invalidate the cache entry of an entity whose deletion fails, and the next time that entity is loaded it would be read from the database. No inconsistencies would arise.
Updated the PR to add this.
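A minimal sketch of why an erroneous invalidation is harmless, assuming a read-through cache in front of an in-memory table: the cache entry is dropped before the delete is attempted, the delete then fails (simulated by a hypothetical `shouldFail` flag standing in for, say, a constraint violation), and the next load simply misses the cache and re-reads the unchanged row. `Table`, `ReadThroughCache`, and `deleteWithEagerInvalidation` are illustrative names, not the Entity library's API.

```typescript
// Hypothetical table that counts reads so cache misses are observable.
class Table {
  rows = new Map<string, string>();
  reads = 0;

  get(id: string): string | undefined {
    this.reads++;
    return this.rows.get(id);
  }
}

// Hypothetical read-through cache: serves cached values, else loads and fills.
class ReadThroughCache {
  private entries = new Map<string, string>();
  private table: Table;

  constructor(table: Table) {
    this.table = table;
  }

  load(id: string): string | undefined {
    const cached = this.entries.get(id);
    if (cached !== undefined) return cached;
    const value = this.table.get(id);
    if (value !== undefined) this.entries.set(id, value);
    return value;
  }

  invalidate(id: string): void {
    this.entries.delete(id);
  }
}

// Invalidate first, then try the delete. If the delete fails, the only
// cost is one extra database read the next time the entity is loaded.
function deleteWithEagerInvalidation(
  table: Table,
  cache: ReadThroughCache,
  id: string,
  shouldFail: boolean, // simulation knob standing in for a real failure
): boolean {
  cache.invalidate(id);
  try {
    if (shouldFail) throw new Error("simulated constraint violation");
    table.rows.delete(id);
    return true;
  } catch {
    return false;
  }
}
```

After a failed delete, `load` returns the same value as before; the table's `reads` counter just goes up by one, which is the "potentially erroneous cache invalidation" cost the comment accepts.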
On transactions:
- Agree that it's a really good idea to require edge deletions to be done in a transaction. At the Postgres layer the default behaviour is RESTRICT, so Postgres would enforce foreign-key integrity by erroring out if any of the edge deletions failed and we then tried to delete the main entity. But for a NoSQL DB there's nothing at the DB layer to enforce these constraints, so a transaction would protect against this.

On www schema guidance:
- Thinking about this some more, it makes sense to make the policy at the DB level stricter than what is enforced at the Entity level. In the Postgres case, that means going with the default RESTRICT policy.
- In the case where there is a bug in the Entity framework:
  - If the DB policy is always stricter than what the Entity framework requires (RESTRICT), then a bad Entity deletion attempt will just result in an error at the DB level. The worst case is that the entities are deleted from the cache, but all the data is still consistent. The DB acts as a failsafe to enforce foreign-key integrity.
  - If the DB policy is equivalent to or looser than what the Entity framework requires (ON DELETE CASCADE, or a NoSQL DB that doesn't enforce foreign-key integrity), then you start getting inconsistent data. Foreign-key integrity would be compromised in the NoSQL case, and in the ON DELETE CASCADE case the cache could be inconsistent with the DB, because the DB could have done the ON DELETE CASCADE work without the Entity framework knowing to invalidate the cache.
This sounds like good guidance. We can always change RESTRICT to CASCADE if the entities properly declare the association between two entity types. @wschurman one thing that came to mind that might be good to test is transitively cascading deletions. Say entity A references B, which references C. When C is deleted, we delete B, which should delete A. I think the current code neatly takes care of this recursively, but it's something to think about. (Also, do we need some kind of cycle detection in the assocs?)
Agreed. I think the framework will eventually reach a point where it's essentially guaranteed to have full coverage and stability for referential integrity, and we'll be able to recommend either approach (like Rails/ActiveRecord does), but until then I agree that this sounds like good advice.
The current code definitely handles this, but I agree that we should have a test for it. Cycle detection definitely needs a test as well. I can think of two types of cycles:
I'll add some tests for these cases and a simple transitive one as well.
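The guidance above (keep the database policy stricter than the Entity policy) can be illustrated with a small model, assuming an entity layer with a bug: it forgot to declare a child association, so when deleting a parent it only invalidates the parent's own cache entry. With a RESTRICT database the bad delete fails and the only effect is a dropped cache entry; with a CASCADE database the delete succeeds and the child's cache entry goes stale. `FkDb`, `buggyEntityDelete`, `staleIds`, and the set-of-cached-IDs cache are hypothetical simplifications.

```typescript
type OnDelete = "CASCADE" | "RESTRICT";
type Child = { id: string; parentId: string };

// Hypothetical database with one parent table and one child table whose
// foreign key uses the configured ON DELETE rule.
class FkDb {
  parents = new Set<string>();
  children = new Map<string, Child>();
  onDelete: OnDelete;

  constructor(onDelete: OnDelete) {
    this.onDelete = onDelete;
  }

  deleteParent(id: string): void {
    const refs = Array.from(this.children.values()).filter((c) => c.parentId === id);
    if (refs.length > 0 && this.onDelete === "RESTRICT") {
      throw new Error(`RESTRICT: ${refs.length} child row(s) reference ${id}`);
    }
    // CASCADE: the database removes the children without telling the entity layer.
    for (const c of refs) this.children.delete(c.id);
    this.parents.delete(id);
  }
}

// Buggy entity layer: it doesn't know about the child association, so it
// invalidates only the parent's cache entry before deleting it.
function buggyEntityDelete(db: FkDb, cachedIds: Set<string>, id: string): void {
  cachedIds.delete(id);
  db.deleteParent(id);
}

// A cached ID that no longer exists in the database is stale.
function staleIds(db: FkDb, cachedIds: Set<string>): string[] {
  return Array.from(cachedIds).filter((id) => !db.parents.has(id) && !db.children.has(id));
}
```

Under RESTRICT, `staleIds` stays empty even though the entity layer is wrong; under CASCADE, the child's ID is left pointing at a row that no longer exists.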
…-referential foreign key tests
Updated the PR to support cyclical references, with tests in the database adapter and in unit tests. Note that keeping the recursive base case (a check to ensure the entity hasn't already been processed) required a slight change to existing behavior where a database adapter would throw if it didn't delete anything. Thinking about that behavior more, I think the change is fine and probably correct, since in a distributed system a single ID could be deleted by multiple processes at the same time. An alternative here would be to additionally check that the entity hasn't been processed before in …
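A minimal sketch of a recursive cascade with a processed-set base case, assuming each entity simply lists the IDs it references. The `EntityNode` shape and `cascadeDelete` are hypothetical and are not the PR's actual `processEntityDeletionForInboundEdgesAsync`. The processed set is what makes both the transitive case (A references B, which references C) and cycles, including self-references, terminate.

```typescript
type EntityNode = { id: string; references: string[] };

// Returns the IDs in deletion order: referencing entities before the
// entities they reference.
function cascadeDelete(store: Map<string, EntityNode>, rootId: string): string[] {
  const processed = new Set<string>();
  const order: string[] = [];

  const visit = (id: string): void => {
    // Base case: an entity already being processed is skipped, which is
    // what stops the recursion on cycles and self-references.
    if (processed.has(id)) return;
    processed.add(id);
    for (const node of Array.from(store.values())) {
      if (node.references.indexOf(id) !== -1) visit(node.id);
    }
    order.push(id);
  };

  visit(rootId);
  // Map.delete on a missing key just returns false, so deleting an ID that
  // is already gone is a no-op rather than an error.
  for (const id of order) store.delete(id);
  return order;
}
```

Deleting C in the A/B/C chain yields the order A, B, C; deleting X in a two-node cycle X/Y yields Y, X and stops instead of recursing forever.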
Why
#19
This PR adds the ability to specify relationships between entities for use during deletion. Three modes are supported:
- INVALIDATE_CACHE: Invalidates the cache for all entities that reference the entity being deleted. This is most useful when the database itself expresses foreign keys and cascading deletes or set nulls, and the entity framework just needs to be kept consistent with the state of the database.
- CASCADE_DELETE: Deletes the entities that reference the entity being deleted. This is very similar to a DBMS's ON DELETE CASCADE, but it is done in the Entity framework instead of at the underlying level, and it should not be used in combination with foreign keys and deletion behavior expressed in the database schema. Of course, it also has the nice effect of keeping the cache consistent.
- SET_NULL: Sets fields referencing the entity being deleted to null in any entities that reference it.

Next up: see if it is possible to combine this and the association loader in a type-safe way. I doubt it'll be possible to infer edge types from this new schema association specification, but we may at least be able to share the edge definitions for chain loading.
Edit: this PR also now creates a transaction if not already in one for all creates, updates, and deletes. This will become very useful for triggers, but is also useful for dependent deletions added in this PR.
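The create-a-transaction-if-not-already-in-one behavior (the same idea as Rails wrapping save and destroy) can be sketched as follows. `FakeDatabase`, `Transaction`, `runInTransactionIfNeeded`, and `deleteEntity` are hypothetical names for illustration, and the fake only counts transactions instead of issuing BEGIN/COMMIT.

```typescript
// Hypothetical transaction handle; a real one would wrap a DB connection.
interface Transaction {
  statements: string[];
}

class FakeDatabase {
  transactionsOpened = 0;

  // A real adapter would BEGIN here, COMMIT when fn returns, and ROLLBACK if
  // it throws; this fake only counts how many transactions were opened.
  runInTransaction<T>(fn: (trx: Transaction) => T): T {
    this.transactionsOpened++;
    return fn({ statements: [] });
  }
}

// Reuse the caller's transaction when one is passed in; otherwise open one.
function runInTransactionIfNeeded<T>(
  db: FakeDatabase,
  existing: Transaction | undefined,
  fn: (trx: Transaction) => T,
): T {
  return existing ? fn(existing) : db.runInTransaction(fn);
}

function deleteEntity(db: FakeDatabase, id: string, trx?: Transaction): void {
  runInTransactionIfNeeded(db, trx, (t) => {
    t.statements.push(`DELETE ${id}`);
  });
}
```

Called on its own, `deleteEntity` opens one transaction; called several times with the same `trx`, all the deletes share it, which is what lets dependent deletions commit or fail together.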
How
The main piece of this PR is processEntityDeletionForInboundEdgesAsync. See docblock for description. Everything else is tests.
Test Plan
Run all new tests.