Patch Application #686
Replies: 5 comments 1 reply
-
Just published today: https://bartoszsypytkowski.com/yrs-architecture/ |
Beta Was this translation helpful? Give feedback.
-
Very interesting reading @bradjonesca |
Beta Was this translation helpful? Give feedback.
-
This looks super promising! I'm wondering about cases where I don't (or don't want to) know the current state of the data though. I do realise that this goes against some of the points you make above. I still think that there are use cases and that if this were to be supported, it would make sense to use the same interface for that. (But feel free to disregard this comment if you think I'm completely missing the point) 🙃 Let's say I've some sensor data where I'm only interested in storing the latest available data. This could be achieved simply by omitting {
"@id" : "TemperatureSensor/Room111"
"measurement" : { "@insert" : 19.4 }
} In the case of sets omitting {
"@id" : "TemperatureSensor/Room111"
"measurements" : [{ "@insert" : 19.4 }, { "@insert" : 21.7 }]
} For lists I'm imagining something where I want to store my 10 latest measurements. For that one would probably need to be able to delete at a specific index: {
"@id" : "TemperatureSensor/Room111",
"measurements" : [{
"@insert" : {
"@value" : 19.4,
"@after": 0
},
"@delete" : {
"@at": 10
}
}]
} |
Beta Was this translation helpful? Give feedback.
-
Indeed the "don't care" update was what I was thinking of in the final paragraph of the above, and you've even given the same syntax that we've been discussing informally! The list one is very interesting and will require some playing around with, but definitely being able to remove an index seems like a useful operation. |
Beta Was this translation helpful? Give feedback.
-
In order to make this practicable with the current code base we need to be able to synthesis the IDs of all objects modified by a given delta. [ ] At first, we will merely create a predicate that retrieves the parent ID of the enclosing document of any triple. Later we may record this information directly. [ ] After obtaining all ids changed in a given delta stack, then we can do document retrieval and use diff to synthesise the patch. For all changed objects. [ ] At this point, application will be this patch, plus a resource, so that will require a new endpoint (or an extended version of the same) that takes a resource as an argument. |
Beta Was this translation helpful? Give feedback.
-
For us to use complex documents in a collaborative way effectively, we need to have a system which is able to do patch synthesis, diff, and patch application. These features will move us from being just a version database to being a collaborative document graph, which can be used on a wide variety of structured data.
Having an effective system for patch/diff will allow us to have users perform operations collaboratively in an offline or semi-offline fashion and gain knowledge about what causes merge failure when they occur, allowing us to build interfaces around conflicts.
Patch
The simplest patch structure suits us best - one which is structured based on inserts and deletes at a document. This allows us to look only at patches to identify if operations commute.
An example of how this might be structured is as follows:
Tree structured diffs which support an insertion / deletion syntax will allow us to perform updates which require the data to be as described in the patch.
This approach facilitates updates which keep track of the expected state of the original object, and can allow computations to take place on patches (to check for commutativity for instance). In addition selecting everything which must be present for read-model coherence could extend this - we simply describe parts of the tree as we expect to see them, with neither a delete, nor an insert.
For instance:
Patch application is then a reversible operation - we can construct what would occur from a list of patches to objects when applied to a commit, to yield another commit, or conversely what would be unapplied to recover the original.
We can also use these properties to "stack" patches and see whether they commute. Two patches commute if their application would be identical in either order. An example of commuting patches:
Here, regardless of the shape of the original document, the final document will be the same regardless of the order of patch application.
Type Families
The type families present us with the most difficult aspects of patch application.
Optional where the original value does not exist, or the final value is deleted can be specified using an explicit
null
field. Here is an example in which we merely add the value "georgi" to "name" but expect "name" to be empty when the patch is applied.Since sets are collections without order, I would propose that the vanilla diff for sets should be composed as the elements which are inserted as a
null
operation, those that are updated as a change operation between the two, and those that are not altered as implicit (and not explicitly specified in the operation).We will imagine that we have a task set, of subdocument objects.
Lists have order and as such we probably want the vanilla list to be able to insert a list from an index, or delete a list from an index.
Supposing our task list is of type family List.
Notice the patch has a list for the "tasks" property. This is because the patches may need to be inserted at multiple points. This also allows a "read" state to be added by explicit mention in the appropriate positions in the list.
This is flexible - and can be made explicitly "tight" or explicitly "loose" which I think enables many different patches to be produced which apply identically when successful, but which have more or less restrictive domains.
... no idea ...
Patch Synthesis
Because of past choices, we do not store information such that patches are readily apparent to us. Instead we have a flat plane of added triples and a flat plane of removed triples. However, we would really like to be able to take these individual commit planes and synthesise patches from them.
We will need this when we have a commit which has already been applied, and we want to see whether we can commute our patch for an object with this plane.
The above patch would look like:
Of course this version is relatively simple, but with sub-objects and the families: set / list / array, patch synthesis will be more complex.
Diff
The heart of change is diff. If you want to know what is different between now and some other time, diff can tell you. But the patches that you construct from a diff are not entirely straight-forward. Basically you want to find a minimal set of changes to transform one object into another and what is minimal requires some sort of objective function. This involves semantics.
Ideally we would provide a general and useful diff, which could be customised by specifying semantic flags at the types. Type directed diff which could value different kinds of changes differently would make for a very flexible approach which can be extended as we find different approaches necessary.
To give a brief example, the idea of a table diff is much different than a simple list diff. A table diff tries to see if shifting of rows or columns can be achieved to minimise the cost function, only after which we might look for individual cell level changes.
The first can be transformed into the second by deletion of 3@2, and insertion of a 2@2 and 3@3, as well as a deletion of a 6@2 and insertion of a 5@2 and 6@3. However what we actually want here is the insertion of a new column - so we should shift everything to the right, rather than mention the final column explicitly at all in the change. The later is a table diff, the former is a cell diff. It is impossible to know in general what you actually want here without more information.
Transaction Semantics
As mentioned above, patches can also help us to perform safe document transactions. If we know what state we expect to have when we perform an operation, we can know whether it is safe to perform it - we can also know if it is safe to reperform it in the case of another transaction winning the race to commit.
Right now we do not have a good snapshot isolation approach to documents because we do not return the commit id in the headers, and we can not submit the commit id at update time - but this is a fairly straightforward change to the interface which we can (and should) make soon.
But even without this, a more flexible interface can be exposed by simply submitting the patch that you want to apply.
Further, these patches could be quite convenient in that you do not have to express every aspect of the document which you want to update - only what you think is there for a given portion of the document when you perform the update.
We could even have a wild-card type patch which allowed you to submit an "I don't care" type patch for what exists presently and what you want to see in the final state. This could be very convenient for trying to force the state of an object which you are unconcerned will be clobbered (perhaps because the state only monotonically approaches some value - like a finalized state which will not be unfinalized by any system operation).
The Development Plan
Beta Was this translation helpful? Give feedback.
All reactions