Patch Application #686

GavinMendelGleason · 2021-10-14T10:38:13Z

GavinMendelGleason
Oct 14, 2021
Maintainer

For us to use complex documents in a collaborative way effectively, we need to have a system which is able to do patch synthesis, diff, and patch application. These features will move us from being just a version database to being a collaborative document graph, which can be used on a wide variety of structured data.

Having an effective system for patch/diff will allow us to have users perform operations collaboratively in an offline or semi-offline fashion and gain knowledge about what causes merge failure when they occur, allowing us to build interfaces around conflicts.

Patch

The simplest patch structure suits us best - one which is structured based on inserts and deletes at a document. This allows us to look only at patches to identify if operations commute.

An example of how this might be structured is as follows:

{ "@id" : "Object/my_obj"
  "name" : { "@delete" : "george", "@insert" : "georgi" },
  "status" : { "@delete" : "active", "@insert" : "inactive" }}

Tree structured diffs which support an insertion / deletion syntax will allow us to perform updates which require the data to be as described in the patch.

This approach facilitates updates which keep track of the expected state of the original object, and can allow computations to take place on patches (to check for commutativity for instance). In addition selecting everything which must be present for read-model coherence could extend this - we simply describe parts of the tree as we expect to see them, with neither a delete, nor an insert.

For instance:

 { "@id" : "Object/my_obj"
  "name" : { "@delete" : "george", "@insert" : "georgi" },
  "address" : "10 Abigail Lane",
  "status" : { "@delete" : "active", "@insert" : "inactive" }}

Patch application is then a reversible operation - we can construct what would occur from a list of patches to objects when applied to a commit, to yield another commit, or conversely what would be unapplied to recover the original.

We can also use these properties to "stack" patches and see whether they commute. Two patches commute if their application would be identical in either order. An example of commuting patches:

[ 
  { "@id" : "Object/my_obj"
    "name" : { "@delete" : "george", "@insert" : "georgi" }},
  { "@id" : "Object/my_obj"
    "status" : { "@delete" : "active", "@insert" : "inactive" }}
] == [ 
  { "@id" : "Object/my_obj"
    "status" : { "@delete" : "active", "@insert" : "inactive" }},
  { "@id" : "Object/my_obj"
    "name" : { "@delete" : "george", "@insert" : "georgi" }}
]

Here, regardless of the shape of the original document, the final document will be the same regardless of the order of patch application.

Type Families

The type families present us with the most difficult aspects of patch application.

Optional

Optional where the original value does not exist, or the final value is deleted can be specified using an explicit null field. Here is an example in which we merely add the value "georgi" to "name" but expect "name" to be empty when the patch is applied.

{ "@id" : "Person/george"
  "name" : { "@delete" : null, "@insert" : "georgi" }}

Set

Since sets are collections without order, I would propose that the vanilla diff for sets should be composed as the elements which are inserted as a null operation, those that are updated as a change operation between the two, and those that are not altered as implicit (and not explicitly specified in the operation).

We will imagine that we have a task set, of subdocument objects.

let before =  
{ "@id" : "Person/george",
  "name" : "george",
  "address" : "10 Abigail Lane",
  "tasks" [ { "@id" "Person/george/tasks/Task/laundry",
              "priority" : 5, 
              "name" : "laundry" },
            { "@id" : "Person/george/tasks/Task/rubbish",
              "priority" : 3, 
              "name" : "rubbish" },
            { "@id" "Person/george/tasks/Task/dinner",
              "priority" : 1,
              "name" : "dinner" },
          ]
}

let patch = 
{ "@id" : "Person/george", 
  "tasks" : [ { "@delete" : { "@id" : "Person/george/tasks/Task/dinner" }, 
                "@insert" : null },
              { "@id" : "Person/george/tasks/Task/rubbish",
                "priority" : { "@delete" : 3, "@insert" : 12 }}
            ]}

let after = apply_patch(before, patch)
assert(
  after == 
  { "@id" : "Person/george",
    "name" : "george",
    "address" : "10 Abigail Lane",
    "tasks" [ { "@id" "Person/george/tasks/Task/laundry",
                "priority" : 5, 
                "name" : "laundry" },
              { "@id" : "Person/george/tasks/Task/rubbish",
                "priority" : 12, 
                "name" : "rubbish" }
            ]
 })

List

Lists have order and as such we probably want the vanilla list to be able to insert a list from an index, or delete a list from an index.

Supposing our task list is of type family List.

let before =  
{ "@id" : "Person/george",
  "name" : "george",
  "address" : "10 Abigail Lane",
  "tasks" [ { "@id" "Person/george/tasks/Task/laundry",
              "name" : "laundry" },
            { "@id" : "Person/george/tasks/Task/rubbish",
              "name" : "rubbish" },
            { "@id" "Person/george/tasks/Task/dinner",
              "name" : "dinner" },
          ]
}

let patch = 
{ "@id" : "Person/george", 
  "tasks" : [{ "@insert" : [
                    { "@id" : "Person/george/tasks/Task/saxophone",
                      "name" : "saxophone" },
                    { "@id" : "Person/george/tasks/Task/civilisation",
                      "name" : "civilisation" }
               ],
               "@delete" : [
                    { "@id" : "Person/george/tasks/Task/rubbish",
                      "name" : "rubbish" }
               ], 
               "@after" : 1 }
            ]}

let after = apply_patch(before, patch)
assert(
  after == 
  { "@id" : "Person/george",
    "name" : "george",
    "address" : "10 Abigail Lane",
    "tasks" [ { "@id" "Person/george/tasks/Task/laundry",
                "name" : "laundry" },
              { "@id" : "Person/george/tasks/Task/saxophone",
                "name" : "saxophone" },
              { "@id" : "Person/george/tasks/Task/civilisation",
                "name" : "civilisation" }
              { "@id" "Person/george/tasks/Task/dinner",
                "name" : "dinner" },
            ]
 })

Notice the patch has a list for the "tasks" property. This is because the patches may need to be inserted at multiple points. This also allows a "read" state to be added by explicit mention in the appropriate positions in the list.

This is flexible - and can be made explicitly "tight" or explicitly "loose" which I think enables many different patches to be produced which apply identically when successful, but which have more or less restrictive domains.

Array

... no idea ...

Patch Synthesis

Because of past choices, we do not store information such that patches are readily apparent to us. Instead we have a flat plane of added triples and a flat plane of removed triples. However, we would really like to be able to take these individual commit planes and synthesise patches from them.

We will need this when we have a commit which has already been applied, and we want to see whether we can commute our patch for an object with this plane.

The above patch would look like:

-t("object/my_obj","name","george")
-t("object/my_obj","status","active")
+t("object/my_obj","name","georgi")
+t("object/my_obj","status","inactive")

Of course this version is relatively simple, but with sub-objects and the families: set / list / array, patch synthesis will be more complex.

Diff

The heart of change is diff. If you want to know what is different between now and some other time, diff can tell you. But the patches that you construct from a diff are not entirely straight-forward. Basically you want to find a minimal set of changes to transform one object into another and what is minimal requires some sort of objective function. This involves semantics.
Ideally we would provide a general and useful diff, which could be customised by specifying semantic flags at the types. Type directed diff which could value different kinds of changes differently would make for a very flexible approach which can be extended as we find different approaches necessary.

To give a brief example, the idea of a table diff is much different than a simple list diff. A table diff tries to see if shifting of rows or columns can be achieved to minimise the cost function, only after which we might look for individual cell level changes.

[[1,3],
 [4,6]]

[[1,2,3],
 [4,5,6]]

The first can be transformed into the second by deletion of 3@2, and insertion of a 2@2 and 3@3, as well as a deletion of a 6@2 and insertion of a 5@2 and 6@3. However what we actually want here is the insertion of a new column - so we should shift everything to the right, rather than mention the final column explicitly at all in the change. The later is a table diff, the former is a cell diff. It is impossible to know in general what you actually want here without more information.

Transaction Semantics

As mentioned above, patches can also help us to perform safe document transactions. If we know what state we expect to have when we perform an operation, we can know whether it is safe to perform it - we can also know if it is safe to reperform it in the case of another transaction winning the race to commit.

Right now we do not have a good snapshot isolation approach to documents because we do not return the commit id in the headers, and we can not submit the commit id at update time - but this is a fairly straightforward change to the interface which we can (and should) make soon.

But even without this, a more flexible interface can be exposed by simply submitting the patch that you want to apply.

Further, these patches could be quite convenient in that you do not have to express every aspect of the document which you want to update - only what you think is there for a given portion of the document when you perform the update.

We could even have a wild-card type patch which allowed you to submit an "I don't care" type patch for what exists presently and what you want to see in the final state. This could be very convenient for trying to force the state of an object which you are unconcerned will be clobbered (perhaps because the state only monotonically approaches some value - like a finalized state which will not be unfinalized by any system operation).

The Development Plan

Patch application
Vanilla Diff, keeping in mind that later we might want the objective function to be type directed.
Patch synthesis
Rework Rebase
Semantically differentiated Diff

bradjonesca · 2021-10-15T19:46:03Z

bradjonesca
Oct 15, 2021

Just published today: https://bartoszsypytkowski.com/yrs-architecture/

0 replies

GavinMendelGleason · 2021-10-16T18:50:08Z

GavinMendelGleason
Oct 16, 2021
Maintainer Author

Very interesting reading @bradjonesca

0 replies

fidelthomet · 2021-10-18T11:48:07Z

fidelthomet
Oct 18, 2021

This looks super promising! I'm wondering about cases where I don't (or don't want to) know the current state of the data though. I do realise that this goes against some of the points you make above. I still think that there are use cases and that if this were to be supported, it would make sense to use the same interface for that. (But feel free to disregard this comment if you think I'm completely missing the point) 🙃

Let's say I've some sensor data where I'm only interested in storing the latest available data. This could be achieved simply by omitting @delete for the default and optional type families:

{
  "@id" : "TemperatureSensor/Room111"
  "measurement" : { "@insert" : 19.4 }
}

In the case of sets omitting @delete could allow to add values without deleting others (since setting @delete: null would probably mean that I expect there to not be any measurements yet). However, I'm not sure how one could remove all measurements or replace them fully with a new set.

{
  "@id" : "TemperatureSensor/Room111"
  "measurements" : [{ "@insert" : 19.4 }, { "@insert" : 21.7 }]
}

For lists I'm imagining something where I want to store my 10 latest measurements. For that one would probably need to be able to delete at a specific index:

{ 
  "@id" : "TemperatureSensor/Room111", 
  "measurements" : [{ 
    "@insert" : {
      "@value" : 19.4,
      "@after": 0
    },
    "@delete" : {
      "@at": 10
    }
  }]
}

0 replies

GavinMendelGleason · 2021-10-18T11:55:47Z

GavinMendelGleason
Oct 18, 2021
Maintainer Author

Indeed the "don't care" update was what I was thinking of in the final paragraph of the above, and you've even given the same syntax that we've been discussing informally!

The list one is very interesting and will require some playing around with, but definitely being able to remove an index seems like a useful operation.

1 reply

fidelthomet Oct 18, 2021

Ah, sorry. I should've read that more thoroughly!

GavinMendelGleason · 2022-03-10T10:58:12Z

GavinMendelGleason
Mar 10, 2022
Maintainer Author

In order to make this practicable with the current code base we need to be able to synthesis the IDs of all objects modified by a given delta.

[ ] At first, we will merely create a predicate that retrieves the parent ID of the enclosing document of any triple. Later we may record this information directly.

[ ] After obtaining all ids changed in a given delta stack, then we can do document retrieval and use diff to synthesise the patch. For all changed objects.

[ ] At this point, application will be this patch, plus a resource, so that will require a new endpoint (or an extended version of the same) that takes a resource as an argument.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TerminusDB

Patch Application #686

{{title}}

Replies: 5 comments 1 reply

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

TerminusDB

Patch Application #686

GavinMendelGleason Oct 14, 2021 Maintainer

Patch

Type Families

Patch Synthesis

Diff

Transaction Semantics

The Development Plan

Replies: 5 comments · 1 reply

bradjonesca Oct 15, 2021

GavinMendelGleason Oct 16, 2021 Maintainer Author

fidelthomet Oct 18, 2021

GavinMendelGleason Oct 18, 2021 Maintainer Author

fidelthomet Oct 18, 2021

GavinMendelGleason Mar 10, 2022 Maintainer Author

GavinMendelGleason
Oct 14, 2021
Maintainer

Replies: 5 comments 1 reply

bradjonesca
Oct 15, 2021

GavinMendelGleason
Oct 16, 2021
Maintainer Author

fidelthomet
Oct 18, 2021

GavinMendelGleason
Oct 18, 2021
Maintainer Author

GavinMendelGleason
Mar 10, 2022
Maintainer Author