Automerge #30

gmaclennan · 2017-02-23T04:11:30Z

The most common cause of forked documents is two users editing the same way. This is common for long rivers that cross a large area. We have observed that users are very tempted to fix rivers, either to adjust the alignment or to "round" a sharp corner. Two users working in different areas could easily adjust parts of the same river.

One way around this is to create rivers as a relation of multiple shorter ways. This is the recommended way for representing large ways or areas in OSM. If two users edit two different way sections of a river, the relation itself does not change, and no forks are created. This behaviour would be broken however by digidem/osm-p2p-db#49 which would modify the version of a relation for every edit of its members, and this change in behaviour is likely necessary for fixing bugs related to replication, deletions and forks.

A solution to this problem if digidem/osm-p2p-db#49 is implemented is to do "auto-merging" in osm-p2p-server or osm-p2p-api. Auto-merging would work by:

1. Is a relation forked?
2. If yes, get the common parent and compare the versions of members.
3. If member versions have only changed on one of the forks (compared to the common parent), then the relation can be auto-merged by including the most recent version of each of its members.
   - A forked relation where each fork has a different change to the same member would not be able to be auto-merged.
4. The fork can be presented to the client (i.e. iD editor) as a single unforked relation with a special version number that refers to the forks that were merged. One way of doing this would be to set the version number as a comma-separated list of versions of the merged relations.
5. On submission, iD includes the version IDs that were edited, in this case a comma-separated list.
6. osm-p2p-server can use the comma-separated version id to set opts.links that is passed to hyperlog/hyperkv.
7. After editing there is now only one head (forks are actually merged).
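
The comparison in steps 2 and 3 could be sketched roughly as below. This is only an illustration under simplifying assumptions: each relation version is modelled as a hypothetical `{ version, members }` object where `members` maps a member id to the member version it refers to, and `autoMergeRelation` is a made-up name, not an existing osm-p2p function.

```javascript
// Sketch only. Assumed (hypothetical) shape of a relation version:
//   { version: 'r2', members: { <memberId>: <memberVersion>, ... } }
// Returns a merged relation, or null when both forks changed the same member
// in different ways and the relation cannot be auto-merged.
function autoMergeRelation (parent, forkA, forkB) {
  const ids = new Set([
    ...Object.keys(parent.members),
    ...Object.keys(forkA.members),
    ...Object.keys(forkB.members)
  ])
  const members = {}
  for (const id of ids) {
    const base = parent.members[id]
    const a = forkA.members[id]
    const b = forkB.members[id]
    let picked
    if (a === b) picked = a // both forks agree on this member
    else if (a === base) picked = b // only fork B changed this member
    else if (b === base) picked = a // only fork A changed this member
    else return null // both forks changed it differently: conflict
    // undefined means the member was removed, so it is left out
    if (picked !== undefined) members[id] = picked
  }
  return {
    // comma-separated list of the merged fork versions, as in step 4
    version: [forkA.version, forkB.version].join(','),
    members: members
  }
}
```

With a parent `r1` and two forks that each touch a different way, this would yield a single relation whose version is `'r2,r3'`; if both forks touch the same way differently, it returns `null` and the fork would have to be resolved by a user.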


hackergrrl · 2017-02-23T22:55:46Z

This makes me think of git rebasing: voluntarily applying your own changes on top of newly received changes in order to avoid a merge. I think it's a good approach. One difference in how we do sync compared to git is that git has separate pull vs push phases. Imagine a case with a merge conflict with a two-way sync: both sides would try and resolve the conflict simultaneously -- and, likely, differently -- resulting in further conflict. Git puts the onus of conflict resolution on the puller, so that they can resolve it w.r.t. the other party and push, producing a conflict-free result for both parties. I think this works very well, speaking from personal experience! In some ways this actually sounds really close to how USB sync works in the field already: Mapeo pulls from the USB drive, and, later, pushes new changes back. One slight tweak to be made is to call `replicate()` as `replicate({ method: 'pull' })` on Mapeo so that our pre-conflict-resolution changes are not pushed to the USB drive yet. From there, we can do local conflict resolution (or auto-merging, as fortune permits), and THEN do a `replicate({ method: 'push' })` back to the USB drive. We'll need to modify network sync to be pull-based too, to reap the same benefits.
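
To make the ordering concrete, here is a rough sketch of pull, then resolve, then push. Everything in it is an assumption: `replicator` stands in for whatever object exposes `replicate()`, the `{ method: ... }` option is the tweak proposed above rather than something confirmed to exist, and `resolveConflicts` is a placeholder for local conflict resolution or auto-merging.

```javascript
// Sketch only. `replicator.replicate(opts, cb)` and `resolveConflicts(cb)`
// are hypothetical; neither is taken from an existing API.
function syncWithDrive (replicator, resolveConflicts, done) {
  // 1. Pull: receive the other side's changes without sending ours yet.
  replicator.replicate({ method: 'pull' }, function (err) {
    if (err) return done(err)
    // 2. Resolve (or auto-merge) locally, w.r.t. what we just received.
    resolveConflicts(function (err) {
      if (err) return done(err)
      // 3. Push: only now send our changes, including any merges.
      replicator.replicate({ method: 'push' }, done)
    })
  })
}
```

The point of the shape is that nothing is pushed until the local side has had a chance to resolve against the pulled data, which mirrors the git pull, merge, push flow.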

hackergrrl · 2017-02-23T23:39:27Z

Another thing auto-merge will need to do is discover new referrers from other forks and update them to point to our latest HEADs. Here's an example:

1. Log A and Log B have WAY_v1
2. Log A moves a node, requiring WAY_v1 to become WAY_v2
3. Log B creates a relation, with WAY_v1 as a member
4. Log A <--SYNC--> Log B
5. The relation points at WAY_v1, but WAY_v2 is the actual HEAD

Now that I think about it, this should only ever happen with relations, since you can't create a node and then add a way to it later (right?).
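
A minimal sketch of the detection half of this, assuming (hypothetically) that a referrer records the version of each member it was created against and that we can look up the current heads per element id. `findStaleMembers` and both data shapes are made up for illustration.

```javascript
// Sketch only. Assumed (hypothetical) shapes:
//   relation: { id: 'REL', members: [{ id: 'WAY', version: 'WAY_v1' }] }
//   heads:    { WAY: ['WAY_v2'] }   // element id -> current head versions
// Returns the members whose recorded version is no longer a head.
function findStaleMembers (relation, heads) {
  return relation.members
    .filter(function (member) {
      const current = heads[member.id] || []
      // Unknown elements are skipped; stale means "has heads, none match".
      return current.length > 0 && !current.includes(member.version)
    })
    .map(function (member) {
      return { id: member.id, pointsAt: member.version, heads: heads[member.id] }
    })
}
```

For the example above, the relation created in Log B would be reported with `pointsAt: 'WAY_v1'` and `heads: ['WAY_v2']` after the sync. What to do with that result (rewrite the referrer, or present a merged view) is the open question.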

gmaclennan · 2017-02-24T00:06:05Z

I think we're talking about slightly different approaches. What I am suggesting with "auto-merging" is presentational - there are two forks, but they are presented on the client as a single merged fork. It is only when the client edits the relation / member of the relation that they create a new doc that points to the previous forks - an actual merge. I wonder how quickly this could get out of control if everybody is editing different sections of a large relation?

Regarding the second issue you mention, I think there are several ways that a relation could point to members which are not the head, and ways can point to nodes that are not the head. It is possible to create a node and add a way later. Perhaps one way to "auto-merge" these is to create a "virtual fork" that is presented to the user. In this case it would be a second relation_v2 that points to Way_v2.

hackergrrl · 2017-02-24T00:37:24Z

On 02/23 16:06, Gregor MacLennan wrote:

> I think we're talking about slightly different approaches. What I am suggesting with "auto-merging" is presentational - there are two forks, but they are presented on the client as a single merged fork.

I think I see. As part of the "deforking" process? The comma-separated version would masquerade as its own unique document version? osm-p2p-server will need to be smart enough to intercept any requests made against such faux documents, since osm-p2p-db doesn't understand comma-separated versions. I worry about fragility with this approach, in the disparity between presentation and data model. Having the merge be explicit has its own set of challenges, but it will be a bit more reliable / consistent, since the presentation layer doesn't receive as strong of a reinterpretation of the data.

> It is only when the client edits the relation / member of the relation that they create a new doc that points to the previous forks - an actual merge. I wonder how quickly this could get out of control if everybody is editing different sections of a large relation?

What do you mean here by "out of control"? In terms of extra data created by rippling changes? By conflicts?

> Regarding the second issue you mention, I think there are several ways that a relation could point to members which are not the head, and ways can point to nodes that are not the head. It is possible to create a node and add a way later. Perhaps one way to "auto-merge" these is to create a "virtual fork" that is presented to the user. In this case it would be a second relation_v2 that points to Way_v2.

I wonder if there's a way to present the data naturally and honestly to the users without it being terribly confusing. Git accomplishes this by decent auto-merging and by virtue of its users being technical people already, but I wonder if we can have less "magic" between the forking data model and what users actually interact with?

gmaclennan · 2017-02-24T01:01:48Z

> osm-p2p-server will need to be smart enough to intercept any requests made against such faux documents

It has done this since the first version, we've just never used it: https://github.com/digidem/osm-p2p-server/blob/master/api/put_changes.js#L82-L84 - it was designed as the way to represent the links array in XML. The nice thing about this technique is that iD editor just treats it as a regular version number and it just gets passed through - iD never touches the version number. It's when osm-p2p-server gets it back that it knows that this "version id" is actually an array of two version numbers, comma-separated.
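
As a rough illustration of the round trip (not the actual code in `put_changes.js`), the conversion between a comma-separated version string and a links array could look like this. The helper names are hypothetical.

```javascript
// Sketch only: hypothetical helpers, not copied from osm-p2p-server.

// 'abc,def' -> ['abc', 'def']; a plain version becomes a one-element array.
function versionToLinks (version) {
  if (version === undefined || version === null || version === '') return []
  return String(version)
    .split(',')
    .map(function (v) { return v.trim() })
    .filter(Boolean)
}

// ['abc', 'def'] -> 'abc,def', the value a client such as iD would see
// and pass back unchanged as the "version".
function linksToVersion (links) {
  return links.join(',')
}
```

The resulting array is what would be handed on as `opts.links`, so a document written against `'abc,def'` ends up linking to both previous heads.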

> What do you mean here by "out of control"? In terms of extra data created by rippling changes? By conflicts?

In a scenario where multiple users are continuously editing different (mergeable) segments of a long river relation, would new forks be created by merges faster than they are being merged?

> but I wonder if we can have less "magic" between the forking data model and what users actually interact with?

I'm not sure; I think it is always going to be a hard thing for users to understand. My current thinking is that it's best to present a single version to the user, based on something like modification time and a deforking step, but also to give a visual indication that more than one version exists, along with a UI that can display the DAG. That is where we need the UX work, to make this clear and understandable to the user who is actually interested in reviewing forks and resolving or merging them. I think only a subset of users even need to know about this.
