Automerge #30
Comments
This makes me think of git rebasing: voluntarily applying your own
changes on top of newly received changes in order to avoid a merge. I
think it's a good approach.
One difference in how we do sync compared to git is that git has
separate pull vs push phases. Imagine a case with a merge conflict with
a two-way sync: both sides would try and resolve the conflict
simultaneously -- and, likely, differently -- resulting in further
conflict. Git puts the onus of conflict resolution on the puller, so
that they can resolve it w.r.t. the other party and push, producing a
conflict-free result for both parties. I think this works very well,
speaking from personal experience!
In some ways this actually sounds really close to how USB sync works in
the field already: Mapeo pulls from the USB drive, and, later, pushes
new changes back. One slight tweak to be made is to call `replicate()`
as `replicate({ method: 'pull' })` on Mapeo so that our
pre-conflict-resolution changes are not pushed to the USB drive yet.
From there, we can do local conflict resolution (or auto-merging, as
fortune permits), and THEN do a `replicate({ method: 'push' })` back to
the USB drive.
We'll need to modify network sync to be pull-based too, to reap the same
benefits.
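
The pull, resolve, push ordering described above can be sketched with a toy in-memory model. This is only an illustration: the `{ method: 'pull' }` / `{ method: 'push' }` option is the tweak proposed in this comment, not an existing osm-p2p-db option, and the entry shape (`id`, `version`, `links`, `value`) and every function name here are hypothetical.

```javascript
// Toy in-memory model, NOT the osm-p2p-db API.
// A "log" is an array of entries: { id, version, links, value }.

// Entries present in `source` that `target` has not seen yet.
function missingFrom (source, target) {
  const have = new Set(target.map(e => e.version))
  return source.filter(e => !have.has(e.version))
}

// One-directional replication, mirroring the proposed `method` option.
function replicate (local, remote, opts) {
  if (opts.method === 'pull') local.push(...missingFrom(remote, local))
  else if (opts.method === 'push') remote.push(...missingFrom(local, remote))
  else throw new Error('unknown method: ' + opts.method)
}

// Heads of a document: versions that no other entry links back to.
function heads (log, id) {
  const entries = log.filter(e => e.id === id)
  const linked = new Set(entries.flatMap(e => e.links))
  return entries.filter(e => !linked.has(e.version))
}

// Resolve a fork locally by appending one entry that links to every head.
// `pickValue` stands in for manual resolution or auto-merging.
function resolveLocally (log, id, pickValue) {
  const hs = heads(log, id)
  if (hs.length < 2) return // nothing to resolve
  log.push({
    id,
    version: 'merge(' + hs.map(h => h.version).sort().join('+') + ')',
    links: hs.map(h => h.version),
    value: pickValue(hs)
  })
}

// Pull first, resolve on the puller's side, and only then push.
function syncWithUsb (local, usb, id, pickValue) {
  replicate(local, usb, { method: 'pull' })
  resolveLocally(local, id, pickValue)
  replicate(local, usb, { method: 'push' })
}
```

Because only the puller resolves, the USB drive never receives the pre-resolution fork on its own: it gets the local changes together with the merge entry that links them.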
|
Another thing auto-merge will need to do is discover new referrers from
other forks and update them to point to our latest HEADs. Here's an
example:
1. Log A and Log B have WAY_v1
2. Log A moves a node, requiring WAY_v1 to become WAY_v2
3. Log B creates a relation, with WAY_v1 as a member
4. Log A <--SYNC--> Log B
5. The relation points at WAY_v1, but WAY_v2 is the actual HEAD
Now that I think about it, this should only ever happen with relations,
since you can't create a node and then add a way to it later (right?).
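
A rough sketch of that check, using the WAY_v1 / WAY_v2 scenario above. The document shape (`id`, `version`, `links`, `members`) and the function names are assumptions for illustration, not osm-p2p-db structures, and `headOf` assumes a linear history (with forks it would need to return several heads).

```javascript
// Toy model: docs is an array of { id, version, links, members? }.
// `links` lists the versions a doc replaces; `members` lists referenced versions.

// Follow `links` forward from a version until nothing replaces it.
// Assumes linear history (no forks) for simplicity.
function headOf (docs, version) {
  let current = version
  for (;;) {
    const next = docs.find(d => d.links.includes(current))
    if (!next) return current
    current = next.version
  }
}

// Referrers whose members point at a version that is no longer a head.
function findStaleReferrers (docs) {
  return docs
    .filter(d => Array.isArray(d.members))
    .map(d => ({
      referrer: d.version,
      stale: d.members.filter(m => headOf(docs, m) !== m)
    }))
    .filter(r => r.stale.length > 0)
}

// Build a new version of the referrer that points at the current heads.
function repoint (docs, referrer, newVersion) {
  return {
    id: referrer.id,
    version: newVersion,
    links: [referrer.version],
    members: referrer.members.map(m => headOf(docs, m))
  }
}
```

Run after a sync, `findStaleReferrers` would flag the relation from step 5, and `repoint` would produce a new relation version whose member is WAY_v2.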
|
I think we're talking about slightly different approaches. What I am suggesting with "auto-merging" is presentational - there are two forks, but they are presented on the client as a single merged fork. It is only when the client edits the relation / member of the relation that they create a new doc that points to the previous forks - an actual merge. I wonder how quickly this could get out of control if everybody is editing different sections of a large relation?
Regarding the second issue you mention, I think there are several ways that a relation could point to members which are not the head, and ways can point to nodes that are not the head. It is possible to create a node and add a way later.
Perhaps one way to "auto-merge" these is to create a "virtual fork" that is presented to the user. In this case it would be a second relation_v2 that points to Way_v2.
|
On 02/23 16:06, Gregor MacLennan wrote:
> I think we're talking about slightly different approaches. What I am
> suggesting with "auto-merging" is presentational - there are two
> forks, but they are presented on the client as a single merged fork.
I think I see. As part of the "deforking" process? The comma-separated
version would masquerade as its own unique document version?
osm-p2p-server will need to be smart enough to intercept any requests
made against such faux documents, since osm-p2p-db doesn't understand
comma-separated versions. I worry about fragility with this approach, in
the disparity between presentation and data model.
Having the merge be explicit has its own set of challenges, but it will
be a bit more reliable / consistent, since the presentation layer
doesn't receive as strong of a reinterpretation of the data.
> It is only when the client edits the relation / member of the relation
> that they create a new doc that points to the previous forks - an
> actual merge. I wonder how quickly this could get out of control if
> everybody is editing different sections of a large relation?
What do you mean here by "out of control"? In terms of extra data
created by rippling changes? By conflicts?
> Regarding the second issue you mention, I think there are several ways
> that a relation could point to members which are not the head, and
> ways can point to nodes that are not the head. It is possible to
> create a node and add a way later. Perhaps one way to "auto-merge"
> these is to create a "virtual fork" that is presented to the user. In
> this case it would be a second relation_v2 that points to Way_v2.
I wonder if there's a way to present the data naturally and honestly to
the users without it being terribly confusing. Git accomplishes this by
decent auto-merging and by virtue of its users being technical people
already, but I wonder if we can have less "magic" between the forking
data model and what users actually interact with?
|
osm-p2p-server has handled comma-separated versions since the first version; we've just never used it: https://github.com/digidem/osm-p2p-server/blob/master/api/put_changes.js#L82-L84 - it was designed as the way to represent the links array in XML. The nice thing about this technique is that iD editor just treats it as a regular version number and passes it through unchanged - iD never touches the version number. Only when osm-p2p-server gets it back does it know that this "version id" is actually an array of two version numbers, comma-separated.
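The round trip could look roughly like the sketch below. It is not a copy of the code at the link above; `versionToLinks` and `linksToVersion` are hypothetical names, and the whitespace trimming is an assumption.

```javascript
// Illustrative only: turn a "version" string from the client back into
// an array of links, and the reverse for presenting forks as one version.

// 'abc,def' -> ['abc', 'def']; a plain version becomes a one-element array.
function versionToLinks (version) {
  if (version === undefined || version === null || version === '') return []
  return String(version)
    .split(',')
    .map(s => s.trim())
    .filter(Boolean)
}

// ['abc', 'def'] -> 'abc,def', which the editor treats as an opaque version.
function linksToVersion (links) {
  return links.join(',')
}
```

Since the editor treats the version as opaque, the only component that has to understand the comma is the server side that rebuilds the links array.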
In a scenario where multiple users are continuously editing different (mergeable) segments of a long river relation, would new forks be created by merges faster than they are being merged?
I'm not sure; I think it is always going to be a hard thing for users to understand. My current thinking is that it's best to present a single version to the user, based on something like modification time and a deforking step, but also give a visual indication that more than one version exists, plus a UI that can display the DAG. That is where we need the UX work, to make this clear and understandable to the user who is actually interested in reviewing forks and resolving/merging. I think only a subset of users even need to know about this.
|
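The "single version plus fork indicator" presentation from the comment above could be sketched as follows. This is a minimal sketch under assumptions: each head carries a numeric `timestamp` field, and `presentDocument` is a hypothetical helper, not part of osm-p2p-server.

```javascript
// Pick one head to show (most recent modification time) and report
// whether other versions exist, so the UI can flag the fork.
function presentDocument (heads) {
  if (heads.length === 0) return null
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...heads].sort((a, b) => b.timestamp - a.timestamp)
  return {
    doc: sorted[0],
    forked: heads.length > 1,
    otherVersions: sorted.slice(1).map(h => h.version)
  }
}
```

A client could render `doc` as usual and show a marker when `forked` is true, leaving the full DAG view to the users who want to review and merge forks.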
The most common cause of forked documents is when two users edit a way. This is common for long rivers that cross a large area. We have observed that users are very tempted to fix rivers to adjust the alignment or "round" a sharp corner. Two users working in different areas could easily adjust parts of the same river.
One way around this is to create rivers as a relation of multiple shorter ways. This is the recommended way of representing large ways or areas in OSM. If two users edit two different way sections of a river, the relation itself does not change, and no forks are created. This behaviour would be broken, however, by digidem/osm-p2p-db#49, which would modify the version of a relation for every edit of its members; that change in behaviour is likely necessary for fixing bugs related to replication, deletions and forks.
If digidem/osm-p2p-db#49 is implemented, one solution to this problem is to do "auto-merging" in osm-p2p-server or osm-p2p-api. Auto-merging would work by:
opts.links
that is passed to hyperlog/hyperkv.