Document what to do with duplicate items #33

Nemo157 · 2016-01-21T10:39:08Z

I can probably work out what to do by checking the code, but it would be good for this to be defined as part of the format. An example is for the review on fd4109d, this has reviews stored under note e81a4e0ea017 and CI status under note 4914d5adea78. Using git show to view the notes directly gives

Review

> git show e81a4e0ea017
{"timestamp":"1450125974","targetRef":"refs/heads/master","reviewRef":"refs/pull/5/head","requester":"stp-ip","description":"Fix typo in README"}
{"timestamp":"1450215328","targetRef":"refs/heads/master","reviewRef":"refs/pull/5/head","requester":"stp-ip","description":"Fix typo in README"}
{"timestamp":"1450219553","targetRef":"refs/heads/master","reviewRef":"refs/pull/5/head","requester":"stp-ip","description":"Fix typo in README"}
{"timestamp":"1450222101","targetRef":"refs/heads/master","reviewRef":"refs/pull/5/head","requester":"stp-ip","description":"Fix typo in README"}
{"timestamp":"1450338870","targetRef":"refs/heads/master","reviewRef":"refs/pull/5/head","requester":"stp-ip","description":"Fix typo in README"}
{"timestamp":"1450338936","targetRef":"refs/heads/master","reviewRef":"refs/pull/5/head","requester":"stp-ip","description":"Fix typo in README"}

{"timestamp":"1450338936","reviewRef":"refs/pull/5/head","targetRef":"refs/heads/master","requester":"stp-ip","description":"Fix typo in README","baseCommit":"6302bf385c36958fb8881e43f11daeeb4ed42cc0"}

For this I assume just taking the latest instance would be appropriate.

Although how to determine latest? Judging by the duplicates on the last two entries timestamp is not guaranteed to be monotonically increasing, and even if it were on a single machine then what happens with edits to the same review from multiple machines?

CI Status

> git show 4914d5adea78
{"timestamp":"1450504163","v":0,"agent":"Jenkins(1.627) GitNotesJobLogger","url":"https://jenkins-dot-developer-tools-bundle.appspot.com/job/git-appraise/118/"}

{"timestamp":"1450504169","v":0,"agent":"Jenkins(1.627) GitNotesJobLogger","url":"https://jenkins-dot-developer-tools-bundle.appspot.com/job/git-appraise/118/","status":"success"}

{"timestamp":"1450219554","status":"success","agent":"cla/google"}

{"timestamp":"1450126113","url":"https://travis-ci.org/google/git-appraise/builds/96828980","status":"success","agent":"continuous-integration/travis-ci/pr"}

{"timestamp":"1450126043","url":"https://travis-ci.org/google/git-appraise/builds/96828980","agent":"continuous-integration/travis-ci/pr"}

{"timestamp":"1450125975","url":"https://travis-ci.org/google/git-appraise/builds/96828980","agent":"continuous-integration/travis-ci/pr"}

Since there are multiple agents submitting their CI status I guess the latest entry from each unique url (or agent when url is missing?) value may be appropriate.

Here you can actually see that different machines have different timestamps. In this case it doesn't really matter as each agent is on a unique machine so that when partitioned by url or agent the timestamp is at least increasing, even if not guaranteed to be monotonic. But I can imagine some form of CI agent that involves passing the job through multiple machines as it works, which could result in non-increasing timestamps for a single job.

It might be worth adding a required third-party id entry to allow agents without a url (like cla/google) to add multiple statuses and update them?

Also just noticed that it appears the travis-ci implementation is actually prepending its new statuses as it works rather than appending like the rest seem to.

These are the only parts of the spec that I've really looked into so far. A quick glance at robot comments seems like it should be ok since there's nothing there to update, I'll take a deeper look at review comments and add an update.

The text was updated successfully, but these errors were encountered:

Nemo157 · 2016-01-21T11:00:31Z

Continuing on from above with the review comments, for commit fd4109d the review comments are in note 0328cd2a389f.

Review Comments

> git show 0328cd2a389f
{"timestamp":"1450215328","author":"ojarjur","Parent":"","description":"Hi, thanks for the pull request. Before we merge it, can you please go to https://cla.developers.google.com/ and sign the Google contributor license agreement?"}
{"timestamp":"1450219553","author":"stp-ip","Parent":"03806b21bf729953002d62570d43d46f5585e7a5","description":"Done. Thought cla was only needed for 1+ line changes so."}
{"timestamp":"1450221628","author":"[email protected]","location":{"commit":"fd4109dbd9e1c239d8dde559bad523c6afeed5fb"},"resolved":true}
{"timestamp":"1450338870","author":"lealon2014","location":{"commit":"fd4109dbd9e1c239d8dde559bad523c6afeed5fb","path":"README.md","range":{"startLine":95}},"Parent":"","description":"283422"}
{"timestamp":"1450338936","author":"lealon2014","Parent":"55e401f3629fae6900d14f1f3f45413a96d1dec6","description":"That's that"}

It appears there's no way to update review comments other than replying with a new message? I guess that's ok, I just like being able to go back and fix spelling mistakes when needed. So again there's no issue with duplicates, they're all just individual items.

Also, the docs say "When the parent is specified, it must be the SHA1 hash of another comment on the same revision". I assume the second comment is replying to the first so I tried throwing the first comment into shasum a few different ways but wasn't able to get 03806b21bf729953002d62570d43d46f5585e7a5 out. I feel that parsing and re-serializing the comments before generating the hash could cause interoperability issues, it may be better to instead hash the exact text from the note.

Nemo157 · 2016-01-21T11:10:01Z

Just took a look at the actual source and it seems that for the review request it just takes the last written (in the file, ignoring the timestamp values) request, while for the CI statuses it checks the timestamps and only uses the latest one for the build status message.

ojarjur · 2016-01-21T17:48:54Z

Everything you've said seems spot-on, but in particular the comment about us needing to explicitly document this stuff in the spec is absolutely right.

I'll take a pass at the spec later today and try to make things more clear, but it is easy for me to overlook something being under-specified, so please take another look after I do that.

As for the specific details:

Review requests: Yes, (currently) the last one in the file is taken.

Normally, since we always write the timestamp first, and notes are either appended to the end or merged using cat_sort_uniq, this final line will also be the one with the latest timestamp.

However, that isn't very reliable, so I think we should change it to explicitly sort them in-memory and take the one with the latest timestamp.

The particular requests you saw with identical timestamps seem to be a side effect of a recent change I made to our script that mirrors pull requests into git-appraise. I don't expect this to happen very often, but we do need a solid (official) story for how that should be interpreted. I'll think about it some more.
2. Comments: Yes, these are effectively append only.

Think of this as the equivalent of sending emails to a pull request in a mailing list; you can't unsend email, so not being able to edit a comment after the fact seemed reasonable. That being said, it's just stored in git history, and that is mutable. I could see us adding a command to update a comment that hasn't been pushed to a remote yet, but once you push it to a remote, then that seems too late to undo (e.g. what if someone posts another comment responding to your typo?).
3. Parent hashes: Yes, this needs to be the exact hash of the text in git-notes.

Parsing and reserializing won't work because the field ordering and even whitespace affect this.

CI statuses: For now we take the last, but really we should be incorporating all of them.

For instance, if there are reports that report both success and failure, then that means the CI tests are being flaky, and you should probably be able to see that in the review. For your web UI, this is probably easier to do, since you can have a drop down with the URLs for each test run. Regardless, we should definitely show the status for each agent (the CLI currently doesn't do that, and it should).

Thanks for looking into this and pointing out the ambiguities.

ojarjur · 2016-01-21T19:22:07Z

A follow up question, @Nemo157: I assume the context for this is your git-appraise-rs project... Do you intend to open source that? If so, could you add licensing information to the repo (e.g. a LICENSE or COPYING file)?

If you do, then would you like us to add a link to that project from the git-appraise README?

Nemo157 · 2016-01-21T19:31:29Z

@ojarjur Yes, yes and sure 😄

I'll split the repo as well into just the base library and the web interface, so if you link to the existing repo as just a rust client library for interacting with the git-appraise notes in a repository that would be great. I should get to that at some point tonight, I'll ping you once it's done.

Probably not worth linking to the web interface wherever that ends up, at least until I get it actually useful 😄.

Nemo157 · 2016-01-21T20:21:58Z

@ojarjur Done, https://github.com/Nemo157/git-appraise-rs now contains just the client library, dual licensed under MIT and Apache v2.

The web service has been moved to https://github.com/Nemo157/git-appraise-server in case you wanted to find it.

Previously, the tool was taking the last request in a note as being the current one, and relying on the combination of appending notes and merging them using "cat_sort_uniq" to make that correspond to the request with the latest timestamp. However, that was neither as robust as it could be, nor was it documented in our spec. This change makes the tool pick the last request by timestamp, and updates the spec to indicate that this is how the notes should be interpreted. This change also makes the sorting of both requests and comments stable, and extends the Summary struct to store all of the review requests in addition to storing the current one. This change is a (partial) response to feedback in #33.

Formalize how to handle multiple review requests on a commit. Previously, the tool was taking the last request in a note as being the current one, and relying on the combination of appending notes and merging them using "cat_sort_uniq" to make that correspond to the request with the latest timestamp. However, that was neither as robust as it could be, nor was it documented in our spec. This change makes the tool pick the last request by timestamp, and updates the spec to indicate that this is how the notes should be interpreted. This change also makes the sorting of both requests and comments stable, and extends the Summary struct to store all of the review requests in addition to storing the current one. This change is a (partial) response to feedback in #33.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document what to do with duplicate items #33

Document what to do with duplicate items #33

Nemo157 commented Jan 21, 2016

Nemo157 commented Jan 21, 2016

Nemo157 commented Jan 21, 2016

ojarjur commented Jan 21, 2016

ojarjur commented Jan 21, 2016

Nemo157 commented Jan 21, 2016

Nemo157 commented Jan 21, 2016