-
Notifications
You must be signed in to change notification settings - Fork 3.8k
CEP-46 witnesses #4378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: cep-45-mutation-tracking
Are you sure you want to change the base?
CEP-46 witnesses #4378
Conversation
65dcc37
to
8faaca0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still a lot of references to transient replication - are you planning to remove those later on?
Could include a comment in NEWS that this breaks any existing users of the experimental transient replication features. Those users shouldn't have any expectation of feature stability, but a note in NEWS would be a kind gesture.
// TODO (review): This can be removed right? ReadRepair is effectively not done anymore so the setting doesn't matter | ||
// if (keyspace.replicationStrategy.hasTransientReplicas() | ||
// && table.params.readRepair != ReadRepairStrategy.NONE) | ||
// { | ||
// throw ire("read_repair must be set to 'NONE' for transiently replicated keyspaces"); | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should require BLOCKING for tracked keyspaces for now. Users could previously switch to NONE if they preferred write atomicity over read monotonicity and wanted slightly better performance, but with mutation tracking there wouldn't be a semantics tradeoff, and the performance of read reconciliation should be even better than read repair anyway.
The reason I think we should defer support for NONE is that we may decide to do an equivalent of full repair with range reads, and there isn't a strong justification to support both modes yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to punt on this for now. Requiring things makes it blow up on existing schema when maybe it doesn't need to? Makes it harder to enable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I support moving forward with that option essentially ignored for now. We could also drop a client warning if a user has mutation tracking enabled for a keyspace with read repair NONE.
@@ -107,6 +108,7 @@ else if (localComparisonEpoch.isAfter(readCommand.serializedAtEpoch())) | |||
|
|||
private ClusterMetadata checkTokenOwnership(ClusterMetadata metadata, Message<T> message) | |||
{ | |||
boolean acceptsTransient = message.verb() != Verb.READ_REQ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirming this will include RANGE_REQ later on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing reads from transient replicas via this path and verb handler now since Blake reworked summary requests to be their own verb. I'll update this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the summary requests go through here so I will use summary request verb to check.
@@ -1430,7 +1393,7 @@ private void serializeHeader(ReadCommand command, DataOutputPlus out, int versio | |||
out.writeByte( | |||
digestFlag(command.isDigestQuery()) | |||
| indexFlag(null != command.indexQueryPlan()) | |||
| acceptsTransientFlag(command.acceptsTransient()) | |||
// | acceptsTransientFlag(false) Deprecated flag, could be reused? | |||
| needsReconciliationFlag(command.rowFilter().needsReconciliation()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needsReconciliationFlag
was added for non-strict filtering, should probably rename for clarity now that read reconciliation is its own concept
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this in scope for this change set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, just a thought while I was reviewing
@@ -1388,7 +1371,7 @@ public CleanupSummary releaseRepairData(Collection<TimeUUID> sessions) | |||
readLock.lock(); | |||
try | |||
{ | |||
for (PendingRepairManager prm : Iterables.concat(pendingRepairs.getManagers(), transientRepairs.getManagers())) | |||
for (PendingRepairManager prm : Iterables.concat(pendingRepairs.getManagers())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to concat anymore
// TODO (review): This should be removed right? Nothing depends on it? | ||
// if (stats != null) | ||
// field("IsTransient", stats.isTransient); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I support removal 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is poorly written tools can see the field missing and start erroring out. I'll put it back in for now.
// TODO (review): Cheap quorums are not a goal for now, would really require speculation for writes | ||
// and would trigger reconciliation work later, potentially quite a lot so I don't think it makes sense. | ||
// So just remove selector? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I support removing support for cheap quorums. I could imagine a world where we deprioritize delivery of new writes to witnesses if reconciliation is behind, but even a change like that would require more justification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to leave this in for now to avoid disruption.
// TODO (review): Races and this being an out of range mutation, it's silently accepting the mutation to the log | ||
// but not applying to the CF. This was already checked, but in a race what happens? Double check ack handling | ||
// and whether the voting group is checked to match safely | ||
for (Range<Token> r : localRanges.onlyFull().ranges()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're losing the range (coordinator is behind and believes replica owns the range), we may write something to an SSTable that's not read, and will be purged on next cleanup.
If we're gaining the range (coordinator is ahead and believes replica owns the range), TCM should try to catch up before we process the message and get here.
…it is no longer possible to have transient sstables.
…thMutationTrackingTest
c32e7e0
to
326fbf1
Compare
I have a separate PR for the rename #4174 I just haven't had a chance to really push that through and I was also wondering if we should merge the rename back to 4.0 to ease merge pain in the long run. |
This is some initial cleanup removing a lot of references to transient replication from a lot of the sstable plumbing. I didn't remove the field from the format metadata yet although we should do that as part of the next format change.
It also has the few tweaks needed to get single partition reads and writes working.
Known bugs:
Needs fixing now
Could be fixed later