service: use read barrier in tablet_virtual_task::contains #21995
base: master
Conversation
CI state: FAILURE (build failed; tests 1/540)
```python
async def del_repair_task():
    tablet_task_id = empty_id
    while tablet_task_id == empty_id:
        tablet_task_id = await get_tablet_task_id(cql, hosts[0], table_id, token)
```
get_tablet_task_id returns None or a valid task id. Why do we compare with 00000000-0000-0000-0000-000000000000?
Oh. Looks like I broke the assumption. Fixing it.
Currently, in tablet_map_to_mutation, repair's and migration's tablet_task_info is always set. Do not set the tablet_task_info if there is no running operation.
Currently, when a tablet repair is started, info regarding the operation is kept in system.tablets. The new tablet states are reflected in memory only after load_topology_state is called; before that, the data in the table and the in-memory state aren't consistent. To check the supported operations, tablet_virtual_task uses the in-memory tablet_metadata, hence it may not see the operation even though its info is already kept in the system.tablets table. Run a read barrier in tablet_virtual_task::contains to ensure it sees the latest data. Add a test to check it. Fixes: scylladb#21975.
Force-pushed from d89078b to cf23a74.
CI state: FAILURE (build failed; tests 1/630)
```diff
@@ -50,6 +51,8 @@ static bool tablet_id_provided(const locator::tablet_task_type& task_type) {
 }

 future<std::optional<tasks::virtual_task_hint>> tablet_virtual_task::contains(tasks::task_id task_id) const {
+    co_await _ss._migration_manager.local().get_group0_barrier().trigger();
```
@kbr-scylla can you please confirm/deny that this solves the problem described in the patch notes?
Hard to say without understanding the context.
What are the causality dependencies here? From the description I'm guessing that:
- something guarantees that the `task_id` we're looking for is already in `system.tablets` (what is that something?) in the context of this PR
- but that same thing does not guarantee that it's already in the in-memory data structures that are built from `system.tablets` when we reload the table (so the table was not reloaded)
If the first point holds, then the read barrier will make sure that whatever was visible in system.tablets is also visible in the in-memory state. But note that we are only guaranteed to see *at least* as recent a state: the state we see could be e.g. after the data was removed (as we're basically sleeping for a potentially long time), and then we won't see the data. I don't know if it's guaranteed that the data is still there in the context we're discussing, so dunno.
> something guarantees that the task_id we're looking for is already in system.tablets (what is that something?) in the context of this PR
This something is an API request for starting a new repair. This will create an entry in group 0, which gets to system.tablets, but it is not immediately visible in memory, because the topology object has to be regenerated first.
The scenario which prompted this PR is:
- user starts repair via API
- immediately after previous request returned, user checks state of repair task
Before this PR, the user would get an error that the repair task with the given ID doesn't exist. Upon retry, they would see the expected task.
> This will create an entry in group 0, which gets to system.tablets, but not immediately visible in memory
How does it create the entry? If it uses the `raft_group0_client::add_entry` API to commit the entry, this underneath calls `raft::server::add_entry` with `wait_type::applied`, meaning that the future will resolve only after the entry is completely applied locally, which includes any in-memory changes that application of the entry performs.
> If it uses raft_group0_client::add_entry API to commit the entry, this underneath calls raft::server::add_entry with wait_type::applied, meaning that the future will resolve only after the entry is completely applied locally, which includes any in-memory changes that application of the entry performs.
So if the repair API calls `add_entry` and waits for the future before returning to the user, then the bug must be somewhere else.
Unless the problem is that the first entry added to group 0 doesn't actually update system.tablets, but some kind of background process starts and a follow-up entry is then added (maybe on topology coordinator) but then this PR also won't solve the problem as the read_barrier might observe the state in-between those 2 entries.
> Unless the problem is that the first entry added to group 0 doesn't actually update system.tablets, but some kind of background process starts and a follow-up entry is then added (maybe on topology coordinator)
for example alter keyspace in tablets mode works this way: the node handling the request only adds a group 0 entry which updates system.topology with request info, which causes the topology coordinator to pick up the request and eventually push the schema change entry itself. So for the API to guarantee that user sees the schema change upon returning, it's not enough to wait for the first entry to become applied, it actually has to wait for the entire request to finish (including the entry added by coordinator).
> How does it create the entry? If it uses raft_group0_client::add_entry API to commit the entry, this underneath calls raft::server::add_entry with wait_type::applied, meaning that the future will resolve only after the entry is completely applied locally, which includes any in-memory changes that application of the entry performs.
The test that hit the bug starts the repair task and gets the id directly from system.tablets.
> starts the repair task
OK, but does the API to start the repair task, which is used by the test, wait for the group 0 entry to get applied (does it wait for the future returned by `add_entry` to get resolved)? And then does the test wait for the API to return a response? If yes, then by that moment both system.tablets and the in-memory data structures should already be updated.
I opened #22011
Currently, when a tablet repair is started, info regarding the operation is kept in system.tablets. The new tablet states are reflected in memory only after load_topology_state is called. Before that, the data in the table and the memory aren't consistent.
To check the supported operations, tablet_virtual_task uses in-memory tablet_metadata. Hence, it may not see the operation, even though its info is already kept in system.tablets table.
Run read barrier in tablet_virtual_task::contains to ensure it will see the latest data. Add a test to check it.
Fixes: #21975.
No backport needed, 6.2 does not contain tablet_virtual_task