service: use read barrier in tablet_virtual_task::contains #21995
base: master
Conversation
CI state: FAILURE (build failed; tests 1/540)
```python
async def del_repair_task():
    tablet_task_id = empty_id
    while tablet_task_id == empty_id:
        tablet_task_id = await get_tablet_task_id(cql, hosts[0], table_id, token)
```
get_tablet_task_id returns None or a valid task id. Why do we compare with 00000000-0000-0000-0000-000000000000?
Oh. Looks like I broke the assumption. Fixing it.
Currently, in tablet_map_to_mutation, repair's and migration's tablet_task_info is always set. Do not set the tablet_task_info if there is no running operation.
Currently, when a tablet repair is started, info regarding the operation is kept in system.tablets. The new tablet states are reflected in memory only after load_topology_state is called; before that, the data in the table and the in-memory state aren't consistent. To check the supported operations, tablet_virtual_task uses the in-memory tablet_metadata, hence it may not see the operation even though its info is already kept in the system.tablets table. Run a read barrier in tablet_virtual_task::contains to ensure it sees the latest data. Add a test to check it. Fixes: scylladb#21975.
Force-pushed from d89078b to cf23a74.
CI state: FAILURE (build failed; tests 1/630)
```diff
@@ -50,6 +51,8 @@ static bool tablet_id_provided(const locator::tablet_task_type& task_type) {
 }

 future<std::optional<tasks::virtual_task_hint>> tablet_virtual_task::contains(tasks::task_id task_id) const {
+    co_await _ss._migration_manager.local().get_group0_barrier().trigger();
```
@kbr-scylla can you please confirm/deny that this solves the problem described in the patch notes?
Hard to say without understanding the context.
What are the causality dependencies here? From the description I'm guessing that:
- something guarantees that the `task_id` we're looking for is already in `system.tablets` (what is that something?) in the context of this PR
- but that same thing does not guarantee that it's already in the in-memory data structures that are built from `system.tablets` when we reload the table (so the table was not reloaded)
If the first point holds, then the read barrier will make sure that whatever was visible in system.tablets is also visible in the in-memory state. But note that we are only guaranteed to see *at least* as recent a state: the state we see could be e.g. after the data was removed (as we're basically sleeping for a potentially long time), and then we won't see the data. I don't know if it's guaranteed that the data is still there in the context we're discussing, so dunno.
> something guarantees that the task_id we're looking for is already in system.tablets (what is that something?) in the context of this PR
This something is an API request for starting a new repair. This will create an entry in group 0, which gets to system.tablets, but it is not immediately visible in memory, because the topology object has to be regenerated first.
The scenario which prompted this PR is:
- user starts repair via API
- immediately after previous request returned, user checks state of repair task
Before this PR, the user would get an error that the repair task with the given ID doesn't exist. Upon retry, they would see the expected task.
> This will create an entry in group 0, which gets to system.tablets, but not immediately visible in memory
How does it create the entry? If it uses the `raft_group0_client::add_entry` API to commit the entry, this underneath calls `raft::server::add_entry` with `wait_type::applied`, meaning that the future will resolve only after the entry is completely applied locally, which includes any in-memory changes that application of the entry performs.
> If it uses raft_group0_client::add_entry API to commit the entry, this underneath calls raft::server::add_entry with wait_type::applied, meaning that the future will resolve only after the entry is completely applied locally, which includes any in-memory changes that application of the entry performs.
So if the repair API calls `add_entry` and waits for the future before returning to the user, then the bug must be somewhere else.
Unless the problem is that the first entry added to group 0 doesn't actually update system.tablets, but some kind of background process starts and a follow-up entry is then added (maybe on topology coordinator) but then this PR also won't solve the problem as the read_barrier might observe the state in-between those 2 entries.
> Unless the problem is that the first entry added to group 0 doesn't actually update system.tablets, but some kind of background process starts and a follow-up entry is then added (maybe on topology coordinator)
for example alter keyspace in tablets mode works this way: the node handling the request only adds a group 0 entry which updates system.topology with request info, which causes the topology coordinator to pick up the request and eventually push the schema change entry itself. So for the API to guarantee that user sees the schema change upon returning, it's not enough to wait for the first entry to become applied, it actually has to wait for the entire request to finish (including the entry added by coordinator).
> How does it create the entry? If it uses raft_group0_client::add_entry API to commit the entry, this underneath calls raft::server::add_entry with wait_type::applied, meaning that the future will resolve only after the entry is completely applied locally, which includes any in-memory changes that application of the entry performs.
The test that hit the bug starts the repair task and gets the id directly from system.tablets.
> starts the repair task
OK, but does the API to start the repair task, which is used by the test, wait for the group 0 entry to get applied (does it wait for the future returned by `add_entry` to get resolved)? And then does the test wait for the API to return a response? If yes, then by that moment both system.tablets and the in-memory data structures should already be updated.
I opened #22011
Currently, when a tablet repair is started, info regarding the operation is kept in system.tablets. The new tablet states are reflected in memory only after load_topology_state is called. Before that, the data in the table and the memory aren't consistent.
To check the supported operations, tablet_virtual_task uses in-memory tablet_metadata. Hence, it may not see the operation, even though its info is already kept in system.tablets table.
Run read barrier in tablet_virtual_task::contains to ensure it will see the latest data. Add a test to check it.
Fixes: #21975.
No backport needed, 6.2 does not contain tablet_virtual_task