service: use read barrier in tablet_virtual_task::contains #21995

Open
Deexie wants to merge 2 commits into master from tablet-task-contains

Conversation

Collaborator

@Deexie Deexie commented Dec 19, 2024

Currently, when a tablet repair is started, info about the operation is stored in the system.tablets table. The new tablet states are reflected in memory only after load_topology_state is called; before that, the data in the table and in memory are not consistent.

To check the supported operations, tablet_virtual_task uses the in-memory tablet_metadata. Hence, it may not see the operation even though its info is already present in the system.tablets table.

Run a read barrier in tablet_virtual_task::contains to ensure it sees the latest data. Add a test to cover it.
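
To make the timing window concrete, here is a toy asyncio sketch (illustrative only, not ScyllaDB code): table stands in for system.tablets, memory for the in-memory tablet_metadata, and read_barrier for the group0 read barrier used by the fix.

import asyncio

table = {}     # stands in for system.tablets (updated first)
memory = {}    # stands in for in-memory tablet_metadata (updated on reload)

async def start_repair(task_id):
    table[task_id] = "repair"
    await asyncio.sleep(0.1)           # the reload (load_topology_state) happens later
    memory[task_id] = "repair"

async def read_barrier():
    # Wait until memory reflects everything that is already visible in the table.
    while not table.keys() <= memory.keys():
        await asyncio.sleep(0.01)

async def contains(task_id):
    await read_barrier()               # without this line the lookup below can miss the task
    return task_id in memory

async def main():
    repair = asyncio.create_task(start_repair("task-1"))
    await asyncio.sleep(0)             # the entry is in the table, but not yet in memory
    assert await contains("task-1")    # succeeds only because contains() waited for the barrier
    await repair

asyncio.run(main())

Without the read_barrier() call, contains() in this toy would consult memory before the reload and miss the task, which is the window this change closes.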

Fixes: #21975.

No backport needed; 6.2 does not contain tablet_virtual_task.

@Deexie Deexie added the backport/none Backport is not required label Dec 19, 2024
@Deexie Deexie self-assigned this Dec 19, 2024
@Deexie Deexie requested review from asias and denesb December 19, 2024 15:55
@scylladb-promoter
Contributor

🔴 CI State: FAILURE

✅ - Build
❌ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 topology_tasks/test_tablet_tasks

Failed Tests (1/540):

Build Details:

  • Duration: 48 min
  • Builder: spider1.cloudius-systems.com

async def del_repair_task():
    tablet_task_id = empty_id
    while tablet_task_id == empty_id:
        tablet_task_id = await get_tablet_task_id(cql, hosts[0], table_id, token)
Contributor

get_tablet_task_id returns None or a valid task id. Why do we compare with 00000000-0000-0000-0000-000000000000?

Collaborator Author

Oh. Looks like I broke the assumption. Fixing it.

Collaborator Author

Currently, in tablet_map_to_mutation, repair's and migration's
tablet_task_info is always set.

Do not set the tablet_task_info if there is no running operation.

Currently, when the tablet repair is started, info regarding
the operation is kept in the system.tablets. The new tablet states
are reflected in memory after load_topology_state is called.
Before that, the data in the table and the memory aren't consistent.

To check the supported operations, tablet_virtual_task uses in-memory
tablet_metadata. Hence, it may not see the operation, even though
its info is already kept in system.tablets table.

Run read barrier in tablet_virtual_task::contains to ensure it will
see the latest data. Add a test to check it.

Fixes: scylladb#21975.
@Deexie Deexie force-pushed the tablet-task-contains branch from d89078b to cf23a74 on December 20, 2024 09:34
@Deexie
Collaborator Author

Deexie commented Dec 20, 2024

@scylladb-promoter
Contributor

🔴 CI State: FAILURE

✅ - Build
❌ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 topology_tasks/test_tablet_tasks

Failed Tests (1/630):

  • test_tablet_repair_task.dev.2

Build Details:

  • Duration: 2 hr 4 min
  • Builder: i-096de4578581f57a6 (m5ad.8xlarge)

@@ -50,6 +51,8 @@ static bool tablet_id_provided(const locator::tablet_task_type& task_type) {
}

future<std::optional<tasks::virtual_task_hint>> tablet_virtual_task::contains(tasks::task_id task_id) const {
    co_await _ss._migration_manager.local().get_group0_barrier().trigger();
Contributor

@kbr-scylla can you please confirm/deny that this solves the problem described in the patch notes?

Contributor

@kbr-scylla kbr-scylla Dec 20, 2024

Hard to say without understanding the context.

What are the causality dependencies here? From the description I'm guessing that:

  • something guarantees that the task_id we're looking for is already in system.tablets (what is that something?) in the context of this PR
  • but that same thing does not guarantee that it's already in the in-memory data structures that are built from system.tablets when we reload the table (so the table was not reloaded)

If the first point holds, then a read barrier will make sure that whatever was visible in system.tablets is also visible in the in-memory state. Note, however, that we are only guaranteed to see a state at least as recent: since we may be sleeping for a long time, the state we observe could be from after the data was removed, in which case we won't see the data. I don't know whether the data is guaranteed to still be there in the context we're discussing, so I can't say for sure.

Contributor

> something guarantees that the task_id we're looking for is already in system.tablets (what is that something?) in the context of this PR

This something is an API request for starting a new repair. This creates an entry in group0, which gets to system.tablets but is not immediately visible in memory, because the topology object is regenerated first.

Contributor

The scenario which prompted this PR is:

  • user starts repair via API
  • immediately after previous request returned, user checks state of repair task

Before this PR, the user would get an error saying that the repair task with the given ID doesn't exist. Upon retry, they would see the expected task.

Contributor

> This creates an entry in group0, which gets to system.tablets but is not immediately visible in memory

How does it create the entry? If it uses raft_group0_client::add_entry API to commit the entry, this underneath calls raft::server::add_entry with wait_type::applied, meaning that the future will resolve only after the entry is completely applied locally, which includes any in-memory changes that application of the entry performs.

Contributor

> If it uses raft_group0_client::add_entry API to commit the entry, this underneath calls raft::server::add_entry with wait_type::applied, meaning that the future will resolve only after the entry is completely applied locally, which includes any in-memory changes that application of the entry performs.

So if the repair API calls add_entry and waits for the future before returning to the user, then the bug must be somewhere else.
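
As a toy illustration of that argument (not the actual raft API; all names here are stand-ins): if applying the entry is what updates the in-memory state, and the API awaits the "applied" future before replying, then both the table and the in-memory view are visible to the caller by the time the response arrives.

import asyncio

table, memory = {}, {}

async def add_entry_applied(task_id):
    # Models add_entry with wait_type::applied: the future resolves only after the
    # entry has been applied locally, and applying it updates memory as well.
    table[task_id] = "repair"
    memory[task_id] = "repair"

async def start_repair_api(task_id):
    await add_entry_applied(task_id)   # the API waits for "applied" before responding

async def main():
    await start_repair_api("task-1")
    assert "task-1" in table and "task-1" in memory   # both visible once the API returns

asyncio.run(main())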

Contributor

Unless the problem is that the first entry added to group 0 doesn't actually update system.tablets, and instead some background process starts and a follow-up entry is added later (maybe on the topology coordinator). But then this PR also won't solve the problem, as the read barrier might observe the state in between those two entries.

Contributor

@kbr-scylla kbr-scylla Dec 20, 2024

> Unless the problem is that the first entry added to group 0 doesn't actually update system.tablets, and instead some background process starts and a follow-up entry is added later (maybe on the topology coordinator)

For example, alter keyspace in tablets mode works this way: the node handling the request only adds a group 0 entry that updates system.topology with the request info, which causes the topology coordinator to pick up the request and eventually push the schema change entry itself. So for the API to guarantee that the user sees the schema change upon returning, it's not enough to wait for the first entry to become applied; it actually has to wait for the entire request to finish (including the entry added by the coordinator).
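
A toy sketch of that two-entry concern (illustrative only, not ScyllaDB code; the names are stand-ins): waiting for the first entry to be applied says nothing about the follow-up entry that the coordinator applies later.

import asyncio

requests, memory = {}, {}

async def add_entry_applied(apply_fn):
    apply_fn()                          # resolves once this particular entry is applied

async def coordinator(key):
    await asyncio.sleep(0.1)            # picks up the request and pushes a second entry
    await add_entry_applied(lambda: memory.update({key: "done"}))

async def alter_keyspace_api(key):
    # The handling node only records the request; the real change comes later.
    await add_entry_applied(lambda: requests.update({key: "pending"}))
    asyncio.create_task(coordinator(key))

async def main():
    await alter_keyspace_api("ks1")
    assert "ks1" not in memory          # waiting for the first entry was not enough
    await asyncio.sleep(0.2)            # only waiting for the whole request guarantees it
    assert "ks1" in memory

asyncio.run(main())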

Collaborator Author

@Deexie Deexie Dec 20, 2024

> How does it create the entry? If it uses raft_group0_client::add_entry API to commit the entry, this underneath calls raft::server::add_entry with wait_type::applied, meaning that the future will resolve only after the entry is completely applied locally, which includes any in-memory changes that application of the entry performs.

The test that hit the bug starts the repair task and gets the id directly from system.tablets.

Contributor

> starts the repair task

ok but does the API to start the repair task, which is used by the test, wait for the group 0 entry to get applied (does it wait for the future returned by add_entry to resolve)? And does the test wait for the API to return a response? If yes, then by that moment both system.tablets and the in-memory data structures should already be updated.

@Deexie
Collaborator Author

Deexie commented Dec 20, 2024

> 🔴 CI State: FAILURE
>
> ✅ - Build
> ❌ - Unit Tests Custom
> The following new/updated tests ran 100 times for each mode:
> 🔹 topology_tasks/test_tablet_tasks
>
> Failed Tests (1/630):
>
> * [test_tablet_repair_task.dev.2](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/13997/testReport/junit/topology_tasks/test_tablet_tasks/test_tablet_repair_task_dev_2) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_tablet_repair_task.dev.2)
>
> Build Details:
>
> * Duration: 2 hr 4 min
> * Builder: i-096de4578581f57a6 (m5ad.8xlarge)

I opened #22011

Labels
backport/none Backport is not required
Development

Successfully merging this pull request may close these issues.

task manager: task id is reported as not found when the task id is available in system.tablets table