tasks: add tablet resize virtual task #21891

Open
wants to merge 21 commits into
base: master


Deexie
Collaborator

@Deexie Deexie commented Dec 11, 2024

With this change, tablet_virtual_task supports tablet
resize (i.e. split and merge).

Users can see running resize tasks; finished tasks are not
exposed through the task manager API.

A new task state, "suspended", is added. A revoked resize
appears to users as suspended. A resize is assumed to have been revoked
if the tablet count did not change.
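
The state rule above can be sketched as a small Python model. This is an illustration only, not the C++ code in this PR: the function name `finished_resize_state` and the string state names are hypothetical, and the "done" outcome for a changed tablet count is taken from the commit messages of this PR.

```python
def finished_resize_state(tablet_count_before: int, tablet_count_after: int) -> str:
    """Hypothetical helper: classify a resize that is no longer running.

    Assumption from the description: an unchanged tablet count means
    the resize was revoked, which users see as "suspended".
    """
    if tablet_count_after != tablet_count_before:
        return "done"       # a split or merge actually changed the count
    return "suspended"      # count unchanged, so the resize is assumed revoked
```

Under these assumptions, a split from 8 to 16 tablets or a merge from 16 to 8 is classified as done, while a resize that leaves the count at 8 is classified as suspended.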

Fixes: #21366.
Fixes: #21367.

No backport, new feature

@Deexie
Collaborator Author

Deexie commented Dec 12, 2024

  • modify the default is_internal so that children of virtual tasks are not internal
  • gather children for split tasks
  • add test for revoked split
  • add children check to test_tablet_resize_list

@Deexie
Collaborator Author

Deexie commented Dec 16, 2024

  • rebase

@Deexie
Collaborator Author

Deexie commented Dec 16, 2024

@Deexie Deexie marked this pull request as ready for review December 16, 2024 12:07
@Deexie Deexie requested review from denesb and raphaelsc and removed request for tgrabiec and kbr-scylla December 16, 2024 12:07
@Deexie
Collaborator Author

Deexie commented Dec 16, 2024

  • use features reference (instead of pointer) in set_resize_decision

@scylladb-promoter
Contributor

🔴 CI State: FAILURE

✅ - Build
❌ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 boost/sstable_set_test
🔹 boost/tablets_test
🔹 topology_tasks/test_tablet_tasks

Failed Tests (5/291):

Build Details:

  • Duration: 1 hr 20 min
  • Builder: i-0657dbc74054c4371 (m5d.12xlarge)

@Deexie
Collaborator Author

Deexie commented Dec 17, 2024

  • rebase

@Deexie Deexie added the backport/none Backport is not required label Dec 17, 2024
@Deexie Deexie self-assigned this Dec 17, 2024
@scylladb-promoter
Contributor

🔴 CI State: FAILURE

✅ - Build
❌ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 boost/sstable_set_test
🔹 boost/tablets_test
🔹 topology_tasks/test_tablet_tasks

Failed Tests (7/291):

Build Details:

  • Duration: 2 hr 5 min
  • Builder: i-09f1a2b0c4a4398e6 (m5ad.8xlarge)

@Deexie
Collaborator Author

Deexie commented Dec 17, 2024

  • use features reference (instead of pointer) in set_resize_decision
  • get back to pointers (test needs that)
  • lower datasize in revoke test

@Deexie
Collaborator Author

Deexie commented Jan 9, 2025

  • use features reference (instead of pointer) in set_resize_decision
    • set resize_task_info in test based on on-disk state
  • task_manager::get_nodes and task_manager::module::get_nodes aren't noexcept
  • get initial tablet_count in tablet_virtual_task::wait
  • change output arguments to return values
  • add commit message to tasks: children of virtual tasks aren't internal by default
  • replace parent_info with tablet_split_task_info

@scylladb-promoter
Contributor

🔴 CI State: FAILURE

✅ - Build
❌ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 boost/sstable_set_test
🔹 boost/tablets_test
🔹 topology_tasks/test_tablet_tasks

Failed Tests (2/647):

Build Details:

  • Duration: 2 hr 22 min
  • Builder: i-0e0b5486afcc7e412 (m5ad.8xlarge)

Deexie added 21 commits January 10, 2025 10:03
Add resize_task_info static column to system.tablets. Set or delete
resize_task_info value when the resize_decision is changed.
Reflect the column content in tablet_map.
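
A minimal Python model of this bookkeeping, as a sketch only. The real change is in C++ and in the system.tablets schema; `TabletMapModel`, the decision strings, the signature of `set_resize_decision` here, and the use of a bare task id for resize_task_info are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class TabletMapModel:
    """Hypothetical stand-in for the per-table state mirrored in tablet_map."""
    tablet_count: int
    resize_decision: str = "none"            # assumed values: "none", "split", "merge"
    resize_task_info: Optional[str] = None   # simplified to a task id in this sketch


def set_resize_decision(tmap: TabletMapModel, decision: str) -> None:
    """Change the decision and keep resize_task_info in step with it."""
    if decision == tmap.resize_decision:
        return  # decision unchanged, so the existing task info is kept
    tmap.resize_decision = decision
    if decision == "none":
        tmap.resize_task_info = None               # assumption: no decision, no task info
    else:
        tmap.resize_task_info = str(uuid.uuid4())  # assumption: a new decision gets a new id
```

The point of the sketch is that the task info only changes together with the decision, so a reader of the map never sees a split or merge decision without a matching task.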
Move an implementation of node_ops::task_manager_module::get_nodes
to task_manager::get_nodes, so that it can be reused by other modules.
Extend tablet_virtual_task::get_stats to list resize tasks.
Extend tablet_virtual_task::contains to check resize operations.

Methods that do not support resize tasks return immediately if they
are handling a split or merge task.
Extend tablet_virtual_task::get_status to cover resize tasks.
Add suspended task state. It will be used for revoke resize requests.
Extend tablet_virtual_task::wait to support resize tasks.

To decide the state of a finished resize virtual task (done
or failed), the tablet count is checked. The task state is set to done
if the tablet count after the resize differs from the count before it.
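
The comparison described above depends on recording the tablet count before waiting. A rough Python sketch of that ordering follows. It is not the C++ implementation of tablet_virtual_task::wait: the polling loop is a simplification, and `wait_for_resize`, `get_tablet_count`, and `resize_in_progress` are hypothetical names.

```python
import time
from typing import Callable


def wait_for_resize(get_tablet_count: Callable[[], int],
                    resize_in_progress: Callable[[], bool],
                    poll_interval_s: float = 0.0) -> bool:
    """Block until the resize is no longer in progress.

    Returns True if the tablet count changed while waiting. Both callables
    are hypothetical hooks standing in for reads of the tablet metadata.
    """
    initial_count = get_tablet_count()   # captured before waiting starts
    while resize_in_progress():
        time.sleep(poll_interval_s)      # simplification: poll instead of being notified
    return get_tablet_count() != initial_count
```

The caller can then map the returned flag to a task state; reading the initial count after the wait would make the comparison meaningless.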
Set resize tasks as non-abortable.
Initialize shard in task_info constructor. All current usages do
not care about the shard of an empty task_info.

Later patches may need this to set information about
a virtual task's parent.
Currently, streaming_task_impl is the only existing child of any
virtual task. It overrides the is_internal definition so that it
is non-internal even though it has a parent.

This should apply to all children of all virtual tasks. Modify
task_manager::task::impl::is_internal so that children of virtual
tasks aren't internal by default.
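
The new default can be modelled in a few lines of Python. This is a sketch of the rule only, assuming the previous default was "a task with a parent is internal", as the text above implies. `TaskModel` and its fields are hypothetical and do not mirror the C++ classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskModel:
    """Hypothetical, simplified task holding only what the rule needs."""
    name: str
    parent: Optional["TaskModel"] = None
    is_virtual: bool = False

    def is_internal(self) -> bool:
        # By default, a task without a parent is not internal.
        if self.parent is None:
            return False
        # Children of virtual tasks are not internal; other children are.
        return not self.parent.is_virtual
```

With this default, a compaction task whose parent is a split virtual task is not internal, while its own children still are, and no per-class override is needed.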
Pass task_info down to storage_group::split.

Later patches will use it to set the parent
of offstrategy_compaction_task_executor and split_compaction_task_executor
running as part of the split. The task_info parameter will contain the task
info of the split virtual task.
offstrategy_compaction_task_executor and split_compaction_task_executor
running as a part of the split become children of a split virtual task.
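
How the parent information travels can be illustrated with a small Python model. All names here (`TaskInfo`, `CompactionTask`, `StorageGroupModel`) are hypothetical, the default shard value of 0 is an assumption, and the sketch assumes both executors run on every split, which the real code may not do.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskInfo:
    """Hypothetical stand-in for task_info: a task id plus its shard."""
    task_id: Optional[str] = None
    shard: int = 0   # assumption: an empty TaskInfo still has a defined shard


@dataclass
class CompactionTask:
    kind: str
    parent: TaskInfo


@dataclass
class StorageGroupModel:
    started: List[CompactionTask] = field(default_factory=list)

    def split(self, parent: TaskInfo) -> None:
        # Each executor started for the split records the split task as its parent.
        for kind in ("offstrategy", "split"):
            self.started.append(CompactionTask(kind, parent))
```

Passing the parent explicitly keeps the storage group unaware of the task manager: it only forwards whatever TaskInfo the caller supplies.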
The test is skipped in debug mode because preparing the revoke
takes too long, and the wait request, which must be started before
the preparation, hits a timeout.
@scylladb-promoter
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 boost/sstable_set_test
🔹 boost/tablets_test
🔹 topology_tasks/test_tablet_tasks
✅ - dtest
✅ - dtest with tablets
✅ - dtest with topology changes
✅ - Docker Test
✅ - Offline-installer Artifact Tests
❌ - Unit Tests

Failed Tests (2/40128):

Build Details:

  • Duration: 5 hr 22 min
  • Builder: spider5.cloudius-systems.com

Labels
backport/none Backport is not required
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tasks-manager: create tasks for tablet merge
tasks-manager: create task for tablet split
4 participants