DROP TABLE may time out due to ongoing tablet migrations #21972

Open
wmitros opened this issue Dec 18, 2024 · 1 comment
@wmitros (Contributor) commented Dec 18, 2024

While dropping a table, we call:

    // Resolves only once all pending reads, writes, streams and flushes on the table have completed.
    future<> await_pending_ops() noexcept {
        return when_all(await_pending_reads(), await_pending_writes(), await_pending_streams(), await_pending_flushes()).discard_result();
    }

Because of await_pending_streams(), if a tablet migration is ongoing when we receive a DROP TABLE statement, the request has to wait until the migration finishes, which can take a long time and can exceed the request timeout.
For example, in https://argus.scylladb.com/tests/scylla-cluster-tests/e10b8f8f-d982-459e-85c8-84f09e276c26 we can see that dropping an index table finished only just after a roughly 1-hour migration of 8 GB:

2024-12-14T18:08:25.075+00:00 longevity-large-partitions-200k-pks-db-node-e10b8f8f-0-2     !INFO | scylla[14678]:  [shard  0: gms] database - Dropping scylla_bench.test_view with auto-snapshot
...
2024-12-14T18:57:48.436+00:00 longevity-large-partitions-200k-pks-db-node-e10b8f8f-0-2     !INFO | scylla[14678]:  [shard  0:strm] stream_session - [Stream #f2a6fd61-ba43-11ef-8ab1-9742296e4d0d] Streaming plan for Tablet migration-scylla_bench-index-0 succeeded, peers={10.142.0.159}, tx=0 KiB, 0.00 KiB/s, rx=8279023 KiB, 2058.71 KiB/s
...
2024-12-14T18:57:58.186+00:00 longevity-large-partitions-200k-pks-db-node-e10b8f8f-0-2     !INFO | scylla[14678]:  [shard  0: gms] database - Truncated scylla_bench.test_view
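
The same blocking pattern can be reproduced outside Scylla. Below is a minimal standalone C++ sketch (plain std::future, with hypothetical stand-ins for the pending-ops futures; this is not Scylla or Seastar code) showing how a drop that awaits all pending work, including a long-running stream, keeps running long after the caller-side request timeout has already fired:

    // Standalone illustration only: "pending_streams" stands in for a tablet
    // migration that keeps streaming long after the drop request arrived.
    #include <chrono>
    #include <future>
    #include <iostream>
    #include <thread>

    using namespace std::chrono_literals;

    std::future<void> pending_writes() {
        return std::async(std::launch::async, [] { std::this_thread::sleep_for(10ms); });
    }

    std::future<void> pending_streams() {
        // e.g. an hour-long tablet migration, scaled down for the example
        return std::async(std::launch::async, [] { std::this_thread::sleep_for(5s); });
    }

    void drop_table() {
        auto writes = pending_writes();
        auto streams = pending_streams();
        writes.wait();
        streams.wait();   // the drop is stuck here until the "migration" ends
    }

    int main() {
        auto drop = std::async(std::launch::async, drop_table);
        // Caller-side request timeout: much shorter than the ongoing stream.
        if (drop.wait_for(1s) == std::future_status::timeout) {
            std::cout << "DROP TABLE timed out while a stream was still in flight\n";
        }
        drop.wait();   // the drop still completes, but only after the stream finishes
        std::cout << "drop finally completed\n";
    }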

Instead of waiting for the migration, we should interrupt it, knowing that it is going to fail anyway.
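
A conceptual sketch of that direction, using only standard C++20 cancellation primitives (std::stop_source / std::stop_token) rather than Scylla's or Seastar's actual abort machinery, and with hypothetical names:

    // Conceptual sketch, not the Scylla API: hand the migration a cancellation
    // token so the DROP TABLE path can abort it instead of awaiting it.
    #include <chrono>
    #include <condition_variable>
    #include <future>
    #include <iostream>
    #include <mutex>
    #include <stop_token>

    using namespace std::chrono_literals;

    // Stands in for the tablet-migration streaming task.
    void run_migration(std::stop_token abort_token) {
        std::mutex m;
        std::condition_variable_any cv;
        std::unique_lock lk(m);
        // "Stream" for up to an hour, but wake up immediately if aborted.
        cv.wait_for(lk, abort_token, 1h, [] { return false; });
        if (abort_token.stop_requested()) {
            std::cout << "migration aborted; it would have failed after the drop anyway\n";
        }
    }

    int main() {
        std::stop_source migration_abort;
        auto migration = std::async(std::launch::async, run_migration, migration_abort.get_token());

        // DROP TABLE path: request the abort, so the wait returns promptly
        // instead of blocking for the remainder of the migration.
        migration_abort.request_stop();
        migration.wait();
        std::cout << "drop can proceed without waiting out the migration\n";
    }

The mechanical point is only that the drop path needs a handle it can use to cancel pending streams for the table being dropped; in Scylla the equivalent would be whatever abort/shutdown mechanism the streaming or tablet-migration code already exposes.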

Issues #16661, #15301, #9623 and #8003 are likely similar, except that there the wait may be on flushing or repair streaming instead.

@wmitros (Contributor, Author) commented Dec 18, 2024

cc @timtimb0t
