
dl/coordinator: periodically remove snapshots #24813

Merged (3 commits) on Jan 22, 2025


andrwng (Contributor) commented Jan 15, 2025

Adds periodic removal of Iceberg snapshots to the coordinator. As is, the removal runs at the same cadence as table commits, though that may change in the future.

Once snapshots are removed from the table metadata, the coordinator examines the resulting table and extracts any snapshots that no longer exist. These snapshots are deleted from object storage synchronously (it's left as future work to do this in the background, or to delay removal if desired).
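The "extract the snapshots that no longer exist" step is essentially a set difference over snapshot IDs between the table metadata before and after expiry. Below is a minimal C++ sketch of that idea. The `snapshot` struct and `removed_snapshots` function are hypothetical names for illustration, not Redpanda's types. The real coordinator works on its own Iceberg metadata types and may need to track more than one file per removed snapshot.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified snapshot (assumption): only the snapshot ID and
// its manifest list location are kept; real Iceberg metadata carries more.
struct snapshot {
    int64_t id;
    std::string manifest_list_path;
};

// Returns the snapshots present in `before` that are absent from `after`,
// i.e. the snapshots that expiry dropped from the table metadata.
std::vector<snapshot> removed_snapshots(
  const std::vector<snapshot>& before, const std::vector<snapshot>& after) {
    // Index the surviving snapshot IDs for O(1) membership checks.
    std::unordered_set<int64_t> remaining;
    for (const auto& s : after) {
        remaining.insert(s.id);
    }
    // Keep every pre-expiry snapshot whose ID did not survive, in order.
    std::vector<snapshot> removed;
    for (const auto& s : before) {
        if (remaining.count(s.id) == 0) {
            removed.push_back(s);
        }
    }
    return removed;
}
```

The files of the returned snapshots are the candidates for deletion from object storage. The surviving metadata is never touched.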

The code uses a similar structure to the file committer, and as such will need to be updated later to support dead letter tables.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Redpanda will now periodically remove expired snapshots from Iceberg Topic tables.

vbotbuildovich (Collaborator) commented Jan 15, 2025

CI test results

Test results on build #60759:
  • idempotency_tests_rpunit.idempotency_tests_rpunit (unit): FLAKY, passed 1/2, https://buildkite.com/redpanda/redpanda/builds/60759#0194690c-5856-4c3a-b1b4-80b60a3b99dd
  • rptest.tests.partition_balancer_test.PartitionBalancerTest.test_unavailable_nodes (ducktape): FLAKY, passed 5/6, https://buildkite.com/redpanda/redpanda/builds/60759#01946968-cfdf-461b-8718-dca96ee110ed

Test results on build #60885:
  • rptest.tests.cloud_storage_scrubber_test.CloudStorageScrubberTest.test_scrubber.cloud_storage_type=CloudStorageType.ABS (ducktape): FLAKY, passed 1/2, https://buildkite.com/redpanda/redpanda/builds/60885#019472ee-81f8-4aa3-8619-c6bb3fbc6f05
  • rptest.tests.partition_reassignments_test.PartitionReassignmentsTest.test_reassignments_kafka_cli (ducktape): FLAKY, passed 1/2, https://buildkite.com/redpanda/redpanda/builds/60885#019472e1-cb97-4e85-a827-2f0145730685
  • rptest.tests.partition_reassignments_test.PartitionReassignmentsTest.test_reassignments_kafka_cli (ducktape): FLAKY, passed 1/8, https://buildkite.com/redpanda/redpanda/builds/60885#019472ee-81f8-4c68-a75b-ef3af53c4122

Test results on build #60999:
  • rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade (ducktape): FLAKY, passed 1/2, https://buildkite.com/redpanda/redpanda/builds/60999#01948a1f-7d11-4ebd-a1ad-75765fe5faac

dotnwat (Member) left a comment

epic

Adds a method that takes a list of files to delete as input and then
calls into cloud_io::remote to delete them. This serves primarily to
convert std::filesystem::paths (which may themselves be derived from
Iceberg URIs) into cloud_storage_clients::object_keys (used by cloud_io).

This will be used by the datalake coordinator to remove expired
snapshots.
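The path-to-key translation described in this commit can be sketched as follows. This is a hedged illustration under assumptions. `object_key` is a hypothetical stand-in for cloud_storage_clients::object_key, and `uri_to_path` and `to_object_keys` are hypothetical helpers. The sketch also assumes that an Iceberg URI maps to a bucket-relative path by stripping a known `scheme://bucket/` prefix. No cloud_io calls are reproduced here.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for cloud_storage_clients::object_key (assumption).
struct object_key {
    std::string value;
};

// Hypothetical helper: strips a known "scheme://bucket/" prefix from an
// Iceberg file URI, yielding the object's path within the bucket.
std::optional<std::filesystem::path>
uri_to_path(std::string_view uri, std::string_view prefix) {
    if (uri.substr(0, prefix.size()) != prefix) {
        return std::nullopt; // URI points outside the expected bucket.
    }
    return std::filesystem::path(std::string(uri.substr(prefix.size())));
}

// Converts paths into object keys, e.g. to build one batched delete request.
std::vector<object_key>
to_object_keys(const std::vector<std::filesystem::path>& paths) {
    std::vector<object_key> keys;
    keys.reserve(paths.size());
    for (const auto& p : paths) {
        // generic_string() always uses '/' separators, as object keys expect.
        keys.push_back(object_key{p.generic_string()});
    }
    return keys;
}
```

Rejecting URIs with an unexpected prefix, rather than guessing, keeps a malformed or foreign URI from turning into a delete against the wrong key.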

Introduces a new abstraction to remove old snapshots from a given table.
This effectively reduces the window within which users can perform
point-in-time queries. This operation is intended to be standalone
and does not require additional changes to the coordinator STM to run.

This commit only introduces the remover and a no-op implementation for
ease of testing, similar to the file_committer abstraction. A subsequent
change will add this to the coordinator.
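The remover abstraction and its no-op implementation can be sketched as an interface with a trivial test double, in the same spirit as the file_committer pattern the commit mentions. All names here (`snapshot_remover`, `noop_snapshot_remover`, `remove_expired_snapshots`, `remove_errc`) are hypothetical. The real abstraction is presumably asynchronous, since Redpanda is built on Seastar, but this sketch uses plain synchronous calls so that it is self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical error type for the sketch (assumption).
enum class remove_errc { ok, io_error };

// Hypothetical interface: expire old snapshots of a single table. It takes
// only the table and a timestamp, so it can run standalone without
// touching coordinator STM state.
class snapshot_remover {
public:
    virtual ~snapshot_remover() = default;
    virtual remove_errc
    remove_expired_snapshots(const std::string& table_name, int64_t now_ms) = 0;
};

// No-op implementation for tests: succeeds without contacting a catalog or
// object storage, and counts calls so tests can check it was invoked.
class noop_snapshot_remover final : public snapshot_remover {
public:
    remove_errc
    remove_expired_snapshots(const std::string&, int64_t) override {
        ++calls_;
        return remove_errc::ok;
    }
    int calls() const { return calls_; }

private:
    int calls_ = 0;
};
```

Because the coordinator would only depend on the interface, tests can inject the no-op version and exercise the coordinator loop without any object storage.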

Adds snapshot removal to the coordinator loop. It is currently run
before any appends.

andrwng force-pushed the dl-remove-snapshots branch from e1c059c to 1bba0de on January 21, 2025 17:12
andrwng requested a review from dotnwat on January 22, 2025 06:19
andrwng merged commit 0f50ab0 into redpanda-data:dev on Jan 22, 2025
18 checks passed