Skip to content

Conversation

@utay
Copy link

@utay utay commented Dec 28, 2025

Following this discussion and duckdb/duckdb#20333 which introduces RemoveFiles on FileSystem, this PR implements RemoveFiles for S3FileSystem.

It uses the S3 DeleteObjects API to delete objects in batches of 1000.

I couldn't find how to test this, as the other methods (like RemoveFile) don't seem to be unit tested, and the RemoveFiles method is not exposed as a SQL command. Please let me know if there's a way to do so! I was still able to test it end-to-end with a SQL test in the ducklake repo (PR incoming).

@pdet
Copy link
Collaborator

pdet commented Jan 6, 2026

Hi @utay, can you retarget this PR to main? I'm also unaware of a way to test the deletion function directly. Maybe @samansmink or @carlopi has an idea?

@utay utay force-pushed the yu/remove-files branch from f77bb59 to 01755c4 Compare January 6, 2026 18:38
@utay utay changed the base branch from v1.4-andium to main January 6, 2026 18:38
@utay
Copy link
Author

utay commented Jan 6, 2026

Hi @utay, can you retarget this PR to main?

Done!

I'm also unaware of a way to test the deletion function directly. Maybe @samansmink or @carlopi has an idea?

Yep, let me know, happy to help write a test

@carlopi
Copy link
Collaborator

carlopi commented Jan 6, 2026

I need to bump duckdb further, since currently it doesn't include duckdb/duckdb#20333 (see failure in MinIO tests). I'll ping once this can be rebased.

@carlopi
Copy link
Collaborator

carlopi commented Jan 7, 2026

@utay: can you merge with / rebase to the main branch? Should be ready after that CI side.

I think the change set might slightly change behaviour, and would indeed need to be more properly tested.

I am not sure of what strategy to recommend, probably combination of populating files + globs + deletion + globs in SQL to be execute as integration test would be handy.

@utay utay force-pushed the yu/remove-files branch from 01755c4 to f19018f Compare January 7, 2026 15:22
@utay
Copy link
Author

utay commented Jan 8, 2026

@carlopi Thanks, I've rebased the branch on main.

I’ll fix the compilation issues and try to add an integration test as you suggested when I have some time!

@utay
Copy link
Author

utay commented Jan 11, 2026

@carlopi I've fixed compilation, CI should pass now.

Regarding the integration test, I think there are 2 options:

  • Rely on the COPY overwrite mode, but this would require removing this check
    in DuckDB. Also, this doesn’t explicitly delete the existing files; it overwrites them implicitly
  • Introduce a SQL function like s3_remove_files (or similar) to explicitly clean up the target path. There don’t seem to be any existing S3 helper functions for this today, but adding one might be acceptable

I couldn't find another way to test this with the existing test infrastructure, let me know what you think!

@carlopi
Copy link
Collaborator

carlopi commented Jan 12, 2026

@utay: I've opened duckdb/duckdb#20487, I think removing the check is the path forward, but need to check with the team.

Mytherin added a commit to duckdb/duckdb that referenced this pull request Jan 13, 2026
…ems (#20487)

Connected to duckdb/duckdb-httpfs#181, where the
author noticed we could lift this limitation (and that would also be
handy in testing the new implementation of `RemoveFiles` for S3).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants