feat: Allow custom S3 filesystem factory#17400
Conversation
✅ Deploy Preview for meta-velox canceled.
|
Build Impact AnalysisSelective Build Targets (building these covers all 14 affected)Total affected: 14/575 targets Affected targets (14)Directly changed (9)
Transitively affected (5)
Fast path • Graph from main@59f88d56b95c2854723e46f26f37f9e341abb2c4 |
JkSelf
left a comment
There was a problem hiding this comment.
@ReemaAlzaid Thanks for your fix. LGTM except small comments.
There was a problem hiding this comment.
Pull request overview
Adds an extension point to Velox’s S3 filesystem registration to allow callers to supply a custom FileSystem factory for s3:// paths while preserving the existing caching and FileSink registration behavior.
Changes:
- Extend
registerS3FileSystemto accept an optionalS3FileSystemFactoryfor custom S3FileSystemconstruction. - Use the provided factory (when present) in the S3 filesystem generator; default behavior remains
S3FileSystem. - Add a unit test verifying that
getFileSystem()returns the custom S3 filesystem implementation when registered.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
velox/connectors/hive/storage_adapters/s3fs/tests/S3FileSystemRegistrationTest.cpp |
Registers S3 with a custom factory and adds a test asserting the returned filesystem type. |
velox/connectors/hive/storage_adapters/s3fs/RegisterS3FileSystem.h |
Introduces S3FileSystemFactory and extends the registerS3FileSystem API. |
velox/connectors/hive/storage_adapters/s3fs/RegisterS3FileSystem.cpp |
Stores the factory and uses it to create cached S3 filesystem instances. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| initializeS3(logLevel, logLocation); | ||
| auto fs = std::make_shared<S3FileSystem>(bucketName, properties); | ||
| auto fs = fileSystemFactory | ||
| ? fileSystemFactory(bucketName, properties) | ||
| : std::make_shared<S3FileSystem>(bucketName, properties); | ||
| instanceMap.insert({cacheKey, fs}); | ||
| return fs; |
rui-mo
left a comment
There was a problem hiding this comment.
Thanks. And please also take a look at the Copilot comments.
| TEST_F(S3FileSystemRegistrationTest, customFileSystemFactory) { | ||
| auto hiveConfig = minioServer_->hiveConfig(); | ||
| auto s3fs = filesystems::getFileSystem(kDummyPath, hiveConfig); | ||
| ASSERT_NE(std::dynamic_pointer_cast<CustomS3FileSystem>(s3fs), nullptr); |
There was a problem hiding this comment.
nit: could use VELOX_CHECK_NOT_NULL.
| return config->get<std::string>("hive.s3.endpoint").value(); | ||
| } | ||
|
|
||
| class CustomS3FileSystem : public S3FileSystem { |
There was a problem hiding this comment.
For the customization, do we intend to override S3FileSystem::Impl? If so, we may need to make the relevant members protected.
There was a problem hiding this comment.
S3FileSystem::Impl and impl_ are already protected in S3FileSystem.h, but the actual Impl definition lives in S3FileSystem.cpp, so it remains an internal implementation detail:
https://github.com/facebookincubator/velox/blob/main/velox/connectors/hive/storage_adapters/s3fs/S3FileSystem.h#L90-L92
https://github.com/facebookincubator/velox/blob/main/velox/connectors/hive/storage_adapters/s3fs/S3FileSystem.cpp#L230
The factory hook is intended to customize the FileSystem/S3FileSystem instance, not override Impl.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
CI Failure Analysis
⚪ Aggregation Fuzzer with Presto as source of truth — FUZZER Failure View logsAll 4 instances crashed (seeds: 119425723, 680258782, 631281975, 995490894) Error: ⚪ Expression Fuzzer with Presto SOT — FUZZER Failure View logsAll 4 instances crashed (seeds: 564466436, 730604412, 630078507, 179380940) Error: ⚪ Join Fuzzer — FUZZER Failure View logs1 of 4 instances crashed (failed seed: 377890932; seeds 962015200, 1036667219, 956565828 passed) Error: ⚪ Window Fuzzer with Presto as source of truth — FUZZER Failure View logs3 of 4 instances crashed (failed seeds: 673996794, 490174137, 229164968; seed 1019969253 passed) Error: Correlation with PR changes:
Known issues:
Reproduce locally: # Aggregation Fuzzer (example with seed 119425723)
./velox_aggregation_fuzzer_test --seed 119425723 --duration_sec 300 --minloglevel=0 --stderrthreshold=2 --enable_oom_injection --presto_url http://127.0.0.1:8080
# Expression Fuzzer (example with seed 564466436)
./velox_expression_fuzzer_test --seed 564466436 --enable_variadic_signatures --velox_fuzzer_enable_complex_types --lazy_vector_generation_ratio 0.2 --presto_url http://127.0.0.1:8080
# Join Fuzzer (example with seed 377890932)
./velox_join_fuzzer --seed 377890932 --duration_sec 300 --minloglevel=0 --stderrthreshold=2 --enable_oom_injection --presto_url http://127.0.0.1:8080
# Window Fuzzer (example with seed 673996794)
./velox_window_fuzzer_test --seed 673996794 --duration_sec 300 --batch_size=50 --minloglevel=0 --stderrthreshold=2 --enable_oom_injection --presto_url http://127.0.0.1:8080Recommended fix: No action needed for this PR. These are pre-existing fuzzer flakes on |
| initializeS3(logLevel, logLocation); | ||
| auto fs = std::make_shared<S3FileSystem>(bucketName, properties); | ||
| auto fs = fileSystemFactory | ||
| ? fileSystemFactory(std::move(bucketName), properties) |
There was a problem hiding this comment.
Please check fileSystemFactory is not null before calling it.
What changes were proposed in this pull request?
This PR adds an optional filesystem factory parameter to
registerS3FileSystem.This is the Velox side change intended to unblock the corresponding Gluten custom S3 filesystem registration work in apache/gluten#12026.
With this change, callers can reuse Velox's existing S3 registration path while supplying a custom
FileSystemimplementation fors3://paths. When no factory is provided, the current behavior is unchanged and Velox still constructsS3FileSystem.This preserves the existing:
FileSinkregistrationfinalizeS3FileSystem()behaviorThis PR also adds a test that verifies
getFileSystem()returns the custom subclass when a factory is registeredWhy are the changes needed?
Velox currently hardcodes
S3FileSystemconstruction inside S3 registration. That makes it difficult for integrations to customize S3 filesystem creation while still reusing Velox's registration and caching logicThis change exposes a small generic extension point without changing the default behavior.
Does this PR introduce any user facing change?
No.
How was this patch tested?
RegisterS3FileSystem.cppwith a syntax compile using the existing compile database