-
Notifications
You must be signed in to change notification settings - Fork 425
refactor(cli): unify storage configuration for export command #7280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Utilize ObjectStoreConfig to unify storage configuration for export command - Support export command for Fs, S3, OSS, GCS and Azblob - Fix the Display implementation for SecretString always returned the string "SecretString([REDACTED])" even when the internal secret was empty. Signed-off-by: McKnight22 <[email protected]>
|
|
||
| impl ObjectStoreConfig { | ||
| /// Builds the object store with S3. | ||
| pub fn build_s3(&self) -> Result<ObjectStore, BoxedError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why not use ·ObjectStoreConfig::build· directly?
| } | ||
| } | ||
|
|
||
| impl PrefixedAzblobConnection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about making all fields public in the macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be better to move this repeated logic into the macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the export command to use a unified storage configuration approach, eliminating code duplication and adding support for multiple cloud storage backends (S3, OSS, GCS, Azure Blob) alongside the existing filesystem storage.
Key changes:
- Introduced a new
storage_exportmodule with a trait-based design for different storage backends - Replaced individual storage flags and configuration parameters with a unified
ObjectStoreConfig - Added comprehensive test coverage for all storage backend configurations
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/common/base/src/secrets.rs | Modified Display implementation for SecretString to return empty string when secret is empty |
| src/cli/src/error.rs | Removed unused S3ConfigNotSet error variant |
| src/cli/src/data/storage_export.rs | New module implementing storage backend abstraction with trait-based design for Fs, S3, OSS, GCS, and Azblob |
| src/cli/src/data/export.rs | Refactored to use unified storage config, removed duplicated operator building logic, added unit tests for all backends |
| src/cli/src/data.rs | Added storage_export module declaration |
| src/cli/src/common/object_store.rs | Added accessor methods for storage connection configs and split build methods for individual backends |
| src/cli/src/common.rs | Exported new connection type aliases for use in storage backends |
| src/cli/Cargo.toml | Added common-test-util dependency for testing |
| Cargo.lock | Updated dependency tree with common-test-util |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Get the SAS token. | ||
| pub fn sas_token(&self) -> Option<&String> { | ||
| self.azblob_sas_token.as_ref() |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SAS token is stored as Option<String> but should be Option<SecretString> for consistency with other sensitive credentials (account_name, account_key). This exposes the SAS token in logs and debug output, creating a potential security issue.
Change the field type in AzblobConnection:
sas_token: Option<SecretString>,And update the sas_token() getter to return Option<&SecretString>:
pub fn sas_token(&self) -> Option<&SecretString> {
self.azblob_sas_token.as_ref()
}Then use expose_secret() when accessing the value in the storage export code.
| if self.expose_secret().is_empty() { | ||
| write!(f, "") | ||
| } else { | ||
| write!(f, "SecretString([REDACTED])") | ||
| } |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The Display implementation for SecretString returns an empty string when the secret is empty. This creates inconsistent behavior and could be confusing for debugging. When formatting an empty SecretString, it would be clearer to still show "SecretString([REDACTED])" or use "SecretString(empty)" to make it explicit that it's a SecretString type.
Consider using:
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "SecretString([REDACTED])")
}This maintains consistent formatting regardless of whether the secret is empty or not.
| if self.expose_secret().is_empty() { | |
| write!(f, "") | |
| } else { | |
| write!(f, "SecretString([REDACTED])") | |
| } | |
| write!(f, "SecretString([REDACTED])") |
| if !self.config.endpoint().is_empty() { | ||
| connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint())); | ||
| } |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The endpoint validation in OssBackend::new() ensures the endpoint is not empty (line 185-192), but then in get_storage_path() there's a redundant check if !self.config.endpoint().is_empty() (line 230).
Since the endpoint is guaranteed to be non-empty after validation, this check is unnecessary and the endpoint should always be included in the connection options. Consider removing the conditional check:
connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint()));This makes the code clearer and more consistent with the validation logic.
| if !self.config.endpoint().is_empty() { | |
| connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint())); | |
| } | |
| connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint())); |
| StorageType::Azblob(backend) => backend.format_output_path(catalog, file_path), | ||
| } | ||
| } | ||
|
|
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing rustdoc comment for the public is_remote_storage() method. While the other methods in this impl block have documentation, this one doesn't.
Add a doc comment:
/// Returns true if the storage backend is remote (not local filesystem).
pub fn is_remote_storage(&self) -> bool {| /// Returns true if the storage backend is remote (not local filesystem). |
| fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) { | ||
| let bucket = self.config.bucket(); | ||
| let oss_path = format!("oss://{}/{}/{}/", bucket, catalog, schema); | ||
|
|
||
| let mut connection_options = vec![ | ||
| format!( | ||
| "ACCESS_KEY_ID='{}'", | ||
| self.config.access_key_id().expose_secret() | ||
| ), | ||
| format!( | ||
| "ACCESS_KEY_SECRET='{}'", | ||
| self.config.access_key_secret().expose_secret() | ||
| ), | ||
| ]; | ||
|
|
||
| if !self.config.endpoint().is_empty() { | ||
| connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint())); | ||
| } | ||
|
|
||
| let connection_str = format!(" CONNECTION ({})", connection_options.join(", ")); | ||
| (oss_path, connection_str) | ||
| } |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OSS backend's get_storage_path method doesn't include the root path from the configuration, unlike the S3 and GCS backends. This means that if a user specifies --oss-root, it will be ignored in the export path generation.
The path should be:
let root = if self.config.root().is_empty() {
String::new()
} else {
format!("/{}", self.config.root())
};
let oss_path = format!("oss://{}{}/{}/{}/", bucket, root, catalog, schema);|
|
||
| fn format_output_path(&self, catalog: &str, file_path: &str) -> String { | ||
| let bucket = self.config.bucket(); | ||
| format!("oss://{}/{}/{}", bucket, catalog, file_path) |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format_output_path method for OSS backend doesn't include the root path from the configuration, unlike the S3, GCS, and Azblob backends. This inconsistency means the logged output path won't match the actual storage location when --oss-root is specified.
The method should format the path similarly to S3:
let root = if self.config.root().is_empty() {
String::new()
} else {
format!("/{}", self.config.root())
};
format!("oss://{}{}/{}/{}", bucket, root, catalog, file_path)| format!("oss://{}/{}/{}", bucket, catalog, file_path) | |
| let root = if self.config.root().is_empty() { | |
| String::new() | |
| } else { | |
| format!("/{}", self.config.root()) | |
| }; | |
| format!("oss://{}{}/{}/{}", bucket, root, catalog, file_path) |
| let root = if self.config.root().is_empty() { | ||
| String::new() | ||
| } else { | ||
| format!("/{}", self.config.root()) | ||
| }; |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The root path formatting logic is duplicated across S3, GCS, and Azblob backends (lines 116-120, 290-294, 390-394). Consider extracting this into a helper function to reduce duplication:
fn format_root_path(root: &str) -> String {
if root.is_empty() {
String::new()
} else {
format!("/{}", root)
}
}This would make the code more maintainable and consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
#6220
What's changed and what's your intention?
Summary (mandatory):
This PR introduces export command support for Fs, S3, OSS, GCS and Azblob.
Details:
This PR refactors the export command to use the unified ObjectStoreConfig from the common module instead of introducing duplicated code logic for separate storage type in src/cli/src/data/export.rs.
PR Checklist
Please convert it to a draft if some of the following conditions are not met.