
Conversation

@McKnight22
Contributor

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#6220

What's changed and what's your intention?

Summary (mandatory):

This PR introduces export command support for Fs, S3, OSS, GCS and Azblob.

Details:

This PR refactors the export command to use the unified ObjectStoreConfig from the common module instead of duplicating the logic for each storage type in src/cli/src/data/export.rs.
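
For readers unfamiliar with the shape of that change, the sketch below shows the general idea: one configuration value selects the backend, and the export destination is derived from it in a single place. The enum, field names, and URL schemes are illustrative stand-ins, not the actual ObjectStoreConfig in this PR.

// Illustrative stand-in only: one configuration value selects the backend, so the
// export command keeps a single code path instead of one per storage type.
#[derive(Debug, Clone)]
enum ObjectStoreBackend {
    Fs { root: String },
    S3 { bucket: String },
    Oss { bucket: String, endpoint: String },
    Gcs { bucket: String },
    Azblob { container: String },
}

impl ObjectStoreBackend {
    /// Builds the destination prefix for exporting one catalog/schema,
    /// regardless of which backend was configured.
    fn export_prefix(&self, catalog: &str, schema: &str) -> String {
        match self {
            ObjectStoreBackend::Fs { root } => format!("{root}/{catalog}/{schema}/"),
            ObjectStoreBackend::S3 { bucket } => format!("s3://{bucket}/{catalog}/{schema}/"),
            ObjectStoreBackend::Oss { bucket, .. } => format!("oss://{bucket}/{catalog}/{schema}/"),
            ObjectStoreBackend::Gcs { bucket } => format!("gcs://{bucket}/{catalog}/{schema}/"),
            ObjectStoreBackend::Azblob { container } => {
                format!("azblob://{container}/{catalog}/{schema}/")
            }
        }
    }
}

fn main() {
    let backend = ObjectStoreBackend::S3 { bucket: "my-bucket".into() };
    println!("{}", backend.export_prefix("greptime", "public"));
}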

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

- Utilize ObjectStoreConfig to unify storage configuration for export command
- Support export command for Fs, S3, OSS, GCS and Azblob
- Fix the Display implementation for SecretString: it always returned the string
  "SecretString([REDACTED])" even when the internal secret was empty.

Signed-off-by: McKnight22 <[email protected]>
@McKnight22 McKnight22 requested a review from a team as a code owner November 22, 2025 04:18
@github-actions github-actions bot added size/L docs-not-required This change does not impact docs. labels Nov 22, 2025

impl ObjectStoreConfig {
    /// Builds the object store with S3.
    pub fn build_s3(&self) -> Result<ObjectStore, BoxedError> {
Member


Question: Why not use ObjectStoreConfig::build directly?

    }
}

impl PrefixedAzblobConnection {
Member


How about making all fields public in the macro?

Member


It could be better to move this repeated logic into the macro.
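
To make the suggestion concrete, here is a rough macro_rules! sketch of that idea; the macro name, the fields, and the generated accessors are assumptions for illustration, not the PR's actual macro.

// Rough sketch: the macro stamps out a connection struct whose fields are public
// and generates the repeated accessor methods, so they are not hand-written per
// backend. Names and fields below are illustrative stand-ins.
macro_rules! connection_struct {
    ($name:ident { $($field:ident: $ty:ty),* $(,)? }) => {
        #[derive(Debug, Default, Clone)]
        pub struct $name {
            $(pub $field: $ty,)*
        }

        impl $name {
            $(
                /// Accessor generated once in the macro instead of being
                /// hand-written for every backend connection type.
                pub fn $field(&self) -> &$ty {
                    &self.$field
                }
            )*
        }
    };
}

connection_struct!(AzblobConnectionSketch {
    container: String,
    account_name: String,
    root: String,
});

fn main() {
    let conn = AzblobConnectionSketch {
        container: "backups".into(),
        account_name: "acct".into(),
        root: "exports".into(),
    };
    println!("{}/{}", conn.container(), conn.root());
}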

@WenyXu WenyXu requested a review from Copilot November 24, 2025 06:45
Copilot finished reviewing on behalf of WenyXu November 24, 2025 06:49
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the export command to use a unified storage configuration approach, eliminating code duplication and adding support for multiple cloud storage backends (S3, OSS, GCS, Azure Blob) alongside the existing filesystem storage.

Key changes:

  • Introduced a new storage_export module with a trait-based design for different storage backends
  • Replaced individual storage flags and configuration parameters with a unified ObjectStoreConfig
  • Added comprehensive test coverage for all storage backend configurations

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 8 comments.

Summary per file:

  • src/common/base/src/secrets.rs: Modified the Display implementation for SecretString to return an empty string when the secret is empty
  • src/cli/src/error.rs: Removed the unused S3ConfigNotSet error variant
  • src/cli/src/data/storage_export.rs: New module implementing a trait-based storage backend abstraction for Fs, S3, OSS, GCS, and Azblob (sketched below)
  • src/cli/src/data/export.rs: Refactored to use the unified storage config, removed duplicated operator-building logic, added unit tests for all backends
  • src/cli/src/data.rs: Added the storage_export module declaration
  • src/cli/src/common/object_store.rs: Added accessor methods for storage connection configs and split build methods for individual backends
  • src/cli/src/common.rs: Exported new connection type aliases for use in storage backends
  • src/cli/Cargo.toml: Added the common-test-util dependency for testing
  • Cargo.lock: Updated dependency tree with common-test-util
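
As a rough picture of the trait-based design mentioned for src/cli/src/data/storage_export.rs, the sketch below reconstructs the two methods referenced later in this review; the trait and the Fs implementation are stand-ins, not the real module.

// Hypothetical outline of the trait-based storage_export design; method names are
// taken from the review comments below, everything else is illustrative.
trait StorageExport {
    /// Export destination plus an optional " CONNECTION (...)" clause.
    fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String);
    /// Path to report in logs once a file has been written.
    fn format_output_path(&self, catalog: &str, file_path: &str) -> String;
}

struct FsBackend {
    root: String,
}

impl StorageExport for FsBackend {
    fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
        // The local filesystem needs no CONNECTION options.
        (format!("{}/{}/{}/", self.root, catalog, schema), String::new())
    }

    fn format_output_path(&self, catalog: &str, file_path: &str) -> String {
        format!("{}/{}/{}", self.root, catalog, file_path)
    }
}

fn main() {
    // The export command can hold any backend behind the trait object.
    let backend: Box<dyn StorageExport> = Box::new(FsBackend { root: "/tmp/export".into() });
    let (path, connection) = backend.get_storage_path("greptime", "public");
    println!("COPY DATABASE ... TO '{}'{}", path, connection);
    println!("{}", backend.format_output_path("greptime", "table.parquet"));
}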


Comment on lines +99 to +101
    /// Get the SAS token.
    pub fn sas_token(&self) -> Option<&String> {
        self.azblob_sas_token.as_ref()

Copilot AI Nov 24, 2025


The SAS token is stored as Option<String> but should be Option<SecretString> for consistency with other sensitive credentials (account_name, account_key). This exposes the SAS token in logs and debug output, creating a potential security issue.

Change the field type in AzblobConnection:

sas_token: Option<SecretString>,

And update the sas_token() getter to return Option<&SecretString>:

pub fn sas_token(&self) -> Option<&SecretString> {
    self.azblob_sas_token.as_ref()
}

Then use expose_secret() when accessing the value in the storage export code.
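
As background on the pattern being requested, here is a minimal stand-in for the SecretString/expose_secret idea; the real type lives in src/common/base/src/secrets.rs, so everything below is illustrative only.

// Minimal stand-in for the SecretString pattern: the value is redacted when
// formatted, and reading the raw secret requires an explicit call.
use std::fmt;

struct SecretString(String);

impl SecretString {
    fn new(s: impl Into<String>) -> Self {
        Self(s.into())
    }
    /// Explicit escape hatch: callers must opt in to reading the raw value.
    fn expose_secret(&self) -> &str {
        &self.0
    }
}

// Both Debug and Display redact the value so credentials never leak into logs.
impl fmt::Debug for SecretString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "SecretString([REDACTED])")
    }
}

impl fmt::Display for SecretString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "SecretString([REDACTED])")
    }
}

fn main() {
    let sas_token = Some(SecretString::new("sv=2024&sig=abc"));
    // Logging the option shows only the redacted form...
    println!("{:?}", sas_token);
    // ...while the export code explicitly exposes it when building the URL.
    if let Some(token) = &sas_token {
        let _query = format!("?{}", token.expose_secret());
    }
}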

Comment on lines +51 to +55
        if self.expose_secret().is_empty() {
            write!(f, "")
        } else {
            write!(f, "SecretString([REDACTED])")
        }

Copilot AI Nov 24, 2025


[nitpick] The Display implementation for SecretString returns an empty string when the secret is empty. This creates inconsistent behavior and could be confusing for debugging. When formatting an empty SecretString, it would be clearer to still show "SecretString([REDACTED])" or use "SecretString(empty)" to make it explicit that it's a SecretString type.

Consider using:

fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
    write!(f, "SecretString([REDACTED])")
}

This maintains consistent formatting regardless of whether the secret is empty or not.

Suggested change
-        if self.expose_secret().is_empty() {
-            write!(f, "")
-        } else {
-            write!(f, "SecretString([REDACTED])")
-        }
+        write!(f, "SecretString([REDACTED])")

Comment on lines +230 to +232
        if !self.config.endpoint().is_empty() {
            connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint()));
        }

Copilot AI Nov 24, 2025


[nitpick] The endpoint validation in OssBackend::new() ensures the endpoint is not empty (lines 185-192), but get_storage_path() then repeats the check if !self.config.endpoint().is_empty() (line 230).

Since the endpoint is guaranteed to be non-empty after validation, this check is unnecessary and the endpoint should always be included in the connection options. Consider removing the conditional check:

connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint()));

This makes the code clearer and more consistent with the validation logic.

Suggested change
-        if !self.config.endpoint().is_empty() {
-            connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint()));
-        }
+        connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint()));
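
A small sketch of the "validate once in the constructor" pattern this comment relies on; OssBackend, its fields, and the error type here are simplified stand-ins rather than the PR's code.

// Illustrative stand-in: the constructor rejects an empty endpoint, so later
// methods can rely on the invariant instead of re-checking it.
struct OssConfig {
    endpoint: String,
}

struct OssBackend {
    config: OssConfig,
}

impl OssBackend {
    /// Rejects an empty endpoint up front so later code can rely on it.
    fn new(config: OssConfig) -> Result<Self, String> {
        if config.endpoint.is_empty() {
            return Err("OSS endpoint must not be empty".to_string());
        }
        Ok(Self { config })
    }

    /// No emptiness check needed here: new() already guaranteed the invariant.
    fn connection_options(&self) -> Vec<String> {
        vec![format!("ENDPOINT='{}'", self.config.endpoint)]
    }
}

fn main() {
    let backend = OssBackend::new(OssConfig {
        endpoint: "https://oss-cn-hangzhou.aliyuncs.com".into(),
    })
    .expect("endpoint validated at construction time");
    println!("{}", backend.connection_options().join(", "));
}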

            StorageType::Azblob(backend) => backend.format_output_path(catalog, file_path),
        }
    }


Copilot AI Nov 24, 2025


Missing rustdoc comment for the public is_remote_storage() method. While the other methods in this impl block have documentation, this one doesn't.

Add a doc comment:

    /// Returns true if the storage backend is remote (not local filesystem).
    pub fn is_remote_storage(&self) -> bool {

Suggested change
+     /// Returns true if the storage backend is remote (not local filesystem).

Comment on lines +215 to +236
    fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
        let bucket = self.config.bucket();
        let oss_path = format!("oss://{}/{}/{}/", bucket, catalog, schema);

        let mut connection_options = vec![
            format!(
                "ACCESS_KEY_ID='{}'",
                self.config.access_key_id().expose_secret()
            ),
            format!(
                "ACCESS_KEY_SECRET='{}'",
                self.config.access_key_secret().expose_secret()
            ),
        ];

        if !self.config.endpoint().is_empty() {
            connection_options.push(format!("ENDPOINT='{}'", self.config.endpoint()));
        }

        let connection_str = format!(" CONNECTION ({})", connection_options.join(", "));
        (oss_path, connection_str)
    }

Copilot AI Nov 24, 2025


The OSS backend's get_storage_path method doesn't include the root path from the configuration, unlike the S3 and GCS backends. This means that if a user specifies --oss-root, it will be ignored in the export path generation.

The path should be:

let root = if self.config.root().is_empty() {
    String::new()
} else {
    format!("/{}", self.config.root())
};
let oss_path = format!("oss://{}{}/{}/{}/", bucket, root, catalog, schema);


    fn format_output_path(&self, catalog: &str, file_path: &str) -> String {
        let bucket = self.config.bucket();
        format!("oss://{}/{}/{}", bucket, catalog, file_path)

Copilot AI Nov 24, 2025


The format_output_path method for OSS backend doesn't include the root path from the configuration, unlike the S3, GCS, and Azblob backends. This inconsistency means the logged output path won't match the actual storage location when --oss-root is specified.

The method should format the path similarly to S3:

let root = if self.config.root().is_empty() {
    String::new()
} else {
    format!("/{}", self.config.root())
};
format!("oss://{}{}/{}/{}", bucket, root, catalog, file_path)
Suggested change
-        format!("oss://{}/{}/{}", bucket, catalog, file_path)
+        let root = if self.config.root().is_empty() {
+            String::new()
+        } else {
+            format!("/{}", self.config.root())
+        };
+        format!("oss://{}{}/{}/{}", bucket, root, catalog, file_path)

Comment on lines +116 to +120
        let root = if self.config.root().is_empty() {
            String::new()
        } else {
            format!("/{}", self.config.root())
        };

Copilot AI Nov 24, 2025


[nitpick] The root path formatting logic is duplicated across S3, GCS, and Azblob backends (lines 116-120, 290-294, 390-394). Consider extracting this into a helper function to reduce duplication:

fn format_root_path(root: &str) -> String {
    if root.is_empty() {
        String::new()
    } else {
        format!("/{}", root)
    }
}

This would make the code more maintainable and consistent.
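
For completeness, a self-contained check of how the proposed helper would behave with and without a configured root; the surrounding s3:// path format mirrors the examples in this review and is illustrative only.

// Helper proposed above, shared by the S3/GCS/Azblob path builders instead of
// repeating the empty-root branch in each backend.
fn format_root_path(root: &str) -> String {
    if root.is_empty() {
        String::new()
    } else {
        format!("/{}", root)
    }
}

fn main() {
    // Without a configured root the bucket is used directly.
    assert_eq!(
        format!("s3://{}{}/{}/{}/", "my-bucket", format_root_path(""), "greptime", "public"),
        "s3://my-bucket/greptime/public/"
    );
    // With a root it is inserted between the bucket and the catalog.
    assert_eq!(
        format!("s3://{}{}/{}/{}/", "my-bucket", format_root_path("exports"), "greptime", "public"),
        "s3://my-bucket/exports/greptime/public/"
    );
    println!("format_root_path behaves as expected");
}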

Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.


Labels

docs-not-required This change does not impact docs. size/L


2 participants