# refactor: memory/cpu efficient handling of stream configurations #1191

base: main
## Conversation
## Walkthrough

The changes update the handling of custom partitions across the codebase. Variable names, function parameters, and structure fields previously named in the singular (e.g. `custom_partition`) are now plural (`custom_partitions`), with their types changed from `Option<String>` to `Vec<String>`.
## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HTTPHandler
    participant PutStreamHeaders
    participant HeaderParser
    Client->>HTTPHandler: Sends HTTP headers
    HTTPHandler->>PutStreamHeaders: Invokes TryFrom(HeaderMap)
    PutStreamHeaders->>HeaderParser: parse_custom_partition & parse_time_partition_limit
    alt Successful parsing
        HeaderParser-->>PutStreamHeaders: Validated partition values
        PutStreamHeaders-->>HTTPHandler: Constructed PutStreamHeaders object
    else Parsing error
        HeaderParser-->>PutStreamHeaders: Return HeaderParseError
        PutStreamHeaders-->>HTTPHandler: Error response (BAD_REQUEST)
    end
```
```mermaid
sequenceDiagram
    participant Client
    participant ParseableModule
    participant StreamModule
    Client->>ParseableModule: create_stream(..., custom_partitions: [values])
    ParseableModule->>StreamModule: set_custom_partitions([values])
    StreamModule->>StreamModule: Call prepare_parquet / convert_disk_files_to_parquet
    StreamModule-->>ParseableModule: Processed stream data
    ParseableModule-->>Client: Stream creation/update result
```
**Actionable comments posted: 1**

### 🧹 Nitpick comments (8)
#### src/handlers/http/modal/utils/logstream_utils.rs (3)

**50-52: Use the type alias consistently for `custom_partitions`.**

Lines 50–52 define `pub custom_partitions: Vec<String>` in `PutStreamHeaders` despite introducing a `CustomPartition` type alias earlier (line 35). To maintain consistency and clarity, consider using `Vec<CustomPartition>` here.

```diff
 pub struct PutStreamHeaders {
     pub time_partition: Option<FieldName>,
     pub time_partition_limit: Option<NonZeroU32>,
-    pub custom_partitions: Vec<String>,
+    pub custom_partitions: Vec<CustomPartition>,
     pub static_schema_flag: bool,
     pub update_stream_flag: bool,
     pub stream_type: StreamType,
 }
```
**102-114: Trim spaces when parsing custom partitions.**

The `parse_custom_partition` function splits on commas but does not trim potential whitespace around each partition key, which may lead to unexpected strings like `" partition"` if headers contain extra spaces.

```diff
 pub fn parse_custom_partition(
     custom_partition: &str,
 ) -> Result<Vec<CustomPartition>, HeaderParseError> {
-    let custom_partition_list = custom_partition
-        .split(',')
-        .map(String::from)
-        .collect::<Vec<String>>();
+    let custom_partition_list = custom_partition
+        .split(',')
+        .map(|s| s.trim().to_string())
+        .collect::<Vec<CustomPartition>>();
     if custom_partition_list.len() > 3 {
         return Err(HeaderParseError::TooManyPartitions);
     }
     Ok(custom_partition_list)
 }
```
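Stripped to its essentials, the trimmed parsing suggested above can be sketched as a standalone function. The error enum here is a simplified stand-in for the crate's `HeaderParseError`, and `CustomPartition` is flattened to a plain `String`:

```rust
// Simplified stand-in for the crate's HeaderParseError.
#[derive(Debug, PartialEq)]
enum HeaderParseError {
    TooManyPartitions,
}

// Split a comma-separated header value, trimming whitespace around each key,
// and reject more than three partitions.
fn parse_custom_partition(custom_partition: &str) -> Result<Vec<String>, HeaderParseError> {
    let list: Vec<String> = custom_partition
        .split(',')
        .map(|s| s.trim().to_string())
        .collect();
    if list.len() > 3 {
        return Err(HeaderParseError::TooManyPartitions);
    }
    Ok(list)
}

fn main() {
    // Whitespace around keys is discarded.
    assert_eq!(
        parse_custom_partition("a, b ,c").unwrap(),
        vec!["a", "b", "c"]
    );
    // A fourth partition is rejected.
    assert_eq!(
        parse_custom_partition("a,b,c,d"),
        Err(HeaderParseError::TooManyPartitions)
    );
}
```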
**119-125: Clarify error message for invalid time partition limit.**

Currently, the `ZeroOrNegative` variant is triggered for any parsing error, including non-numeric inputs (e.g., `"abc"`). Consider renaming the variant or separating these conditions to avoid confusion.

#### src/parseable/mod.rs (2)
**326: Rename local variable to reflect multiple custom partitions.**

The local variable `custom_partition` is receiving a vector (`stream_metadata.custom_partitions`). Rename it to `custom_partitions` for consistency with the new plural field name.

```diff
-let custom_partition = stream_metadata.custom_partitions;
+let custom_partitions = stream_metadata.custom_partitions;
```
**460-464: Avoid using `expect("Is Some")` when verifying membership.**

Lines 460–464 check that `time_partition.is_some()` and then call `time_partition.as_ref().expect("Is Some")`. While logically safe, replacing `expect` with pattern matching or an early-return approach can improve readability and avoid panic-oriented code paths.

```diff
-if time_partition.is_some()
-    && !custom_partitions.is_empty()
-    && custom_partitions.contains(time_partition.as_ref().expect("Is Some"))
-{
+if let Some(tp) = time_partition {
+    if !custom_partitions.is_empty() && custom_partitions.contains(tp) {
+        return Err(CreateStreamError::Custom {
+            msg: format!("time partition {tp:?} cannot be set as custom partition"),
+            status: StatusCode::BAD_REQUEST,
+        }
+        .into());
+    }
+}
```

#### src/parseable/streams.rs (1)
**558-564: Consider returning references or slices to avoid cloning.**

Currently, both `get_custom_partitions` (line 558) and `set_custom_partitions` (line 644) clone or replace the entire vector of partitions. This is functional and likely not performance-critical for small vectors, but if the partitions list scales, consider returning references or using interior mutability constructs to optimize repeated lookups or partial updates.

Also applies to: 644-646
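One way to offer a clone-free read path is to pass the caller a borrowed slice while the read lock is held. This is a minimal sketch against a stand-in for the stream's lock-guarded metadata; the `with_custom_partitions` helper is hypothetical, not part of the codebase:

```rust
use std::sync::RwLock;

// Minimal stand-in for a stream's metadata guarded by a lock.
struct StreamMetadata {
    custom_partitions: Vec<String>,
}

struct Stream {
    metadata: RwLock<StreamMetadata>,
}

impl Stream {
    // Cloning getter: simple, but copies the whole vector on every call.
    fn get_custom_partitions(&self) -> Vec<String> {
        self.metadata.read().unwrap().custom_partitions.clone()
    }

    // Clone-free alternative: the closure borrows the partitions for the
    // duration of the read lock, so nothing is copied.
    fn with_custom_partitions<R>(&self, f: impl FnOnce(&[String]) -> R) -> R {
        f(&self.metadata.read().unwrap().custom_partitions)
    }
}

fn main() {
    let stream = Stream {
        metadata: RwLock::new(StreamMetadata {
            custom_partitions: vec!["region".into(), "tenant".into()],
        }),
    };
    assert_eq!(stream.get_custom_partitions().len(), 2);
    let joined = stream.with_custom_partitions(|p| p.join(","));
    assert_eq!(joined, "region,tenant");
}
```

The closure-based accessor trades a little API convenience for zero allocation, which only matters if partition lists grow or lookups become hot.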
#### src/storage/object_storage.rs (1)

**657-681: Consider using a constant for the magic number in file suffix replacement.**

The hardcoded value `3` in the file suffix replacement logic could be made more maintainable by using a named constant.

```diff
+const BASE_SUFFIX_REPLACEMENTS: usize = 3;

-file_suffix = str::replacen(filename, ".", "/", 3);
+file_suffix = str::replacen(filename, ".", "/", BASE_SUFFIX_REPLACEMENTS);
 if !custom_partitions.is_empty() {
-    file_suffix = str::replacen(filename, ".", "/", 3 + custom_partitions.len());
+    file_suffix = str::replacen(filename, ".", "/", BASE_SUFFIX_REPLACEMENTS + custom_partitions.len());
 }
```

#### src/utils/json/mod.rs (1)
**148-194: Well-implemented backward compatibility for custom partitions.**

The new serialization/deserialization implementations handle the conversion between comma-separated strings and vector format, ensuring backward compatibility with existing data.

A few suggestions to enhance the implementation:

- Consider adding input validation in `serialize_custom_partitions` to ensure no partition contains commas
- Add documentation about the format requirements and limitations

```diff
 pub fn serialize_custom_partitions<S>(value: &[String], serializer: S) -> Result<S::Ok, S::Error>
 where
     S: serde::Serializer,
 {
     if value.is_empty() {
         // Skip serializing this field
         serializer.serialize_none()
     } else {
+        // Validate that no partition contains commas to prevent serialization issues
+        if value.iter().any(|s| s.contains(',')) {
+            return Err(S::Error::custom("Custom partitions cannot contain commas"));
+        }
         serializer.serialize_str(&value.join(","))
     }
 }
```
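Stripped of the serde plumbing, the backward-compatible round-trip described above is a join on write and a split on read. A minimal stdlib sketch of that logic (the function names here are illustrative; the real code wires the same idea into serde `serialize_with`/`deserialize_with` helpers):

```rust
// Write side: an empty vector serializes as "nothing" (None in the real serde
// impl); otherwise the partitions join into one comma-separated string, which
// is the format older metadata already used.
fn to_stored(value: &[String]) -> Option<String> {
    if value.is_empty() {
        None
    } else {
        Some(value.join(","))
    }
}

// Read side: a missing or empty stored value deserializes back to an empty
// vector, so old single-string metadata stays readable.
fn from_stored(stored: Option<&str>) -> Vec<String> {
    match stored {
        None | Some("") => Vec::new(),
        Some(s) => s.split(',').map(String::from).collect(),
    }
}

fn main() {
    let parts = vec!["region".to_string(), "tenant".to_string()];
    let stored = to_stored(&parts);
    assert_eq!(stored.as_deref(), Some("region,tenant"));
    // Round-trip recovers the original vector.
    assert_eq!(from_stored(stored.as_deref()), parts);
    // Absent value maps to "no custom partitions".
    assert_eq!(from_stored(None), Vec::<String>::new());
}
```

This also makes the reviewers' comma caveat concrete: a partition name containing `,` would not survive the round-trip, hence the validation suggestion.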
### 📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

### 📒 Files selected for processing (13)

- `src/handlers/http/ingest.rs` (3 hunks)
- `src/handlers/http/logstream.rs` (7 hunks)
- `src/handlers/http/modal/utils/ingest_utils.rs` (4 hunks)
- `src/handlers/http/modal/utils/logstream_utils.rs` (3 hunks)
- `src/metadata.rs` (3 hunks)
- `src/migration/mod.rs` (2 hunks)
- `src/parseable/mod.rs` (11 hunks)
- `src/parseable/streams.rs` (11 hunks)
- `src/static_schema.rs` (2 hunks)
- `src/storage/mod.rs` (4 hunks)
- `src/storage/object_storage.rs` (3 hunks)
- `src/utils/json/flatten.rs` (7 hunks)
- `src/utils/json/mod.rs` (7 hunks)
### ⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
### 🔇 Additional comments (24)
#### src/parseable/streams.rs (2)

**307: LGTM for retrieving custom partitions separately.**

Extracting `custom_partitions` in line 307 before parquet preparation is a clear and concise approach.

**403-414: Good implementation of custom partition encoding and sorting.**

Looping over each partition to apply DELTA_BYTE_ARRAY encoding and add sorting columns is logically correct and aligns with the multi-partition design.
#### src/static_schema.rs (4)

**61-62: LGTM! Parameter types updated for better type safety.**

The parameter types have been improved:

- `time_partition: &Option<String>` better represents the optional nature of time partitions
- `custom_partitions: &[String]` better represents multiple custom partitions

**70-78: LGTM! Improved custom partition validation.**

The validation of custom partitions has been streamlined:

- Direct iteration through custom partitions is more efficient
- Early return with a descriptive error message improves error handling

**81-84: LGTM! Enhanced time partition check.**

The use of `is_some_and` improves readability and makes the code more idiomatic.

**119-123: LGTM! Improved error handling for time partition.**

The error message now includes the specific time partition field name, making debugging easier.
#### src/metadata.rs (2)

**86: LGTM! Updated field type to support multiple custom partitions.**

The field type has been changed from `Option<String>` to `Vec<String>` to support multiple custom partitions.

**97-99: LGTM! Simplified parameter handling in the `new` method.**

The method signature and initialization have been updated to:

- Accept `Option<String>` directly for time partition
- Accept `Vec<String>` for custom partitions

Also applies to: 112-114
#### src/handlers/http/modal/utils/ingest_utils.rs (2)

**74: LGTM! Improved handling of custom partitions.**

The changes enhance clarity and consistency:

- Variable name updated to reflect multiple partitions
- Condition improved to check for non-empty custom partitions
- Parameter updated to pass a custom partitions slice

Also applies to: 77, 82

**156-159: LGTM! Enhanced parameter type in `get_custom_partition_values`.**

The parameter type has been updated to `&[String]` to better represent multiple custom partitions.

#### src/storage/mod.rs (2)
**110-115: LGTM! Added custom serialization for multiple custom partitions.**

The changes enhance the handling of custom partitions:

- Field type updated to `Vec<String>`
- Custom serialization and deserialization added for compatibility

**229: LGTM! Updated default initialization.**

The default initialization has been updated to use an empty vector for custom partitions.
#### src/handlers/http/ingest.rs (2)

**511-519: LGTM! Parameter update aligns with the custom partitions refactor.**

The change from `None` to `&[]` correctly reflects the new multiple custom partitions support, maintaining test coverage while adapting to the API changes.

**708-717: LGTM! Consistent parameter update for array tests.**

The change from `None` to `&[]` is consistently applied here as well, ensuring test coverage for array handling with the new custom partitions implementation.

#### src/storage/object_storage.rs (1)
**179-194: LGTM! Efficient handling of custom partitions.**

The changes improve efficiency through:

- Early return for empty partitions
- Direct slice assignment instead of an optional single value
- Clear plural naming reflecting multiple partitions support
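The file-suffix rewrite in this module (also flagged in the nitpick above) boils down to `str::replacen` turning a fixed number of `.` separators into `/`. A sketch, assuming for illustration that filenames encode date/hour/minute segments plus one dot-separated segment per custom partition:

```rust
// Assumed base count: date, hour, and minute segments become path components.
const BASE_SUFFIX_REPLACEMENTS: usize = 3;

// Each custom partition contributes one more '.' that must become a '/'.
fn file_suffix(filename: &str, custom_partitions: &[String]) -> String {
    let n = BASE_SUFFIX_REPLACEMENTS + custom_partitions.len();
    filename.replacen('.', "/", n)
}

fn main() {
    // With no custom partitions, only the first three dots become slashes;
    // the dot inside "data.parquet" is left alone.
    assert_eq!(
        file_suffix("date=2024-01-01.hour=10.minute=30.data.parquet", &[]),
        "date=2024-01-01/hour=10/minute=30/data.parquet"
    );
    // One custom partition shifts one more dot into a path separator.
    assert_eq!(
        file_suffix(
            "date=2024-01-01.hour=10.minute=30.region=us.data.parquet",
            &["region".to_string()]
        ),
        "date=2024-01-01/hour=10/minute=30/region=us/data.parquet"
    );
}
```

The count-limited `replacen` is what keeps the `.` inside the final `data.parquet` component from being mangled, which is why the replacement count must track the number of custom partitions exactly.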
#### src/utils/json/mod.rs (2)
**28-30: LGTM! Clean import organization.**

The imports are well-organized and specifically import the required custom partition utilities.

**38-46: Function signature change improves type safety.**

The change from `Option<&String>` to `&[String]` for the `custom_partitions` parameter is a good improvement as it:

- Removes the need for `Option`, since an empty slice can represent no partitions
- Explicitly shows that multiple partitions are supported
- Maintains memory efficiency by using a slice instead of owning the data
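The ergonomic difference between the two signatures can be seen in a toy comparison (both functions are illustrative, not the crate's actual API):

```rust
// Old shape: the caller must wrap or unwrap an Option, and "no partitions"
// and "empty string" are two different states to reason about.
fn count_with_option(custom_partition: Option<&String>) -> usize {
    custom_partition.map(|p| p.split(',').count()).unwrap_or(0)
}

// New shape: an empty slice IS "no partitions", so there is exactly one code
// path and the caller can pass any borrowed list without allocating.
fn count_with_slice(custom_partitions: &[String]) -> usize {
    custom_partitions.len()
}

fn main() {
    let legacy = "region,tenant".to_string();
    assert_eq!(count_with_option(Some(&legacy)), 2);
    assert_eq!(count_with_option(None), 0);

    let parts = vec!["region".to_string(), "tenant".to_string()];
    assert_eq!(count_with_slice(&parts), 2);
    assert_eq!(count_with_slice(&[]), 0);
}
```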
#### src/migration/mod.rs (2)

**275-276: LGTM! Consistent field renaming.**

The field renaming from `custom_partition` to `custom_partitions` in the struct destructuring is consistent with the broader changes.

**310-311: LGTM! Consistent struct initialization.**

The field usage in the `LogStreamMetadata` struct initialization is consistent with the field renaming.

#### src/utils/json/flatten.rs (2)
**95-98: Function signature change improves validation capabilities.**

The change from `Option<&String>` to `&[String]` for the `custom_partitions` parameter allows validating multiple partitions while maintaining the same validation rules.

**99-134: Validation logic handles multiple partitions efficiently.**

The validation logic now iterates through each partition while maintaining all the necessary checks for:

- Null values
- Empty values
- Objects
- Arrays
- Periods in values
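The per-partition checks listed above can be sketched against a tiny stand-in for a JSON value type (the enum and function are illustrative, not the crate's actual flatten API):

```rust
use std::collections::HashMap;

// Tiny stand-in for serde_json::Value, just enough to show the checks.
#[derive(Debug)]
enum Value {
    Null,
    String(String),
    Object,
    Array,
}

// Mirrors the validation rules above: each custom-partition field must exist
// and be a non-empty scalar string that contains no periods.
fn validate_custom_partitions(
    fields: &HashMap<String, Value>,
    custom_partitions: &[String],
) -> Result<(), String> {
    for partition in custom_partitions {
        match fields.get(partition) {
            None | Some(Value::Null) => {
                return Err(format!("partition {partition} is missing or null"))
            }
            Some(Value::Object) | Some(Value::Array) => {
                return Err(format!("partition {partition} must be a scalar"))
            }
            Some(Value::String(s)) if s.is_empty() => {
                return Err(format!("partition {partition} is empty"))
            }
            Some(Value::String(s)) if s.contains('.') => {
                return Err(format!("partition {partition} contains a period"))
            }
            Some(_) => {}
        }
    }
    Ok(())
}

fn main() {
    let mut fields = HashMap::new();
    fields.insert("region".to_string(), Value::String("us-east".to_string()));
    assert!(validate_custom_partitions(&fields, &["region".to_string()]).is_ok());
    // A missing field fails validation.
    assert!(validate_custom_partitions(&fields, &["tenant".to_string()]).is_err());
}
```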
#### src/handlers/http/logstream.rs (3)

**366-367: LGTM! Consistent field usage.**

The field usage in `StreamInfo` is consistent with the broader changes to support multiple partitions.

**558-560: Improved error handling for header parsing.**

The addition of the `HeaderParsing` error variant with an appropriate error message improves error handling and user feedback.

**632-634: LGTM! Safer header parsing in tests.**

The change from `.into()` to `.try_into().unwrap()` makes the potential failure points more explicit in the test code.
## Description

Fixes #XXXX.
## Summary by CodeRabbit

- New Features
- Improved Error Handling
- Refactor