-
Notifications
You must be signed in to change notification settings - Fork 709
[CORE-15023] Cloud Storage Clients: Add util::find_multipart_boundary #29166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE-15023] Cloud Storage Clients: Add util::find_multipart_boundary #29166
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR extracts common multipart boundary parsing logic into a new utility function find_multipart_boundary() to reduce code duplication between Azure Blob Storage (ABS) and Google Cloud Storage (GCS) implementations. The function parses the Content-Type header to extract the boundary string from multipart/mixed responses.
Key Changes
- Added
util::find_multipart_boundary()function to parse multipart boundary strings from HTTP response headers - Updated existing unit tests to use gmock matchers (
testing::HasSubstr) instead of manual string contains checks - Added comprehensive unit tests for the new boundary parsing function covering valid cases and error conditions
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/v/cloud_storage_clients/util.h | Declares new find_multipart_boundary() function |
| src/v/cloud_storage_clients/util.cc | Implements boundary parsing with proper error handling for missing headers, incorrect content types, and malformed boundaries |
| src/v/cloud_storage_clients/tests/util_test.cc | Refactors existing tests to use gmock matchers and adds new test cases for boundary parsing |
1c601fa to
b84eb38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
b84eb38 to
76bb416
Compare
This is well-specified and identical between ABS and GCS, so factor it out for easier testing. Signed-off-by: Oren Leiman <[email protected]>
Signed-off-by: Oren Leiman <[email protected]>
76bb416 to
84e1679
Compare
Retry command for Build#78637please wait until all jobs are finished before running the slash command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
|
/ci-repeat 1 |
| bool is_whitespace = (c == ' ' || c == '\t'); | ||
| if (!is_eq && !is_whitespace) { | ||
| break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have "multipart/mixed; boundaryfffff", then boundary will be the substr fffff, and then c will be f and we'll break here, and return fffff as the boundary, but I would have expected that the lack of = would mean that it is malformed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boundarywill be the substringfffff... but the lack of=would mean that it is malformed
correct, but a few lines down we blow away the string view if we didn't find exactly one '=' before the first non-whitespace character. there is a unit test for this 🙂
This function could use some documentation. If the diff looks ok to you aside from that, I'll add a nice block comment in the next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeh makes sense i was thinking those checks were inside the while loop
dotnwat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. why are we returning a std::exception_ptr for std::unexpected case? it would seem like either throwing or returning a formatted error string and letting the caller throw would be better.
yeah fair, that sounds more normal. will change in follow up |
Find the multipart boundary string in a response's content-type header. This code is identical between ABS and GCS versions. Includes unit tests.
PR also improves surrounding unit tests slightly w/ use of gmock matchers.
Backports Required
Release Notes