disagg: Fine-grained rate limiting for S3 cache downloads #10637
Conversation
Signed-off-by: JaySon-Huang <[email protected]>
@coderabbitai review
Signed-off-by: JaySon-Huang <[email protected]>
@coderabbitai full review
✅ Actions performed: Full review triggered.
📝 Walkthrough
Replaced direct OFStream writes with a rate-limited, buffered download helper.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    participant FC as FileCache
    participant S3 as S3 (GetObject)
    participant Stream as Aws::IOStream
    participant ReadBuf as ReadBufferFromIStream
    participant WriteLim as PosixWritableFile + WriteLimiter
    participant Temp as Temp File
    participant Final as Final Cache File
    FC->>S3: GetObject() -> Aws::IOStream
    S3-->>FC: Aws::IOStream
    FC->>ReadBuf: wrap Aws::IOStream
    loop chunked transfer
        ReadBuf->>WriteLim: read chunk
        WriteLim->>Temp: write chunk (rate-limited)
    end
    WriteLim->>Temp: flush / fsync / close
    FC->>Temp: rename(temp -> final)
    FC-->>FC: setComplete / update metrics
```
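To make the flow in the diagram concrete, here is a minimal, self-contained sketch of the technique using only the standard library. `ToyWriteLimiter`, `downloadToLocalSketch`, and the 128 KiB chunk size are illustrative assumptions, not the actual TiFlash `WriteLimiter`/`downloadToLocal` code, which instead composes `PosixWritableFile`, `ReadBufferFromIStream`, `WriteBufferFromWritableFile`, and `copyData` as the review comments below note.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>
#include <thread>
#include <vector>

// Toy token-bucket-style limiter: request(bytes) blocks so that the average
// write rate stays at or below bytes_per_sec. A stand-in for TiFlash's WriteLimiter.
class ToyWriteLimiter
{
public:
    explicit ToyWriteLimiter(double bytes_per_sec)
        : rate(bytes_per_sec)
        , next_allowed(std::chrono::steady_clock::now())
    {}

    void request(std::size_t bytes)
    {
        auto now = std::chrono::steady_clock::now();
        if (next_allowed > now)
            std::this_thread::sleep_until(next_allowed);
        // Push the next permitted write out in time, proportional to the bytes granted.
        next_allowed = std::max(next_allowed, now)
            + std::chrono::duration_cast<std::chrono::steady_clock::duration>(
                std::chrono::duration<double>(static_cast<double>(bytes) / rate));
    }

private:
    double rate;
    std::chrono::steady_clock::time_point next_allowed;
};

// Chunked, rate-limited copy from an input stream (e.g. the S3 GetObject body)
// to a local file: each chunk asks the limiter before it is written, so the
// instantaneous write rate tracks the quota instead of bursting.
void downloadToLocalSketch(
    std::istream & istr,
    const std::string & fname,
    long long content_length,
    ToyWriteLimiter & limiter)
{
    std::ofstream out(fname, std::ios::binary | std::ios::trunc);
    std::vector<char> buf(128 * 1024); // chunk size == rate-limiting granularity
    long long remaining = content_length;
    while (remaining > 0 && istr.good())
    {
        auto want = std::min<long long>(remaining, static_cast<long long>(buf.size()));
        istr.read(buf.data(), static_cast<std::streamsize>(want));
        auto got = istr.gcount();
        if (got <= 0)
            break;
        limiter.request(static_cast<std::size_t>(got)); // throttle before each write
        out.write(buf.data(), got);
        remaining -= got;
    }
    out.flush(); // the real helper also fsyncs before the temp file is renamed
}
```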
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 3
❌ Failed checks (1 warning, 2 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dbms/src/Storages/S3/FileCache.cpp
🧰 Additional context used
🪛 Clang (14.0.6)
dbms/src/Storages/S3/FileCache.cpp
[error] 989-989: parameter 'istr' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 990-990: 3 adjacent parameters of 'downloadToLocal' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 990-990: parameter 'fname' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 991-991: parameter 'content_length' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 992-992: parameter 'write_limiter' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 996-996: variable 'ofile' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 998-998: statement should be inside braces
(readability-braces-around-statements,-warnings-as-errors)
[error] 1002-1002: variable 'total_written' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 1004-1004: variable 'rbuf' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 1007-1007: variable 'count' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 1008-1008: if with identical then and else branches
(bugprone-branch-clone,-warnings-as-errors)
[error] 1008-1008: variable 'write_res' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
🔇 Additional comments (4)
dbms/src/Storages/S3/FileCache.cpp (4)
24-24: LGTM! New includes support the buffered download implementation. The added headers provide the necessary functionality for implementing rate-limited, buffered writes to local files during S3 downloads.
Also applies to: 26-26, 35-35, 38-38, 42-42
1003-1004: Verify: Is the 16KB buffer size intentional for rate limiting granularity? The buffer size of 16KB seems smaller than typical S3 download buffers (which are often 64KB-1MB). However, given the PR's objective to implement fine-grained rate limiting to prevent I/O bursts, this smaller buffer size may be intentional to provide better rate-limiting granularity.
Could you confirm whether:
- The 16KB size is a deliberate choice for finer-grained rate limiting control?
- Performance testing shows this size provides adequate throughput while preventing bursts?
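As a rough back-of-the-envelope illustration of that granularity tradeoff (the 100 MiB/s quota is an assumed figure, not a value from this PR): the smaller the chunk, the more often the limiter is consulted and the smaller each permitted burst.

```cpp
#include <cstdio>

int main()
{
    constexpr double quota_mib_per_sec = 100.0; // assumed quota, for illustration only
    constexpr double chunk_kib[] = {16.0, 64.0, 1024.0};
    for (double chunk : chunk_kib)
    {
        // How many limiter grants (and write calls) per second the quota implies,
        // and how large each unthrottled burst can be between grants.
        double grants_per_sec = quota_mib_per_sec * 1024.0 / chunk;
        std::printf("chunk = %6.0f KiB -> ~%6.0f grants/s, burst per grant = %6.0f KiB\n",
                    chunk, grants_per_sec, chunk);
    }
    return 0;
}
```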
1052-1055: LGTM! Proper temp-file-then-rename pattern for atomic downloads. Downloading to a temporary file first and then renaming to the final destination is a good practice that ensures:
- Atomic file replacement (on most filesystems)
- Incomplete files are never visible to readers
- Proper cleanup on failure (temp files are cleaned up during restore)
1065-1071: LGTM! Appropriate log level for download completion. Changing the log level from DEBUG to INFO for successful download completion is appropriate, as this represents a significant operation that should be visible in standard operational logs for monitoring and troubleshooting.
Signed-off-by: JaySon-Huang <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (2)
dbms/src/Storages/S3/FileCache.cpp (2)
989-1008: Good refactor using buffered I/O with rate limiting. The new `downloadToLocal` helper correctly uses `copyData`, which handles partial writes and buffering internally, addressing the previous partial-write concern. The rate limiting is properly applied through `PosixWritableFile`. Two minor observations:
- When `content_length == 0`, the file is created but `sync()` is not called. For empty files this is benign, but for consistency you may want to sync in all cases.
- Consider adding braces around the early return for style consistency (per static analysis hint).
Optional: Add braces for style consistency
```diff
 // simply create an empty file
-if (unlikely(content_length <= 0))
+if (unlikely(content_length <= 0))
+{
     return;
+}
```
1041-1049: Consider removing redundant file size verification. `fsize` is read from the filesystem immediately after rename, then verified against `content_length`, which was already used to copy the exact number of bytes. Since `copyData(rbuf, wbuf, content_length)` copies exactly `content_length` bytes and `wbuf.sync()` ensures it's flushed, this filesystem read is redundant overhead.
Optional: Use content_length directly
```diff
 downloadToLocal(result.GetBody(), temp_fname, content_length, write_limiter);
 std::filesystem::rename(temp_fname, local_fname);
-auto fsize = std::filesystem::file_size(local_fname);
-capacity_metrics->addUsedSize(local_fname, fsize);
-RUNTIME_CHECK_MSG(
-    fsize == static_cast<UInt64>(content_length),
-    "local_fname={}, file_size={}, content_length={}",
-    local_fname,
-    fsize,
-    content_length);
+capacity_metrics->addUsedSize(local_fname, content_length);
 file_seg->setStatus(FileSegment::Status::Complete);
```
Alternatively, if the sanity check is intentionally defensive (e.g., against filesystem corruption), keeping it is reasonable; just note it adds I/O overhead.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dbms/src/Storages/S3/FileCache.cpp
dbms/src/Storages/S3/tests/gtest_filecache.cpp
🧰 Additional context used
🪛 Clang (14.0.6)
dbms/src/Storages/S3/FileCache.cpp
[error] 990-990: parameter 'istr' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 991-991: 3 adjacent parameters of 'downloadToLocal' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 991-991: parameter 'fname' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 992-992: parameter 'content_length' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 993-993: parameter 'write_limiter' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 999-999: statement should be inside braces
(readability-braces-around-statements,-warnings-as-errors)
[error] 1004-1004: variable 'rbuf' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 1005-1005: variable 'wbuf' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
🔇 Additional comments (4)
dbms/src/Storages/S3/tests/gtest_filecache.cpp (2)
210-230: LGTM! Well-designed helper for precise cache capacity calculation. The iterative adjustment with the `forward && backward` assertion is a good defensive approach to prevent infinite oscillation when the target capacity is unreachable due to rounding. This ensures deterministic test behavior.
303-326: LGTM! Clear separation of evict-fail vs evict-success test paths. The reworked test flow with explicit `waitForBgDownload` calls and clearer logging makes the eviction scenarios easier to follow and debug. Holding `file_seg` pointers to mock in-progress reads is a valid approach to prevent eviction.
dbms/src/Storages/S3/FileCache.cpp (2)
1036-1041: Good use of temp file + rename pattern for atomic file creation. This approach ensures that if a download is interrupted (crash, exception), the cache won't contain a partial/corrupted file. The existing restore logic already handles cleanup of `.tmp` files (lines 1243-1246).
1051-1057: LGTM! Log level change to INFO is appropriate for download completion events. Making download completion visible at INFO level aligns with the PR objective of better observability for cache downloading behavior.
Signed-off-by: JaySon-Huang <[email protected]>
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 0
🧹 Nitpick comments (1)
dbms/src/Storages/S3/FileCache.cpp (1)
989-1008: Rate-limited streaming download looks correct; minor style suggestion. The `downloadToLocal` function correctly implements fine-grained rate limiting by:
- Creating `PosixWritableFile` with the `write_limiter`
- Using buffered I/O via `ReadBufferFromIStream` and `WriteBufferFromWritableFile`
- Calling `copyData`, which respects the limiter during writes
- Calling `sync()` before returning
This achieves the PR objective of controlling instantaneous write throughput during streaming.
One minor observation: the static analysis warnings about "unused parameters" appear to be false positives since all parameters are clearly used.
Optional: Add braces around single-statement if body for consistency
```diff
 if (unlikely(content_length <= 0))
+{
     return;
+}
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
dbms/src/Storages/S3/FileCache.cpp
dbms/src/Storages/S3/FileCache.h
dbms/src/Storages/S3/tests/gtest_filecache.cpp
🧰 Additional context used
🪛 Clang (14.0.6)
dbms/src/Storages/S3/FileCache.cpp
[error] 990-990: parameter 'istr' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 991-991: 3 adjacent parameters of 'downloadToLocal' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 991-991: parameter 'fname' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 992-992: parameter 'content_length' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 993-993: parameter 'write_limiter' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 999-999: statement should be inside braces
(readability-braces-around-statements,-warnings-as-errors)
[error] 1004-1004: variable 'rbuf' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 1005-1005: variable 'wbuf' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
🔇 Additional comments (8)
dbms/src/Storages/S3/FileCache.h (2)
92-98: LGTM! Atomic completion signaling is well-designed. The `setComplete` method correctly combines size update, status transition, and notification in a single locked section. This prevents race conditions where a waiter could observe a stale status after the size is updated.
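For illustration, a simplified stand-in showing the pattern being praised here: size update, status transition, and notification all under one lock, so a waiter woken on `cv_ready` can never see `Complete` paired with a stale size. The class and member names are assumptions modeled on the review text, not the real `FileSegment`.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Simplified stand-in for the FileSegment completion signaling.
class FileSegmentSketch
{
public:
    enum class Status { Empty, Downloading, Complete, Failed };

    // Update size and status and wake waiters atomically, so a waiter can
    // never observe Status::Complete together with a stale size.
    void setComplete(uint64_t new_size)
    {
        std::lock_guard lock(mtx);
        size = new_size;
        status = Status::Complete;
        cv_ready.notify_all();
    }

    // Thread-safe read of the current status.
    Status getStatus() const
    {
        std::lock_guard lock(mtx);
        return status;
    }

    // Block until the segment is Complete or Failed.
    Status waitReady()
    {
        std::unique_lock lock(mtx);
        cv_ready.wait(lock, [&] { return status == Status::Complete || status == Status::Failed; });
        return status;
    }

private:
    mutable std::mutex mtx;
    std::condition_variable cv_ready;
    Status status = Status::Empty;
    uint64_t size = 0;
};
```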
108-112: LGTM! The `getStatus()` getter properly acquires the lock before returning the status, ensuring thread-safe reads.
dbms/src/Storages/S3/FileCache.cpp (3)
1030-1037: LGTM! The failure handling correctly logs the issue and sets the segment status to `Failed`, which will notify any waiting threads via `cv_ready.notify_all()` in `setStatus()`.
1041-1060: Atomic download pattern is correct; consider exception safety for rename. The temp-file-then-rename approach ensures atomicity: readers never see partial files. The sequence (download → fsync → rename → update metrics → mark complete) is correct.
However, if `std::filesystem::rename` throws (e.g., cross-filesystem move, permission issue), the temp file remains orphaned and capacity metrics are not updated, but the reserved space is still consumed. This is a minor edge case since rename within the same directory rarely fails. Consider whether a try-catch around rename with cleanup is warranted for robustness:
```cpp
try
{
    std::filesystem::rename(temp_fname, local_fname);
}
catch (...)
{
    std::filesystem::remove(temp_fname); // cleanup orphan
    throw;
}
```
1061-1067: LGTM! Good observability with cost metrics logged at INFO level for successful downloads.
dbms/src/Storages/S3/tests/gtest_filecache.cpp (3)
210-230: LGTM! Useful test helper with proper oscillation guard. The `calculateCacheCapacity` helper iteratively adjusts `config.capacity` to achieve the exact target `dt_size`. The `ASSERT_FALSE(forward && backward)` correctly detects if the algorithm oscillates (which would indicate a bug in the delta_rate calculation logic).
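A hypothetical sketch of what such an oscillation-guarded search could look like; `derive` stands in for however the test maps a configured capacity to the resulting `dt_size` (that mapping is not shown in this review), and the real helper asserts via `ASSERT_FALSE(forward && backward)` inside a gtest body rather than throwing.

```cpp
#include <cstdint>
#include <functional>
#include <stdexcept>

// Hypothetical helper: adjust `capacity` until derive(capacity) == target_dt_size.
// If the search ever has to move in both directions, the target is unreachable
// (e.g. due to rounding), so fail instead of oscillating forever.
uint64_t calculateCacheCapacitySketch(
    uint64_t target_dt_size,
    const std::function<uint64_t(uint64_t)> & derive)
{
    uint64_t capacity = target_dt_size;
    bool forward = false;  // capacity had to grow at least once
    bool backward = false; // capacity had to shrink at least once
    while (true)
    {
        const uint64_t dt_size = derive(capacity);
        if (dt_size == target_dt_size)
            return capacity;
        if (dt_size < target_dt_size)
        {
            forward = true;
            capacity += target_dt_size - dt_size;
        }
        else
        {
            backward = true;
            capacity -= dt_size - target_dt_size;
        }
        if (forward && backward)
            throw std::logic_error("target dt_size unreachable: capacity adjustment oscillates");
    }
}
```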
303-357: LGTM! Well-structured eviction tests. The test cases clearly separate:
- Evict-fail path (lines 303-327): Holds `file_seg` references to simulate in-use files, preventing eviction
- Evict-success path (lines 329-357): Allows eviction and verifies new files are cached
The added `waitForBgDownload` calls ensure background operations complete before assertions, preventing flaky tests.
116-116: LGTM! Improved log format with elapsed seconds using `.3f` precision for better readability.
Pull request overview
This WIP pull request implements fine-grained rate limiting for S3 cache downloads in TiFlash. The main goal is to provide a safer download flow using a temp-file-then-rename pattern to reduce corrupted or partial files while maintaining proper capacity accounting.
Key Changes:
- Refactored S3 download logic to use temp files with atomic rename operations
- Introduced `downloadToLocal` helper function with write rate limiting support
- Updated logging from DEBUG to INFO level with improved timing precision
- Enhanced test coverage for cache eviction scenarios
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| dbms/src/Storages/S3/FileCache.cpp | Implements new downloadToLocal function with rate limiting, refactors downloadImpl to use temp-file-then-rename pattern, improves logging and metrics |
| dbms/src/Storages/S3/FileCache.h | Renames setSize to setComplete to combine size setting and status update, relocates getStatus method for better organization |
| dbms/src/Storages/S3/tests/gtest_filecache.cpp | Refactors eviction test cases to properly test failure and success scenarios, improves logging detail and precision |
Signed-off-by: JaySon-Huang <[email protected]>
Signed-off-by: JaySon-Huang <[email protected]>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @dbms/src/Storages/S3/FileCache.cpp:
- Around line 989-1008: The downloadToLocal function uses a 128 KiB
MAX_BUFFER_SIZE and relies on WriteLimiter refill logic to control write rate;
update the function by adding a concise comment explaining that buffer size sets
I/O chunk granularity while WriteLimiter (with its refill period) enforces
overall throughput and prevents bursts, and note why 128 KiB was chosen
(tradeoff: system call overhead vs chunk size); validate rate-limiting through
unit/integration tests or metrics (reference GET_METRIC, WriteLimiter, copyData)
and document the validation approach; optionally reorder the downloadToLocal
parameters to group IO-related args (Aws::IOStream & istr and Int64
content_length together) to address the "easily swappable parameters" warning,
and add a code annotation or pragma to suppress the false-positive "parameter
unused" static analysis message for parameters used indirectly by constructors
(PosixWritableFile, ReadBufferFromIStream, WriteBufferFromWritableFile).
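For reference, one way those suggestions could look at the declaration site; the parameter types (`Aws::IOStream`, `Int64`, the `String`/`WriteLimiterPtr` aliases) and the exact reordering are assumptions inferred from the review text, not the actual TiFlash signature.

```cpp
// MAX_BUFFER_SIZE sets the I/O chunk granularity only: each copy step moves at
// most this many bytes, so the WriteLimiter (with its refill period) is consulted
// frequently and enforces the overall throughput without large bursts. 128 KiB is
// a tradeoff between system-call overhead and rate-limiting granularity.
static constexpr size_t MAX_BUFFER_SIZE = 128 * 1024;

// Grouping the source-related arguments (istr, content_length) together addresses
// the "easily swappable parameters" warning; the NOLINT line suppresses the
// remaining false positive, since every parameter is consumed by the wrapped
// PosixWritableFile / ReadBufferFromIStream / WriteBufferFromWritableFile objects.
// NOLINTNEXTLINE(bugprone-easily-swappable-parameters)
void downloadToLocal(
    Aws::IOStream & istr,
    Int64 content_length,
    const String & fname,
    const WriteLimiterPtr & write_limiter);
```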
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dbms/src/Storages/S3/FileCache.cpp
tests/docker/next-gen-utils/.gitignore
✅ Files skipped from review due to trivial changes (1)
- tests/docker/next-gen-utils/.gitignore
🧰 Additional context used
🪛 Clang (14.0.6)
dbms/src/Storages/S3/FileCache.cpp
[error] 990-990: parameter 'istr' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 991-991: 3 adjacent parameters of 'downloadToLocal' of similar type are easily swapped by mistake
(bugprone-easily-swappable-parameters,-warnings-as-errors)
[error] 991-991: parameter 'fname' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 992-992: parameter 'content_length' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 993-993: parameter 'write_limiter' is unused
(misc-unused-parameters,-warnings-as-errors)
[error] 999-999: statement should be inside braces
(readability-braces-around-statements,-warnings-as-errors)
[error] 1004-1004: variable 'rbuf' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 1005-1005: variable 'wbuf' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
🔇 Additional comments (5)
dbms/src/Storages/S3/FileCache.cpp (5)
24-24: LGTM! Includes support the new buffered streaming approach. The new includes are appropriate for implementing fine-grained rate limiting during S3 downloads using buffered I/O.
Also applies to: 26-27, 29-29, 37-37, 40-40
1028-1037: LGTM! Improved error handling for reservation failures. The change to log at INFO level and set the file segment status to `Failed` provides graceful degradation when space reservation fails. This is consistent with the error handling pattern used elsewhere in the download flow.
1041-1045: LGTM! Atomic download pattern prevents partial file corruption. The download-to-temp-then-rename pattern ensures that the cache never contains partially downloaded or corrupted files if the download fails mid-stream. This is a good practice for maintaining cache consistency.
1047-1056: LGTM! Debug-mode sanity check verifies download integrity. The file size validation in debug builds helps catch issues during development without impacting production performance.
1058-1067: LGTM! Proper finalization of successful downloads. The code correctly updates metrics, sets the file segment to complete with the actual content length, and logs the successful download with timing information.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: CalvinNeo, JinheLin. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/cherry-pick release-nextgen-20251011
@JaySon-Huang: new pull request created to branch release-nextgen-20251011.
In response to this: /cherry-pick release-nextgen-20251011
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
What problem does this PR solve?
Issue Number: close #10633, ref #10557
Problem Summary:
Currently, TiFlash Compute’s FileCache downloads data from S3 into local block storage as follows: it obtains an std::istream object via S3’s GetObject API and requests permission from the RateLimiter based on the total size of the entire S3 object before writing (implemented by #10555). Once the rate limiter approves the request, the full object is written to the local block storage in one burst.
Although this approach may keep the average write throughput below the quota threshold—making it appear compliant—the instantaneous write rate during the actual download can temporarily exceed the quota. This burstiness can trigger throttling by the underlying block storage, leading to elevated I/O latency.
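For contrast, a hedged sketch of that previous flow (reusing the `ToyWriteLimiter` stand-in from the sketch after the walkthrough diagram above): a single coarse-grained limiter request sized to the whole object, followed by one unthrottled streambuf copy, which is where the burst comes from. The function name is illustrative, not the actual TiFlash code.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>

// Previous flow (simplified): one grant for the entire object, then the whole
// body is written to local block storage as fast as the device allows.
void downloadWholeObjectSketch(
    std::istream & body,
    const std::string & fname,
    long long content_length,
    ToyWriteLimiter & limiter)
{
    limiter.request(static_cast<std::size_t>(content_length)); // one coarse-grained request
    std::ofstream out(fname, std::ios::binary | std::ios::trunc);
    out << body.rdbuf(); // unthrottled copy: instantaneous write rate can exceed the quota
}
```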
What is changed and how it works?
Cloud vendors usually count 16 KiB as 1 IOPS.
Check List
Tests
Performance changes
the read latency is also slightly reduced
Side effects
Documentation
Release note
Summary by CodeRabbit
Bug Fixes
Improvements
Tests
Public API
Chores