memory: Decay large allocation warning threshold #2967
Conversation
This is a good idea; we've discussed the same internally.
Pull Request Overview
This PR implements a decay mechanism for the large allocation warning threshold to improve detection of subsequent large allocations. After the threshold is bumped due to a large allocation warning, it now gradually decreases back to the original configured value over time.
- Adds timer-based threshold decay that reduces the warning threshold by page size every 10 seconds
- Introduces base threshold tracking to remember the originally configured value
- Updates related functions to properly handle the new decay mechanism
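As a rough illustration of the mechanism described in the overview (all names here are hypothetical, not the patch's actual identifiers), the timer-driven decay could look like:

```cpp
#include <algorithm>
#include <cstddef>

// Minimal sketch of the decay idea: after a warning bumps the threshold,
// each periodic timer tick lowers it by one page until it reaches the
// originally configured base value.
struct warning_threshold {
    static constexpr size_t page_size = 4096;
    size_t base;       // originally configured threshold
    size_t current;    // possibly bumped by past warnings

    explicit warning_threshold(size_t configured)
        : base(configured), current(configured) {}

    void bump(size_t alloc_size) {
        // a large allocation raises the threshold so it is reported only once
        current = std::max(current, alloc_size);
    }

    // called by a periodic timer, e.g. every 10 seconds
    void decay_tick() {
        if (current > base + page_size) {
            current -= page_size;
        } else {
            current = base;   // reached the base; the timer can stop re-arming
        }
    }
};
```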
src/core/memory.cc (outdated)

```cpp
pages[nr_pages].free = false;
free_span_unaligned(reserved, nr_pages - reserved);
live_cpus[cpu_id].store(true, std::memory_order_relaxed);
large_allocation_warning_decay.set_callback(default_scheduling_group(), [] { internal::decrease_large_allocation_warning_threshold(); });
```
Copilot AI (Sep 11, 2025):
The lambda callback captures nothing but calls get_cpu_mem() which may access different CPU-local data if the timer fires on a different CPU than where it was initialized. Consider capturing the current CPU's memory context or ensuring the timer runs on the same CPU.
Suggested change:

```diff
- large_allocation_warning_decay.set_callback(default_scheduling_group(), [] { internal::decrease_large_allocation_warning_threshold(); });
+ large_allocation_warning_decay.set_callback(default_scheduling_group(), [this] { this->decrease_large_allocation_warning_threshold(); });
```
Timers don't migrate between CPUs in seastar
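To illustrate the point in plain C++ (this is not Seastar code): a capture-less callback that reads state through a thread-local accessor always sees the data of the thread that runs it, so as long as the timer never migrates off its shard, the callback keeps operating on the shard it was armed on:

```cpp
#include <thread>

// Illustration: a capture-less lambda that goes through a thread_local
// accessor resolves to whichever thread runs it.
thread_local int shard_local_value = 0;

int& get_local() {              // stand-in for a shard-local accessor
    return shard_local_value;
}

int run_callback() {
    auto cb = [] { return get_local(); };   // captures nothing
    return cb();
}
```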
src/core/memory.cc (outdated)

```cpp
if (threshold > base + cpu_pages::large_allocation_warning_decay_step) {
    threshold -= cpu_pages::large_allocation_warning_decay_step;
    get_cpu_mem().large_allocation_warning_decay.arm(cpu_pages::large_allocation_warning_decay_period);
```
Copilot AI (Sep 11, 2025):
Using relative time with arm() instead of absolute time with rearm() may cause timing drift if the callback execution is delayed. Consider using rearm(lowres_clock::now() + large_allocation_warning_decay_period) for consistent intervals.
Suggested change:

```diff
- get_cpu_mem().large_allocation_warning_decay.arm(cpu_pages::large_allocation_warning_decay_period);
+ get_cpu_mem().large_allocation_warning_decay.rearm(
+     lowres_clock::now() + cpu_pages::large_allocation_warning_decay_period
+ );
```
If timer execution is delayed, "now()" will be delayed too.
Also, Seastar timers' relative arm() overloads are implemented on top of the absolute one and do exactly the same thing anyway.
And finally, we don't care about potential delays; this decay is not about precision timing.
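The drift question can be illustrated with a small simulation in plain standard C++ (this is not Seastar's timer API): relative re-arming accumulates any callback delay, absolute re-arming does not, and for a coarse 10-second decay either is acceptable:

```cpp
#include <chrono>

using namespace std::chrono;

// Two re-arming policies for a periodic 10-second timer. With relative
// re-arming, any delay in running the callback pushes every later deadline
// back; with absolute re-arming the schedule stays fixed.
struct schedule {
    milliseconds next{10000};                // first deadline at t = 10 s
    static constexpr milliseconds period{10000};

    // relative re-arm: next deadline measured from when the callback ran
    void fire_relative(milliseconds ran_at) { next = ran_at + period; }
    // absolute re-arm: next deadline measured from the previous deadline
    void fire_absolute()                    { next += period; }
};
```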
src/core/memory.cc (outdated)

```cpp
size_t large_allocation_warning_threshold_base = std::numeric_limits<size_t>::max();
static constexpr size_t large_allocation_warning_decay_step = page_size;
static constexpr lowres_clock::duration large_allocation_warning_decay_period = std::chrono::seconds(10);
timer<lowres_clock> large_allocation_warning_decay;
```
memory.cc is much lower level than the reactor etc.
I'd like to avoid entangling it with the rest of the reactor.
I'd even prefer to avoid having it talk with the clocks, but I don't see a way.
How about this:

```
if larger than original threshold
    if larger than current threshold
        report
        consider increasing threshold based on history
    else
        consider decreasing threshold based on history
```
If considering without clocks, then what's the average (or expected, or typical) allocations-per-second rate to rely on?
The large allocation rate per second is much smaller than one.
> large allocation rate per second is much smaller than one.

OK, let's take this as an axiom.
Force-pushed from 7ede025 to 9edb2f9
upd: @avikivity, please re-review
Sorry about the delay. Maybe I should use a timer. Even lowres_clock is too entangled. memory.cc depends on nothing now, let's keep it that way. Decay can be every N large allocations that aren't greater than the current threshold. If N = 20, and large allocations are rare, that throttles large allocations significantly but still allows them to surface.
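A minimal sketch of this count-based alternative, with hypothetical names and N = 20 (this is illustrative, not code from the patch): no timers and no clocks, the threshold adapts purely on the allocation path.

```cpp
#include <cstddef>

// Clock-free decay: lower the bumped threshold by one page after every
// N large allocations that stayed below the current threshold.
struct counting_threshold {
    static constexpr size_t step = 4096;   // one page
    static constexpr size_t n = 20;        // quiet allocations per decay step
    size_t base;
    size_t current;
    size_t quiet = 0;                      // allocations below the threshold

    explicit counting_threshold(size_t configured)
        : base(configured), current(configured) {}

    // returns true if the allocation should be reported
    bool on_large_alloc(size_t size) {
        if (size > current) {
            current = size;                // warn once, then raise
            quiet = 0;
            return true;
        }
        if (++quiet >= n) {                // N quiet allocations seen
            quiet = 0;
            if (current > base + step) {
                current -= step;           // one page of decay
            }
        }
        return false;
    }
};
```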
After being bumped, the threshold remains high and doesn't capture more potential problems. The patch splits the threshold into two values, check and warn: once the allocation size hits the check part, it either warns and increases the warn part, or doesn't warn and checks if the warn threshold can be decreased. The latter happens based on lowres_clock timestamps; the next decrease is allowed only after 10 seconds from the previous one, and the decrease step is page-size.

fixes: scylladb#669

Signed-off-by: Pavel Emelyanov <[email protected]>
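A rough sketch of the check/warn split described in the commit message, with illustrative names and std::chrono::steady_clock standing in for lowres_clock (this is not the patch's actual code):

```cpp
#include <chrono>
#include <cstddef>

using namespace std::chrono;

// The threshold is split into a fixed "check" part and an adaptive "warn"
// part. Any allocation over "check" is inspected: it either warns and raises
// the warn part, or, if at least 10 seconds passed since the last decrease,
// lowers the warn part by one page back toward the check value.
struct split_threshold {
    static constexpr size_t step = 4096;                 // one page
    static constexpr seconds decay_period{10};
    size_t check;                                        // configured value
    size_t warn;                                         // adaptive value
    steady_clock::time_point last_decrease;

    split_threshold(size_t configured, steady_clock::time_point now)
        : check(configured), warn(configured), last_decrease(now) {}

    // returns true if this allocation should be reported
    bool on_alloc(size_t size, steady_clock::time_point now) {
        if (size <= check) {
            return false;                                // fast path untouched
        }
        if (size > warn) {
            warn = size;                                 // warn, then raise
            return true;
        }
        if (now - last_decrease >= decay_period && warn > check + step) {
            warn -= step;                                // time-gated decay
            last_decrease = now;
        }
        return false;
    }
};
```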
Force-pushed from 9edb2f9 to 8db864a