Conversation

@xemul (Contributor) commented Sep 5, 2025

After being bumped, the threshold remains high and doesn't capture more
potential problems. The patch splits the threshold into two values --
check and warn. Once the allocation size hits the check part, it either
warns and increases the warn part, or doesn't warn and checks whether
the warn threshold can be decreased. The latter happens based on
lowres_clock timestamps: the next decrease is allowed only after 10
seconds from the previous one, and the decrease step is page-size.

fixes: #669
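The scheme described above can be sketched roughly as follows. This is a hedged, self-contained illustration rather than the actual Seastar code: the names (`large_alloc_warner`, `check_threshold`, `warn_threshold`, `on_allocation`), the bump-to-size policy, and the use of `std::chrono::steady_clock` in place of `lowres_clock` are all assumptions made for the sketch.

```cpp
// Illustrative sketch of the two-threshold decay scheme (not Seastar's
// actual identifiers or policy details).
#include <cassert>
#include <chrono>
#include <cstddef>

struct large_alloc_warner {
    static constexpr size_t page_size = 4096;                 // assumed
    static constexpr auto decay_period = std::chrono::seconds(10);

    size_t check_threshold;   // the originally configured value
    size_t warn_threshold;    // bumped on each warning, decays back down
    std::chrono::steady_clock::time_point last_decrease{};

    explicit large_alloc_warner(size_t configured)
        : check_threshold(configured), warn_threshold(configured) {}

    // Returns true if a warning should be emitted for this allocation.
    bool on_allocation(size_t size, std::chrono::steady_clock::time_point now) {
        if (size < check_threshold) {
            return false;                  // below the check part: nothing to do
        }
        if (size >= warn_threshold) {
            warn_threshold = size + 1;     // warn and raise the warn part
            return true;
        }
        // Between check and warn: try to decay the warn part, at most once
        // per period, one page at a time, never below the configured base.
        if (now - last_decrease >= decay_period &&
            warn_threshold > check_threshold + page_size) {
            warn_threshold -= page_size;
            last_decrease = now;
        }
        return false;
    }
};
```

The point of the split is that allocations in the band between the check and warn values do not warn, but each one gives the decay logic a chance to run, so the warn threshold drifts back toward the base over time.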

@travisdowns (Contributor)

This is a good idea; we've discussed the same internally.

@xemul xemul requested a review from Copilot September 11, 2025 02:15
Copilot AI left a comment

Pull Request Overview

This PR implements a decay mechanism for the large allocation warning threshold to improve detection of subsequent large allocations. After the threshold is bumped due to a large allocation warning, it now gradually decreases back to the original configured value over time.

  • Adds timer-based threshold decay that reduces the warning threshold by page size every 10 seconds
  • Introduces base threshold tracking to remember the originally configured value
  • Updates related functions to properly handle the new decay mechanism

pages[nr_pages].free = false;
free_span_unaligned(reserved, nr_pages - reserved);
live_cpus[cpu_id].store(true, std::memory_order_relaxed);
large_allocation_warning_decay.set_callback(default_scheduling_group(), [] { internal::decrease_large_allocation_warning_threshold(); });
Copilot AI commented Sep 11, 2025

The lambda callback captures nothing but calls get_cpu_mem() which may access different CPU-local data if the timer fires on a different CPU than where it was initialized. Consider capturing the current CPU's memory context or ensuring the timer runs on the same CPU.

Suggested change
large_allocation_warning_decay.set_callback(default_scheduling_group(), [] { internal::decrease_large_allocation_warning_threshold(); });
large_allocation_warning_decay.set_callback(default_scheduling_group(), [this] { this->decrease_large_allocation_warning_threshold(); });

@xemul (Contributor, Author)

Timers don't migrate between CPUs in seastar


if (threshold > base + cpu_pages::large_allocation_warning_decay_step) {
threshold -= cpu_pages::large_allocation_warning_decay_step;
get_cpu_mem().large_allocation_warning_decay.arm(cpu_pages::large_allocation_warning_decay_period);
Copilot AI commented Sep 11, 2025

Using relative time with arm() instead of absolute time with rearm() may cause timing drift if the callback execution is delayed. Consider using rearm(lowres_clock::now() + large_allocation_warning_decay_period) for consistent intervals.

Suggested change
get_cpu_mem().large_allocation_warning_decay.arm(cpu_pages::large_allocation_warning_decay_period);
get_cpu_mem().large_allocation_warning_decay.rearm(
lowres_clock::now() + cpu_pages::large_allocation_warning_decay_period
);

@xemul (Contributor, Author)

If timer execution is delayed, "now()" will also be delayed.
Also, seastar timers' relative arm() calls are overloads on top of the absolute ones and do exactly the same anyway.
And finally -- we don't care about potential delays; this decay is not about precision timing.

size_t large_allocation_warning_threshold_base = std::numeric_limits<size_t>::max();
static constexpr size_t large_allocation_warning_decay_step = page_size;
static constexpr lowres_clock::duration large_allocation_warning_decay_period = std::chrono::seconds(10);
timer<lowres_clock> large_allocation_warning_decay;
Member

memory.cc is much lower level than the reactor etc.

I'd like to avoid entangling it with the rest of the reactor.

Member

I'd even prefer to avoid having it talk with the clocks, but I don't see a way.

How about this:

if larger than original threshold
   if larger than current threshold
       report
       consider increasing threshold based on history
   else
       consider decreasing threshold based on history

@xemul (Contributor, Author)

If considering without clocks, then what's the ... average (?) expected (?) typical (?) allocations-per-second rate to rely on?

Member

large allocation rate per second is much smaller than one.

@xemul (Contributor, Author)

large allocation rate per second is much smaller than one.

OK, let's take this as an axiom

@xemul xemul force-pushed the br-memory-decay-large-allocation-warning-threshold branch from 7ede025 to 9edb2f9 on September 29, 2025 12:41
@xemul (Contributor, Author) commented Sep 29, 2025

upd:

  • do threshold decreasing based on lowres_clock timestamps

@xemul (Contributor, Author) commented Oct 13, 2025

@avikivity , please re-review

@avikivity (Member)

Sorry about the delay. Maybe I should use a timer.

Even lowres_clock is too entangled. memory.cc depends on nothing now, let's keep it that way.

Decay can be every N large allocations that aren't greater than the current threshold. If N = 20, and large allocations are rare, that throttles large allocations significantly but still allows them to surface.
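The clock-free decay suggested here can be sketched as follows. This is an assumption-laden illustration, not the merged implementation: the names (`counting_decay`, `quiet_hits`) and the bump-to-size policy are invented for the sketch, and N = 20 is taken from the comment above.

```cpp
// Illustrative sketch: decay the warn threshold by one page after every
// N check-threshold hits that don't exceed the current warn threshold.
#include <cassert>
#include <cstddef>

struct counting_decay {
    static constexpr size_t page_size = 4096;   // assumed
    static constexpr unsigned decay_after = 20; // N from the comment

    size_t check_threshold;   // the originally configured value
    size_t warn_threshold;    // bumped on each warning
    unsigned quiet_hits = 0;  // check-threshold hits since the last warning

    explicit counting_decay(size_t configured)
        : check_threshold(configured), warn_threshold(configured) {}

    // Returns true if a warning should be emitted for this allocation.
    bool on_allocation(size_t size) {
        if (size < check_threshold) {
            return false;
        }
        if (size >= warn_threshold) {
            warn_threshold = size + 1;  // warn and raise the warn part
            quiet_hits = 0;
            return true;
        }
        // A large-but-quiet allocation: count it, and decay once every
        // decay_after hits, never going below the configured base.
        if (++quiet_hits >= decay_after) {
            quiet_hits = 0;
            if (warn_threshold > check_threshold + page_size) {
                warn_threshold -= page_size;
            }
        }
        return false;
    }
};
```

Since large allocations themselves are rare (well under one per second, per the discussion above), counting them gives a slow decay without memory.cc taking any dependency on a clock.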

After being bumped, the threshold remains high and doesn't capture more
potential problems. The patch splits the threshold into two values --
check and warn. Once the allocation size hits the check part, it either
warns and increases the warn part, or doesn't warn and checks whether
the warn threshold can be decreased. The latter happens based on
lowres_clock timestamps: the next decrease is allowed only after 10
seconds from the previous one, and the decrease step is page-size.

fixes: scylladb#669

Signed-off-by: Pavel Emelyanov <[email protected]>
@xemul xemul force-pushed the br-memory-decay-large-allocation-warning-threshold branch from 9edb2f9 to 8db864a on October 14, 2025 13:36
@xemul (Contributor, Author) commented Oct 15, 2025

upd:

  • decrease the warn threshold after the check threshold is hit several times

@avikivity avikivity merged commit bd74b3f into scylladb:master Oct 15, 2025
16 checks passed

Development

Successfully merging this pull request may close these issues.

Decay large allocation warning threshold with time
