allow verifier log size to reach max to prevent infinite loop #1679

brycekahle · 2025-02-12T21:07:46Z

In the case of:

- needing the max verifier log size
- kernel does not support log_true_size yet

Lines 453 to 455 in 29fedc5

    
    const factor = 2
    logSize = internal.Between(logSize, minVerifierLogSize, maxVerifierLogSize/factor)
    logSize *= factor

maxVerifierLogSize is 1073741823, so integer division by 2 gives 536870911, and multiplying that by 2 gives 1073741822. Thus logSize never reaches maxVerifierLogSize in the loop. The kernel keeps returning ENOSPC, but the size never changes, and you get an infinite loop.
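The arithmetic above can be reproduced in isolation. This is a minimal sketch, not the library code: `between` is a stand-in for `internal.Between`, `oldNext` is a hypothetical wrapper around the three quoted lines, and the 64 KiB value for `minVerifierLogSize` is an assumption made for illustration.

```go
package main

import "fmt"

const (
	minVerifierLogSize = 64 * 1024  // assumed starting size, for illustration only
	maxVerifierLogSize = 1073741823 // value quoted in the description
)

// between clamps v to [lo, hi]. Stand-in for internal.Between.
func between(v, lo, hi uint32) uint32 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// oldNext mirrors the three quoted lines: clamp to max/2, then double.
func oldNext(logSize uint32) uint32 {
	const factor = 2
	logSize = between(logSize, minVerifierLogSize, maxVerifierLogSize/factor)
	return logSize * factor
}

func main() {
	size := uint32(minVerifierLogSize)
	for i := 0; i < 20; i++ {
		size = oldNext(size)
	}
	// The size settles one byte short of the maximum and stops changing.
	fmt.Println(size, size == oldNext(size), size < maxVerifierLogSize)
}
```

Once `size` reaches 1073741822, `oldNext` returns the same value again, so a loop that retries on ENOSPC has no way to make progress.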

Signed-off-by: Bryce Kahle <[email protected]>

brycekahle · 2025-02-13T00:12:46Z

If it wasn't clear, we are seeing the infinite loop in one of our tools.

lmb · 2025-02-13T10:38:56Z

prog.go

-		const factor = 2
-		logSize = internal.Between(logSize, minVerifierLogSize, maxVerifierLogSize/factor)
-		logSize *= factor
+		logSize = internal.Between(logSize*2, minVerifierLogSize, maxVerifierLogSize)


Can you bring back the overflow check that used to be here?

e8b05c5#diff-348ab86651d6b4babd4e62cbe504d85db37f9aea172e2b72a570b8d5d73b8bc5L443-L445

brycekahle · 2025-02-13

I don't think it is possible to overflow? maxVerifierLogSize is just under 2^30, so the doubled value is at most just under 2^31, which fits in a uint32.

lmb · 2025-02-14

Good point. There are two problems with the current code:

- The maxVerifierLogSize may change at some point, as the comment points out. It'd be nice to not have this problem again.
- There is no protection against an infinite loop, as you've discovered.

As I wrote the code originally the intention was that:

1. Passing a too large number for the buffer would give EINVAL, aborting the loop.
2. The buffer exceeding uint32 would also abort it.

(1) doesn't happen anymore since we clamp to the maximum, (2) was removed. IMO I'd prefer bringing back the original code, which doesn't clamp to a max value (because hardcoding another system's limit makes code brittle) and checks for overflow. We can keep the LogSizeStart logic, although that might need adjustment because with LogLevel == 0 we actually use LogSizeStart * 2 for the first call.

brycekahle · 2025-02-14

Updated, please take a look.

Actually, I thought about this some more. Having the knowledge about the kernel max allows us to work up to that value before erroring with EINVAL. If we just keep doubling, then we could blow right by a value that would actually function.

Maybe we should do that only for older kernels where log_true_size is not available?
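To illustrate the point about working up to the kernel maximum, here is a small sketch of doubling first and clamping afterwards, as in the one-line change proposed in the diff. `nextClamped` is a hypothetical helper and the 64 KiB minimum is an assumption. The multiplication is done in uint64 so the sketch stays correct for any input.

```go
package main

import "fmt"

const (
	minVerifierLogSize uint32 = 64 * 1024  // assumed value, for illustration only
	maxVerifierLogSize uint32 = 1073741823 // value quoted in the description
)

// nextClamped doubles size, then clamps the result to [min, max].
func nextClamped(size uint32) uint32 {
	next := uint64(size) * 2
	if next < uint64(minVerifierLogSize) {
		return minVerifierLogSize
	}
	if next > uint64(maxVerifierLogSize) {
		return maxVerifierLogSize
	}
	return uint32(next)
}

func main() {
	size := minVerifierLogSize
	for size != maxVerifierLogSize {
		size = nextClamped(size)
	}
	// The last attempt lands exactly on the maximum instead of skipping past it.
	fmt.Println(size == maxVerifierLogSize)
	// The maximum is itself a fixed point, so a caller still has to stop
	// retrying once the size no longer grows.
	fmt.Println(nextClamped(size) == size)
}
```

Clamping after the multiplication is what lets the final attempt use the full maximum. It does not by itself bound the loop, which is the concern raised in the review.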

Signed-off-by: Bryce Kahle <[email protected]>

ti-mo · 2025-02-18T08:23:30Z

Ugh, the amount of time I've spent on these 30 lines of code over the years is getting ridiculous.. Discussed this offline with Lorenz, we came up with the following steps:

1. Roll back the LogSizeStart field (again..) in favor of a LogSizeOverride field that disables the retry logic and does a one-shot load. Downside is that we don't have VerifierError.Truncated anymore, so it's not as clear for the user if the log is incomplete. We need an escape hatch anyway, though.
2. Extract the retry decision-making logic into a function that can be tested independently without making the buffer allocations to catch silly mistakes like this rounding error.
3. Limit the loop to 32 retries (1 for each bit in the LogSize field) to prevent this from going nuclear.

That should, hopefully, make this a bit more robust. I'll come up with something this week.
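As a rough illustration of the second and third steps, the retry decision can be written as a pure function that needs no buffer allocation to test. This is only a sketch under assumptions: `retryLogSize`, `maxLogRetries`, and `minLogSize` are hypothetical names, and it is not necessarily how the follow-up PR implements it.

```go
package main

import (
	"fmt"
	"math"
)

const (
	minLogSize    = 64 * 1024 // assumed starting size
	maxLogRetries = 32        // one retry per bit of a uint32 log size
)

// retryLogSize decides whether another load attempt should be made and with
// which buffer size. attempt is the number of loads already made, size is the
// buffer size used by the last one (0 if none yet).
func retryLogSize(attempt int, size uint32) (uint32, bool) {
	if attempt >= maxLogRetries {
		return 0, false // hard cap, regardless of what the sizes are doing
	}
	if size == 0 {
		return minLogSize, true
	}
	if size > math.MaxUint32/2 {
		return 0, false // doubling would overflow uint32
	}
	return size * 2, true
}

func main() {
	var size uint32
	attempts := 0
	for {
		next, ok := retryLogSize(attempts, size)
		if !ok {
			break
		}
		size = next
		attempts++ // pretend this load attempt failed with ENOSPC
	}
	fmt.Println("gave up after", attempts, "attempts, last size", size)
}
```

Because the function only looks at its arguments, a table-driven test can cover boundary sizes such as the one behind this bug without allocating large buffers.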

ti-mo · 2025-02-19T10:41:10Z

@brycekahle Something I'm wondering about, though. How is 1073741822 too small but 1073741823 big enough? I'm not sure if reaching the max log size is important here, it's that the lib doesn't currently give up if the log doesn't fit in the kernel's max log size at all. Interesting case!

brycekahle · 2025-02-19T19:51:12Z

> I'm not sure if reaching the max log size is important here

I believe it is, because if you use the max size the kernel will not error, even if it actually needed more space.

ti-mo · 2025-02-20T09:37:48Z

@brycekahle I just put up #1693, please give it a try. Closing this one for now.

ti-mo · 2025-02-20T09:40:19Z

> > I'm not sure if reaching the max log size is important here
>
> I believe it is, because if you use the max size the kernel will not error, even if it actually needed more space.

Hmm, interesting. It's a shame this is so hard to test due to the memory requirements and having to produce a program large and complex enough to trigger the limit.. Anyway, I tried covering as many cases as possible in my PR, please give it a try.

brycekahle requested a review from a team as a code owner February 12, 2025 21:07

allow verifier log size to reach max

bc35014

Signed-off-by: Bryce Kahle <[email protected]>

brycekahle force-pushed the bryce.kahle/verifier-log-max branch from 4fccb08 to bc35014 Compare February 12, 2025 21:09

brycekahle mentioned this pull request Feb 13, 2025

update ebpf-manager to v0.7.9 DataDog/datadog-agent#33897

Open

lmb reviewed Feb 13, 2025

remove maxVerifierLogSize

544a72a

Signed-off-by: Bryce Kahle <[email protected]>

brycekahle force-pushed the bryce.kahle/verifier-log-max branch from d3e0ac3 to 544a72a Compare February 14, 2025 22:25

ti-mo mentioned this pull request Feb 20, 2025

prog: avoid verifier loop of death #1693

Open

ti-mo closed this Feb 20, 2025
