Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use system's maximum concurrent TCP connections for net::Defaults.maxTCPConnections #10375

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

sgothel
Copy link

@sgothel sgothel commented Oct 30, 2024

Summary

net::Defaults.maxTCPConnections: Use system's maximum concurrent TCP connections, disable if undefined or below threshold

Used system value on a Linux kernel are

Value is memory bound. On GNU/Linux approximately 4 concurrent TCP connections per MB system memory are provided, e.g.

  • 4096M -> 16384
  • 16384M -> 65536
  • 65407M -> 262144

Checklist

  • I have run make prettier-write and formatted the code.
  • All commits have Change-Id
  • I have run tests with make check
  • I have issued make run and manually verified that everything looks okay
  • Documentation (manuals or wiki) has been updated or is not required

@sgothel sgothel added the 24.04 label Oct 30, 2024
@sgothel sgothel self-assigned this Oct 30, 2024
@sgothel sgothel requested a review from caolanm October 30, 2024 11:49
@sgothel sgothel force-pushed the private/sgothel/socketcode_repair_3 branch 7 times, most recently from 68933f6 to 92d8955 Compare November 1, 2024 10:39
@sgothel
Copy link
Author

sgothel commented Nov 1, 2024

desktop cypress calc/searchbar_spec.js post rebase on a2f3536, while PR #10370 passed. Hence it got better .. but not fully resolved yet.

@sgothel sgothel force-pushed the private/sgothel/socketcode_repair_3 branch from 92d8955 to 7597fb3 Compare November 1, 2024 12:30
@caolanm
Copy link
Contributor

caolanm commented Nov 1, 2024

I feel readDecimal and dataToDecimal are a bit ornate if we compare to say getTotalSystemMemoryKb or getFromCGroup which just open their /proc file, read a line from it and atoll on that

@sgothel
Copy link
Author

sgothel commented Nov 1, 2024

I feel readDecimal and dataToDecimal are a bit ornate if we compare to say getTotalSystemMemoryKb or getFromCGroup which just open their /proc file, read a line from it and atoll on that

looked at them ofc and thought about adding this func .. esp due to late reviews :)
problem, the used atoll (edit: existing code) is not covering all error cases and I wanted a one liner to fetch the value
to have the getTechValue func clean of these things - encapsulation.
If its a blocker, I will revise.

@sgothel sgothel force-pushed the private/sgothel/socketcode_repair_3 branch from 7597fb3 to 342435e Compare November 1, 2024 14:20
@sgothel
Copy link
Author

sgothel commented Nov 1, 2024

added a tiny fix on calling readDecimal (EOS included), refined its API doc .. kicking CI again (ole cypress failure)

Copy link
Contributor

@mmeeks mmeeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some good things here that can go in - but a number that need feedback. I -really- don't want review getting lost and ignored in monster review threads of giant collections of commits. It is good to split them up - lets also push them in small batches to different PRs.

checkRemoval -> fine.

Simplify UnitTimeoutBase -> no idea - a huge change with no clear rationale.

Add 'UT' -> the 'UT' is not needed - lets not make up lots of new acronyms in commit messages it is implicit in UnitTestFoo - otherwise good to see a new test -> fine.

is "Remove WebSocketHandler's WS Ping timeout functionality" a nearly straight revert - if so good; lets include the word 'Revert' in the commit message - otherwise fine.

Like the Remove redundant 'Socket::isClosed()' - but I would prefer us to remove 'isOpen' to minimize the end-to-end diff across the code. We don't want non-functional renames of methods isClosed -> isOpen is not a win.

"Add safe ..." don't like creating big, redundant helpers like this. We already have similar code, using other helpers that does this - git grep for & re-use the same code there.

why tcp_max_orphans ?

Thanks!

@sgothel sgothel force-pushed the private/sgothel/socketcode_repair_3 branch 3 times, most recently from a3b53eb to dd4b6ae Compare November 6, 2024 15:11
@sgothel
Copy link
Author

sgothel commented Nov 6, 2024

Simplify UnitTimeoutBase -> no idea - a huge change with no clear rationale.

Not huge IMHO, but simplifying the unit test more to its point, which pattern is also more reusable for other users.

Add 'UT' -> the 'UT' is not needed - lets not make up lots of new acronyms in commit messages it is implicit in UnitTestFoo - otherwise good to see a new test -> fine.

Fun fact, I read this abbreviation somewhere else .. and picked it up. OK.

Like the Remove redundant 'Socket::isClosed()' - but I would prefer us to remove 'isOpen' to minimize the end-to-end diff across the code. We don't want non-functional renames of methods isClosed -> isOpen is not a win.

I counted usage of both, and isOpen was used more often without inversion
while isClosed required inversion - so this is more straight, a win.

"Add safe ..." don't like creating big, redundant helpers like this. We already have similar code, using other helpers that does this - git grep for & re-use the same code there.

std::stroll used here covers all error cases, while existing code does not (using atoll).
Initial motivation to factor it out of the actual evaluation function was to not
have it littered with ordinary and distracting I/O and conversion details.

(other items were either accepted or discussed in upcoming PRs)

Copy link
Contributor

@mmeeks mmeeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When feedback is incorporated systematically please poke me to review again. Thanks.

@sgothel sgothel force-pushed the private/sgothel/socketcode_repair_3 branch 4 times, most recently from 88b0d10 to 2764bcd Compare November 8, 2024 13:23
@sgothel sgothel changed the title Revise Handle Limited Open Connections (2) Use system's maximum concurrent TCP connections for net::Defaults.maxTCPConnections Nov 8, 2024
@sgothel
Copy link
Author

sgothel commented Nov 11, 2024

When feedback is incorporated systematically please poke me to review again. Thanks.

Done.

Remark: The requested removal of Socket::isOpen() instead of Socket::isClose() increased the diff as laid out earlier. Regardless, I hope it fits requirements as I also changed the field name to have the accessor match 1:1 w/o inverted logic.

Other changes applied as requested as well.

@sgothel sgothel requested a review from mmeeks November 11, 2024 15:23
@caolanm caolanm force-pushed the private/sgothel/socketcode_repair_3 branch from 2764bcd to e97b802 Compare November 11, 2024 19:26
common/Util-desktop.cpp Fixed Show resolved Hide resolved
@sgothel sgothel force-pushed the private/sgothel/socketcode_repair_3 branch from e97b802 to d95881e Compare November 11, 2024 23:03
@sgothel
Copy link
Author

sgothel commented Nov 11, 2024

Cleaned up branch rebased tree, i.e. PR #10375 -> #10455 -> #10456

(also refined getMaxConcurrentTCPConnections less nonsense analytics scaring comparison and borderline unsigned long long > numeric_limits<size_t>::max() check)

Copy link
Contributor

@mmeeks mmeeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets get the first two commits in- but not the third - which needs re-working.

common/Util-desktop.cpp Outdated Show resolved Hide resolved
common/Util-desktop.cpp Outdated Show resolved Hide resolved
common/Util-desktop.cpp Show resolved Hide resolved
common/Util-desktop.cpp Show resolved Hide resolved
common/Util-mobile.cpp Outdated Show resolved Hide resolved
net/Socket.cpp Show resolved Hide resolved
wsd/COOLWSD.cpp Show resolved Hide resolved
common/Util-desktop.cpp Outdated Show resolved Hide resolved
};
// - 4 tcp_max_orphans sockets per MB
// - {4096M -> 16384}, {16384M -> 65536}, {65407M -> 262144}, ...
const ssize_t tcp_max_orphans = readDecimal("/proc/sys/net/ipv4/tcp_max_orphans", 0); // ignored if n/a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really not sure that these are the right settings; or that we can interrogate the kernel to find a number here. Possibly nf_conntrack_max is something reasonable to check - but ... looking - it seems the kernel is not helpful in giving us a number here; so presuambly we have to get it from the configuration - and then perhaps just crop that to nf_conntrack_max and/or some algorithm based on file-max.

Please re-think.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tcp_max_orphans is that memory bound freely available socket count, not getting reset.

This limit exists only to prevent simple denial-of-service attacks. Lowering this limit is not recommended. Network conditions might require you to increase the number of orphans allowed, but note that each orphan can eat up to ~64 kB of unswappable memory. The default initial value is set equal to the kernel parameter NR_FILE. This initial default is adjusted depending on the memory in the system.

While more sockets can be used, it is a good indicator of a worry free socket count IMHO at application launch.
nf_conntrack_max is usually set to tcp_max_orphans by default, if the iptables conntrack module is loaded.

Here emphasis should be on concurrent_tcp_sockets and whether we like to use this metric to limit external connections in the first place, I understand this.

It may make sense to also inject an alternative max_tcp_connections COOLWSD configuration value.
However cropping such value to nf_conntrack_max makes little sense, i.e. it implies the admin has to adjust the nf_conntrack_max and (probably) tcp_max_orphans limits anyways.
Hence we could use a config COOLWSD max_tcp_connections, use it as-is (if given) but warn if the system getMaxConcurrentTCPConnections is lower (-> for the system admin)?

Now if we look at the memory based Linux defaults here, i.e. 4 sockets/MB or 262k sockets for 64G etc,
it looks to me that these are good values. Whenever this issue is discussed in the web server space,
it is said to increase the same system limits to achieve a higher concurrent responsiveness while
(need to) maintain the DoS balance - if that makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The setting is some random other number - for a different use-case. It may often be initialized to something sensible - but it's really for a different purpose AFAICS.

So - fine; lets do this guess-timate thing - but we need to provide a setting to override it, in the case that someone really wants to configure their orphans in some way - and not crop the number of clients connections we can support.

Sven Göthel added 2 commits November 13, 2024 14:33
Signed-off-by: Sven Göthel <[email protected]>
Change-Id: Iad74f253bdac5636757b130b299b5deacda658db
…connections, disable if undefined or below threshold

Used system value on a Linux kernel are
- /proc/sys/net/ipv4/tcp_max_orphans
  See https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html
- /proc/sys/net/nf_conntrack_max
  See https://www.kernel.org/doc/html/latest/networking/nf_conntrack-sysctl.html

Value is memory bound. On GNU/Linux approximately 4 concurrent TCP connections per MB system memory are provided,
e.g. {4096M -> 16384}, {16384M -> 65536}, {65407M -> 262144}, ...

Signed-off-by: Sven Göthel <[email protected]>
Change-Id: Iad74f253bdac5636757b130b299b5deacda658db
@sgothel sgothel force-pushed the private/sgothel/socketcode_repair_3 branch from d95881e to 0175d07 Compare November 13, 2024 13:42
Copy link
Contributor

@mmeeks mmeeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more work to do - as well as splitting from the underlying un-necessary extra commit.

common/Util-desktop.cpp Show resolved Hide resolved
char* endptr = nullptr;
errno = 0; // Flush previous error indicator. Reminder: errno is thread-local
const unsigned long long num = std::strtoull(line, &endptr, 10);
if (0 != errno || nullptr == endptr || endptr == line || num > std::numeric_limits<size_t>::max())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 != errno to errno != 0 - as previously discussed. We use !ptr - ~everywhere for null pointer checking & readability - as above - please use that idiom - and above twice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way - I don't believe any of this error detection is helpful; if you read systemd you'll see they use sscanf() to parse /proc output - the format and it content is well defined and must be backwards compatible. Lets use stroll as is done in the functions above; and lets split this out into its own readDecimal function instead of having this in-line function - but lets give it a name that doesn't suck - and fits into the existing naming convention - I believe I asked for that before.

};
// - 4 tcp_max_orphans sockets per MB
// - {4096M -> 16384}, {16384M -> 65536}, {65407M -> 262144}, ...
const ssize_t tcp_max_orphans = readDecimal("/proc/sys/net/ipv4/tcp_max_orphans", 0); // ignored if n/a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The setting is some random other number - for a different use-case. It may often be initialized to something sensible - but it's really for a different purpose AFAICS.

So - fine; lets do this guess-timate thing - but we need to provide a setting to override it, in the case that someone really wants to configure their orphans in some way - and not crop the number of clients connections we can support.

<< res_min
<< ", cut-off-min " << cutOffMinimum
<< ") -> " << res_clipped);
return res_clipped;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems it is very likely that this function returns zero - is that expected ?

The cutOffMinimum and the way it is injected here is really non-intuitive. Can you add a one-line comments against each line of your ? operator logic above to explain what is going on. We have two metaphores for the same thing 'cut-off' and 'clipped' - please choose something consistent for the variable naming to give the next poor reader a clue.

Copy link
Contributor

@mmeeks mmeeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes but lets get them done first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Handle limited open Connections due to keepalive connections (cool#9621)
3 participants