Setting TCP_QUICKACK reduces trylock latency by 40ms. #254
When running PArSEC on Linux (e.g. Debian bookworm on aarch64), there is a 40ms delay between requesting a lock and obtaining a response. For example, in a 1-agent, 1-shard localhost configuration, running `lua_bench` with 2 wallets shows the agent starting to execute the contract at 02:16:32.012, while the shard log shows the shard processing its first `trylock` a whole 43ms later, at 02:16:32.055, even though the communication is done entirely over a loopback interface. Inspecting the local traffic with `tcpdump -i lo` for the same timestamp range (the shard is listening on port 5556) reveals an agent->shard TCP packet that only gets ACK'd by the shard 43ms later. However, after that ACK, the shard's `trylock`
response is almost immediate.

I believe this comes from the delayed-ACK feature in Linux, where ACKs get briefly delayed to coalesce the sending of ACK receipts and improve throughput (at the expense of latency). According to https://topic.alibabacloud.com/a/linux-tcp-delay-confirmation_1_16_31848681.html, the default delay is exactly 40ms, suspiciously close to the number observed above. This hypothesis is further strengthened by the fact that this PR (which enables `TCP_QUICKACK`) improves the 5-wallet `lua_bench` tx submission rate by around 100x (from ~15tps to ~1500tps).

As suggested by netty/netty#13610, `TCP_QUICKACK` needs to be re-enabled after each `recv`; in fact, doing it only at the end of `tcp_socket::receive` (and once when setting up the socket) improved the tx submission rate by only 2x (to ~28tps).
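To illustrate, here is a minimal sketch of the idea (the helper name and the raw-fd plumbing are mine for illustration; the actual patch works inside `tcp_socket`):

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// The kernel clears quick-ack mode on its own after sending some ACKs,
// so setting TCP_QUICKACK once at connect time is not enough: it has
// to be re-armed after every recv().
void rearm_quickack(int sock_fd) {
    int flag = 1;
    // Error handling intentionally omitted, mirroring the current patch.
    setsockopt(sock_fd, IPPROTO_TCP, TCP_QUICKACK, &flag, sizeof(flag));
}

// Usage in a receive loop:
//     auto n = recv(sock_fd, buf, buf_len, 0);
//     rearm_quickack(sock_fd);
```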
I think that in our setting, enabling `TCP_NODELAY` (which disables coalescing on the sending side) also makes sense for our protocol, but I did not see significant performance changes, so maybe that can be omitted. (The best option for our `trylock` interface would probably be doing each `trylock` request and response inside a single `writev` or `TCP_CORK`; see the sketch below.)
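A rough sketch of the single-`writev` idea (the length-prefix framing here is an assumption for illustration, not necessarily our actual wire format):

```cpp
#include <array>
#include <cstdint>
#include <sys/uio.h>

// Hypothetical: emit a trylock request as one writev() call so the
// header and payload leave the socket together, instead of as two
// small writes that interact badly with delayed ACKs on the peer.
bool send_trylock_request(int sock_fd, void* payload, uint64_t len) {
    std::array<iovec, 2> iov{{
        {&len, sizeof(len)},                  // length prefix
        {payload, static_cast<size_t>(len)},  // request body
    }};
    // Ignores partial writes for brevity; real code would loop.
    return writev(sock_fd, iov.data(), static_cast<int>(iov.size()))
        == static_cast<ssize_t>(sizeof(len) + len);
}
```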
For now, this patch does not do any error handling and does not gate the `setsockopt` call to platforms where `TCP_QUICKACK` is defined (I guess, only Linux); a possible gating sketch is below. How should this be improved?
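For reference, one conventional way to gate the call and surface errors would be something like this (again just a sketch, not what the patch currently does):

```cpp
#ifdef TCP_QUICKACK
    // TCP_QUICKACK is Linux-specific; other platforms skip the call.
    int flag = 1;
    if(setsockopt(sock_fd, IPPROTO_TCP, TCP_QUICKACK, &flag,
                  sizeof(flag)) != 0) {
        // Propagate or log errno instead of silently ignoring it.
    }
#endif
```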