feat(transaction): Use single hop in squashing when possible #2376
Conversation
Force-pushed d3374b9 to e704c04
src/server/multi_command_squasher.cc
Outdated
// If all commands fit into a single batch, run them as a real single hop without multi overhead.
// Doesn't work with replication, so disallow if inline scheduling is not allowed.
bool singlehop_possible = IsAtomic() && !tx->IsScheduled() && cmds_.empty();
if (singlehop_possible && ServerState::tlocal()->AllowInlineScheduling()) {
this is not safe obviously... making this stuff work with replication is a little cumbersome
There are two options for running a single hop: either keep the base tx multi or not. With single-shard EVAL we decided to reduce it to a regular transaction, which could have been done much more easily by just removing the multi flag after proper initialization. With MULTI/EXEC we have to handle replication, mainly sending the closing EXEC journal message with the shard count, etc. If we decide to stick to non-multi single hops, we have to emulate it ourselves. Instead, I changed the logic so that if
Sorry, a few clarification questions :)
src/server/transaction.h
Outdated
@@ -209,7 +209,8 @@ class Transaction {
   void StartMultiGlobal(DbIndex dbid);

   // Start multi in LOCK_AHEAD mode with given keys.
-  void StartMultiLockedAhead(DbIndex dbid, CmdArgList keys);
+  // Scheduling can be optionally disabled to allow more fine-grained control.
+  void StartMultiLockedAhead(DbIndex dbid, CmdArgList keys, bool skip_scheduling = false);
I'd use an enum instead of a boolean argument here. `kSkipScheduling` at the call site is more readable and less error-prone (for example, I'd assume the meaning would be `schedule` rather than `skip_scheduling`).
src/server/main_service.cc
Outdated
@@ -1721,7 +1721,7 @@ optional<bool> StartMultiEval(DbIndex dbid, CmdArgList keys, ScriptMgr::ScriptPa
     trans->StartMultiGlobal(dbid);
     return true;
   case Transaction::LOCK_AHEAD:
-    trans->StartMultiLockedAhead(dbid, keys);
+    trans->StartMultiLockedAhead(dbid, keys, true);
     return true;
You're returning `true` here to indicate that the transaction was scheduled, but you pass `true` to `StartMultiLockedAhead()` to skip scheduling; is this intentional?
The function as a whole determines whether the script needs to schedule at all (not the case if no keys are present) and schedules it. What I care about is whether it needs to schedule; whether it has already been scheduled can be looked up on the transaction itself.
src/server/main_service.cc
Outdated
// If script runs on a single shard, we run it remotely to save hops
if (!tx->IsScheduled() && tx->GetMultiMode() == Transaction::LOCK_AHEAD &&
    tx->GetUniqueShardCnt() == 1) {
  DCHECK(*scheduled);  // because tx multi mode is lock ahead
I don't follow the meaning of `*scheduled == true` while `!tx->IsScheduled()` - what does it mean?
It means that we need to schedule but haven't yet; I will update the names.
src/server/main_service.cc
Outdated
if (*multi_mode != Transaction::NOT_DETERMINED) {
-  StartMultiExec(cntx->db_index(), cntx->transaction, &exec_info, *multi_mode);
+  StartMultiExec(cntx->db_index(), cntx->transaction, &exec_info, *multi_mode, delay_scheduling);
   scheduled = true;
This might depend on the value of `delay_scheduling`, no? I.e., shouldn't you do something like `scheduled = StartMultiExec(...);`?
auto check_cb = [this](ShardId sid) { return !sharded_[sid].cmds.empty(); };
tx->PrepareSquashedMultiHop(base_cid_, check_cb);
tx->ScheduleSingleHop(run_cb);
I'm confused - how can we do ScheduleSingleHop here if we determined that a single hop is not possible?
Because we scheduled above and this is no longer a single hop for the whole transaction, but rather just a multi call
Worth adding a comment explaining why ScheduleSingleHop here will perform multiple hops.
src/server/transaction.cc
Outdated
if (multi_->role != SQUASHED_STUB)  // stub transactions don't migrate between threads
// stub transactions don't migrate between threads, so keep it's index cached
- // stub transactions don't migrate between threads, so keep it's index cached
+ // stub transactions don't migrate between threads, so keep its index cached
Force-pushed 86794b0 to d160489
Still needs some polish; I also need to check the replica side in more detail.
Force-pushed 9760893 to 3228c04
  args.async = false;
  CallFromScript(cntx, args);
});
bool embedded = tx->GetMultiMode() != Transaction::NOT_DETERMINED;
Rename `embedded` to `was_undetermined` and reverse the condition here and below.
`was_undetermined` doesn't explain why it should be so; whether it's embedded or not is what we care about, and we check that via multi-mode determinedness.
- It's quite a large change with lots of conditions. Do you think it is possible to split it into smaller ones?
- I do not see new tests besides that small unit test. We know for sure that we have a memory leak during replication. If this PR fixes it, can you add a (py)test reproducing the slave leak and showing it's been fixed here?
I can split the core logic and then its uses (eval, exec and the squasher), but there is no way to test each part without the full changes. Besides, they're all in separate places, so it's not the depth of the changes but the range; each part is independent.
"Memory leak"? Our current "fix" disables multi mode, so we just don't send the replication report. I will add a test.
// If all commands fit into a single batch, it can be run as a single hop, without separate hops
// for locking and unlocking
TEST_F(MultiTest, MultiSingleHop) {
  auto fb0 = pp_->at(0)->LaunchFiber([&] {
    for (unsigned i = 0; i < 100; i++) {
      Run({"multi"});
I would add another call to Run({"rpush", "a", "bar"}); inside the multi transaction here, and after execution check the size of "a" to verify that both commands were executed.
if (slot_id.has_value()) {
  unique_slot_checker_.Add(*slot_id);
}
MultiUpdateWithParent(parent);
Unused parameter slot_id passed to the constructor.
Signed-off-by: Vladislav Oleshko <[email protected]>
fix: more fixes Signed-off-by: Vladislav Oleshko <[email protected]>
I wanted to simplify journaling so that stub transactions report directly to the main transaction; this way we don't need both
Consider rebasing.
Will do. This PR will become much easier because we don't need to use ScheduleSingleHop, as it's no longer special.
Idea: If we determine that the squashed commands form only a single batch, we can use ScheduleSingleHop() for that batch.

Without contended keys it gives no improvements 😞, but with contended keys it's potentially up to two times faster 🙂