|
4 | 4 |
|
5 | 5 | When the execution of a transaction depends on the state of other participants, the participants exchange data using so-called ReadSets. These are persistent messages exchanged between participants that are delivered at least once and contain read results with the state of the transaction. The use of ReadSets causes transactions to go through additional phases: |
6 | 6 |
|
7 | | -1. **Reading phase**: The participant reads, persists, and sends data that is needed by other participants. During this phase, KQP transactions (type `TX_KIND_DATA`, which have a non-empty `TDataTransaction.KqpTransaction` field and subtype `KQP_TX_TYPE_DATA`) validate optimistic locks. Older MiniKQL transactions (type `TX_KIND_DATA`, which have a non-empty `TDataTransaction.MiniKQL` field) perform reads and send arbitrary table data during this phase. Another example of using the reading phase is the distributed TTL transaction for deleting expired rows. The primary shard generates a bitmask matching expired rows, ensuring that both the primary and index shards delete the same rows. |
| 7 | +1. **Reading phase**: The participant reads, persists, and sends data that is needed by other participants. During this phase, QP transactions (type `TX_KIND_DATA`, which have a non-empty `TDataTransaction.KqpTransaction` field and subtype `KQP_TX_TYPE_DATA`) validate optimistic locks. Older MiniKQL transactions (type `TX_KIND_DATA`, which have a non-empty `TDataTransaction.MiniKQL` field) perform reads and send arbitrary table data during this phase. Another example of using the reading phase is the distributed TTL transaction for deleting expired rows. The primary shard generates a bitmask matching expired rows, ensuring that both the primary and index shards delete the same rows. |
8 | 8 | 2. **Waiting phase**: The participant waits until it has received all the necessary data from the other participants. |
9 | 9 | 3. **Execution phase**: The participant uses both local and remote data to determine whether to abort or complete the transaction. The participant generates and applies the effects specified in the transaction body if the transaction is completed successfully. The transaction body typically includes a program that uses the same input data and leads all participants to come to the same conclusion. |
10 | 10 |
|
@@ -64,7 +64,7 @@ Transactions are stored on disk and in memory using the [TTransQueue](https://gi |
64 | 64 |
|
65 | 65 | Volatile transactions are stored in memory and are currently lost when DataShard restarts (they may be migrated during graceful restarts in the future). The restart aborts the transaction for all participants, and any participant can initiate the abort before the transaction body is executed and its effects are persisted. Shards use this feature to make schema and partitioning changes faster by aborting all pending transactions without waiting for a planning deadline. |
66 | 66 |
|
67 | | -The distributed transaction body needs to have enough information about the other participants so that each shard can know when it needs to generate and send outgoing ReadSets and which shards should expect and wait for incoming ReadSets. KQP transactions currently use ReadSets for validating and committing optimistic locks, which are described using [TKqpLocks](https://github.com/ydb-platform/ydb/blob/a833a4359ba77706f9b1fe4104741ef0acfbc83b/ydb/core/protos/data_events.proto#L18) generated by the `TKqpDataExecutor` actor. This message describes the following shard sets: |
| 67 | +The distributed transaction body needs to have enough information about the other participants so that each shard can know when it needs to generate and send outgoing ReadSets and which shards should expect and wait for incoming ReadSets. QP transactions currently use ReadSets for validating and committing optimistic locks, which are described using [TKqpLocks](https://github.com/ydb-platform/ydb/blob/a833a4359ba77706f9b1fe4104741ef0acfbc83b/ydb/core/protos/data_events.proto#L18) generated by the `TKqpDataExecutor` actor. This message describes the following shard sets: |
68 | 68 |
|
69 | 69 | * `SendingShards` are shards that send ReadSets to all shards in the `ReceivingShards` set. |
70 | 70 | * `ReceivingShards` are shards that expect ReadSets from all shards in the `SendingShards` set. |
@@ -93,13 +93,13 @@ DataShards handle `TEvTxProcessing::TEvPlanStep` events in the [TTxPlanStep](htt |
93 | 93 |
|
94 | 94 | ## Execution phase in the DataShard tablet |
95 | 95 |
|
96 | | -The [PlanQueue](https://github.com/ydb-platform/ydb/blob/3fa95b9777601584da35d5925d7908f283f671a9/ydb/core/tx/datashard/plan_queue_unit.cpp#L44) unit allows distributed transactions to start, subject to concurrency limits, in the increasing `(Step, TxId)` order. The transaction body is loaded from disk when needed (for evicted persistent transactions), KQP transactions [finalize execution plans](https://github.com/ydb-platform/ydb/blob/3fa95b9777601584da35d5925d7908f283f671a9/ydb/core/tx/datashard/datashard_active_transaction.cpp#L667) and arrive at the [BuildAndWaitDependencies](https://github.com/ydb-platform/ydb/blob/3fa95b9777601584da35d5925d7908f283f671a9/ydb/core/tx/datashard/build_and_wait_dependencies_unit.cpp#L72) unit. This unit analyzes transaction keys and ranges declared for reading and writing and may add dependencies on earlier conflicting transactions. For example, when transaction `A` writes to key `K` and a later transaction `B` reads from key `K`, then transaction `B` depends on transaction `A`, and transaction `B` cannot start until transaction `A` completes. Transactions leave `BuildAndWaitDependencies` when they no longer have direct dependencies on other transactions. |
| 96 | +The [PlanQueue](https://github.com/ydb-platform/ydb/blob/3fa95b9777601584da35d5925d7908f283f671a9/ydb/core/tx/datashard/plan_queue_unit.cpp#L44) unit allows distributed transactions to start, subject to concurrency limits, in the increasing `(Step, TxId)` order. The transaction body is loaded from disk when needed (for evicted persistent transactions), QP transactions [finalize execution plans](https://github.com/ydb-platform/ydb/blob/3fa95b9777601584da35d5925d7908f283f671a9/ydb/core/tx/datashard/datashard_active_transaction.cpp#L667) and arrive at the [BuildAndWaitDependencies](https://github.com/ydb-platform/ydb/blob/3fa95b9777601584da35d5925d7908f283f671a9/ydb/core/tx/datashard/build_and_wait_dependencies_unit.cpp#L72) unit. This unit analyzes transaction keys and ranges declared for reading and writing and may add dependencies on earlier conflicting transactions. For example, when transaction `A` writes to key `K` and a later transaction `B` reads from key `K`, then transaction `B` depends on transaction `A`, and transaction `B` cannot start until transaction `A` completes. Transactions leave `BuildAndWaitDependencies` when they no longer have direct dependencies on other transactions. |
97 | 97 |
|
98 | | -Next, persistent KQP transactions execute the read phase (which includes validating optimistic locks) and generate outgoing OutReadSets in the [BuildKqpDataTxOutRS](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/build_kqp_data_tx_out_rs_unit.cpp#L44) unit. Then, the [StoreAndSendOutRS](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/store_and_send_out_rs_unit.cpp#L42) persists outgoing ReadSets and access logs for optimistic locks. Optimistic locks that have attached uncommitted changes are [marked with the `Frozen` flag](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/store_and_send_out_rs_unit.cpp#L67), which prevents them from being aborted until the transaction completes. Otherwise, lock validity is ensured by assigning writes with a higher MVCC version and ensuring the correct execution order of conflicting transactions. Operations with access logs or outgoing ReadSets are [added to the `Incomplete` set](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/store_and_send_out_rs_unit.cpp#L79), which ensures that new writes can't change the validity of previous reads and generally need to use a higher MVCC version. However, new reads don't necessarily need to block on the outcome of the incomplete transaction and can use a lower MVCC version as long as it's consistent with other transactions. |
| 98 | +Next, persistent QP transactions execute the read phase (which includes validating optimistic locks) and generate outgoing OutReadSets in the [BuildKqpDataTxOutRS](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/build_kqp_data_tx_out_rs_unit.cpp#L44) unit. Then, the [StoreAndSendOutRS](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/store_and_send_out_rs_unit.cpp#L42) persists outgoing ReadSets and access logs for optimistic locks. Optimistic locks that have attached uncommitted changes are [marked with the `Frozen` flag](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/store_and_send_out_rs_unit.cpp#L67), which prevents them from being aborted until the transaction completes. Otherwise, lock validity is ensured by assigning writes with a higher MVCC version and ensuring the correct execution order of conflicting transactions. Operations with access logs or outgoing ReadSets are [added to the `Incomplete` set](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/store_and_send_out_rs_unit.cpp#L79), which ensures that new writes can't change the validity of previous reads and generally need to use a higher MVCC version. However, new reads don't necessarily need to block on the outcome of the incomplete transaction and can use a lower MVCC version as long as it's consistent with other transactions. |
99 | 99 |
|
100 | | -Persistent KQP transactions prepare data structures for incoming InReadSets in the [PrepareKqpDataTxInRS](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/prepare_kqp_data_tx_in_rs_unit.cpp#L31) unit and begin waiting for all necessary ReadSets from other participants in the [LoadAndWaitInRS](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/load_and_wait_in_rs_unit.cpp#L36) unit. In some cases, such as blind writes to multiple shards without lock validation, distributed transactions may not require the exchange of ReadSets, and the ReadSet-related units don't perform any actions in these scenarios. |
| 100 | +Persistent QP transactions prepare data structures for incoming InReadSets in the [PrepareKqpDataTxInRS](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/prepare_kqp_data_tx_in_rs_unit.cpp#L31) unit and begin waiting for all necessary ReadSets from other participants in the [LoadAndWaitInRS](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/load_and_wait_in_rs_unit.cpp#L36) unit. In some cases, such as blind writes to multiple shards without lock validation, distributed transactions may not require the exchange of ReadSets, and the ReadSet-related units don't perform any actions in these scenarios. |
101 | 101 |
|
102 | | -Finally, the KQP transaction operation reaches the [ExecuteKqpDataTx](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/execute_kqp_data_tx_unit.cpp#L59) unit. This unit validates local optimistic locks using previously persisted `AccessLog` data when available, validates ReadSets received from other participants, and, if everything checks out, executes the transaction body and returns a result. If lock validation fails locally or remotely, the transaction body is not executed, and the operation fails with an `ABORTED` status. |
| 102 | +Finally, the QP transaction operation reaches the [ExecuteKqpDataTx](https://github.com/ydb-platform/ydb/blob/6e0ff4ff86f8dd5ac589b83de59c8bb377fe38c5/ydb/core/tx/datashard/execute_kqp_data_tx_unit.cpp#L59) unit. This unit validates local optimistic locks using previously persisted `AccessLog` data when available, validates ReadSets received from other participants, and, if everything checks out, executes the transaction body and returns a result. If lock validation fails locally or remotely, the transaction body is not executed, and the operation fails with an `ABORTED` status. |
103 | 103 |
|
104 | 104 | ### Volatile transactions |
105 | 105 |
|
|
0 commit comments