Commit 93f799e

rfc: Resolved-ts for Large Transactions
Signed-off-by: ekexium <[email protected]>
1 parent c532d69 commit 93f799e

1 file changed

+105
-0
lines changed

@@ -0,0 +1,105 @@
1+
# Resolved-ts for Large Transactions
2+
3+
Author: @ekexium
4+
5+
Tracking issue: N/A
6+
7+
## Background
8+
9+
This RFC is a variation of @zhangjinpeng87's [Large Transactions Don't Block Watermark](https://github.com/pingcap/tiflow/blob/master/docs/design/2024-01-22-ticdc-large-txn-not-block-wm.md). Both aim to solve the same problem.
10+
11+
Resolved-ts is a tool for other services. Its definition is that, once a resolved-ts has been observed, no new commit record with a timestamp smaller than that resolved-ts will be observed.
12+
13+
In current TiKV (v8.3), large transactions can block resolved-ts from advancing, because it is calculated as `min(pd-tso, min(lock.ts))`, where `lock.ts` is the lock's start_ts. This is a more stringent constraint than the definition requires. A lock from a pipelined transaction can live for several hours, which makes services that depend on resolved-ts unavailable.
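
To make the blocking concrete, here is a minimal sketch of the formula above. The function name and the sample timestamps are illustrative assumptions, not TiKV code, and timestamps are plain integers rather than real TSO values. `lock.ts` is taken to be the lock's start_ts.

```python
def resolved_ts_current(pd_tso, lock_start_ts):
    """min(pd-tso, min(lock.ts)), with lock.ts assumed to be the lock's start_ts."""
    if not lock_start_ts:
        return pd_tso
    return min(pd_tso, min(lock_start_ts))


# Only short transactions hold locks: resolved-ts stays close to the PD TSO.
print(resolved_ts_current(1000, [990, 995]))

# A pipelined transaction that started at ts=100 still holds a lock:
# resolved-ts is pinned at 100 for as long as that lock lives.
print(resolved_ts_current(1000, [100, 990, 995]))
```

However far the PD TSO advances, the second call keeps returning the start_ts of the oldest lock.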
14+
15+
## Goals
16+
17+
Do not let **large pipelined transactions** block the advance of resolved-ts.
18+
19+
We focus on large pipelined transactions here. The design could be adapted for general "large" transactions.
20+
21+
## Assumptions
22+
23+
We assume that the number of concurrent pipelined transactions is bounded, for example not exceeding 10000.
24+
25+
This constraint is not a strict limit; it serves to manage resource utilization and to facilitate performance evaluation. 10000 should be large enough in practice.
26+
27+
## Design
28+
29+
The key idea is using `lock.min_commit_ts` to calculate resolved-ts instead of `lock.start_ts`.
30+
31+
A resolved-ts guarantees that all historical events prior to this timestamp are finalized and observable. 'Historical events' in this context refer specifically to write records and rollback records, but explicitly exclude locks. It's important to note that the absence of locks with earlier timestamps is not a requirement for a valid resolved-ts, as long as the status of their corresponding transactions is definitively determined.
32+
33+
### Maintenance of resolved-ts
34+
35+
Key objective: Maximize TiKV nodes' awareness of large pipelined transactions, including:
36+
37+
1. start_ts
38+
2. Recent min_commit_ts
39+
3. Status
40+
41+
#### Coordinator
42+
43+
For a large pipelined transaction, the TTL manager is responsible for fetching the latest TSO as a candidate min_commit_ts and updating both the committer's internal state and the PK (primary key) in TiKV. After that, it broadcasts the start_ts and the new min_commit_ts to all TiKV stores. The PK update can be carried by the heartbeat request.
44+
45+
Atomic variables or locks may be needed for synchronization between the TTL manager and the committer.
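
One round of the TTL manager could look roughly like the sketch below. Everything named here (`LargeTxnCoordinator`, `get_tso`, `heartbeat_pk`, `broadcast`) is hypothetical and stands in for the real client, PD, and TiKV interfaces. A `threading.Lock` plays the role of the synchronization mentioned above. The initial value of min_commit_ts and the rule that it only moves forward are assumptions of the sketch.

```python
import threading


class LargeTxnCoordinator:
    """Hypothetical stand-in for the committer plus its TTL manager."""

    def __init__(self, start_ts, get_tso, heartbeat_pk, broadcast):
        self.start_ts = start_ts
        self.min_commit_ts = start_ts + 1  # assumed initial value
        self._mu = threading.Lock()        # shared by TTL manager and committer
        self._get_tso = get_tso            # fetches a fresh TSO (from PD in reality)
        self._heartbeat_pk = heartbeat_pk  # heartbeat carrying min_commit_ts to the PK
        self._broadcast = broadcast        # sends (start_ts, min_commit_ts) to every store

    def ttl_manager_tick(self):
        candidate = self._get_tso()
        with self._mu:
            # Assumption: min_commit_ts only moves forward.
            if candidate > self.min_commit_ts:
                self.min_commit_ts = candidate
            current = self.min_commit_ts
        # Update the PK first, then tell all stores about the new value.
        self._heartbeat_pk(self.start_ts, current)
        self._broadcast(self.start_ts, current)
        return current
```

The callbacks are injected so that the ordering (inner state, then PK, then broadcast) can be exercised without any real cluster.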
46+
47+
#### TiKV scheduler - heartbeat
48+
49+
Besides updating TTL, it can also update min_commit_ts of the PK.
50+
51+
*TBD: should it also update max_ts?*
52+
53+
#### TiKV txn_status_cache
54+
55+
A standalone part of the cache is dedicated to large transactions. It serves as:
56+
57+
1. A fresh enough source of min_commit_ts info of large transactions for resolved-ts resolver
58+
2. A fast path for read requests that would otherwise have to go back to the coordinator to check the PK's min_commit_ts
59+
60+
##### Eviction
61+
62+
We keep as much useful information as possible in the cache and never evict an entry because of space pressure. An entry only contains information such as start_ts, min_commit_ts, status, and TTL, so the cache stays small given our assumption about the number of ongoing large transactions.
63+
64+
The default TTL of these entries should be large, because a cache hit saves unnecessary work when a reader meets a lock belonging to one of these transactions.
65+
66+
After successfully committing all secondary locks of a large transaction, the coordinator explicitly broadcasts a TTL update to all TiKV nodes, setting the entry to expire several seconds later. The entry is not evicted immediately, so that follower peers have time to catch up with the leader; otherwise a stale read may encounter a lock and miss the cache.
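
The eviction policy can be sketched as a small map keyed by start_ts whose entries leave only when their TTL expires. This is a hedged illustration: `LargeTxnCache`, `LargeTxnEntry`, the status labels, and the explicit `now` parameter are all assumptions made for the example, not the actual txn_status_cache interface.

```python
from dataclasses import dataclass


@dataclass
class LargeTxnEntry:
    min_commit_ts: int
    status: str        # "ongoing", "committed" or "rolled_back" (labels assumed)
    expire_at: float   # absolute expiry time derived from the TTL


class LargeTxnCache:
    """Hypothetical cache keyed by start_ts; entries leave only when their TTL expires."""

    def __init__(self, default_ttl):
        self.default_ttl = default_ttl
        self._entries = {}

    def upsert(self, start_ts, min_commit_ts, status, now, ttl=None):
        ttl = self.default_ttl if ttl is None else ttl
        old = self._entries.get(start_ts)
        if old is not None:
            # Assumption: min_commit_ts never moves backwards.
            min_commit_ts = max(min_commit_ts, old.min_commit_ts)
        self._entries[start_ts] = LargeTxnEntry(min_commit_ts, status, now + ttl)

    def get(self, start_ts, now):
        entry = self._entries.get(start_ts)
        if entry is None or entry.expire_at <= now:
            self._entries.pop(start_ts, None)  # expired by TTL, never by size
            return None
        return entry


cache = LargeTxnCache(default_ttl=3600.0)
cache.upsert(100, 150, "ongoing", now=0.0)
# All secondaries committed: keep the entry a few more seconds for followers.
cache.upsert(100, 150, "committed", now=10.0, ttl=5.0)
```

With these calls, a lookup at `now=12.0` still finds the committed entry, while a lookup at `now=16.0` misses.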
67+
68+
#### TiKV resolved-ts resolver
69+
70+
The resolver tracks normal locks as usual, but handles locks belonging to large pipelined transactions differently. These locks can be identified via the "generation" field.
71+
72+
For a lock belonging to a large pipelined transaction, the resolver only tracks its start_ts. When calculating resolved-ts, the resolver first tries to map the start_ts to its min_commit_ts by querying the txn_status_cache. If the transaction is not found in the cache, the resolver falls back to using start_ts.
73+
74+
Upon observing a LOCK DELETION, the resolver ceases tracking the corresponding start_ts for large pipelined transactions. This is justified as lock deletion only occurs once a transaction's final state is determined.
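
The resolver's calculation described in this section can be sketched as follows. It is a simplified model under stated assumptions: the function and parameter names are hypothetical, the cache lookup is reduced to a callable that returns a min_commit_ts or `None`, and the result keeps the same `min(...)` shape as the existing formula.

```python
def resolved_ts_new(pd_tso, normal_lock_start_ts, large_txn_start_ts, min_commit_ts_of):
    """min_commit_ts_of(start_ts) returns the cached min_commit_ts or None (hypothetical lookup)."""
    candidates = [pd_tso]
    candidates.extend(normal_lock_start_ts)  # normal locks: tracked as usual
    for start_ts in large_txn_start_ts:
        min_commit_ts = min_commit_ts_of(start_ts)
        # Cache miss: fall back to start_ts.
        candidates.append(start_ts if min_commit_ts is None else min_commit_ts)
    return min(candidates)


cache = {100: 980}  # start_ts -> recent min_commit_ts
tracked_large = {100}

# Cache hit: the old transaction contributes 980, not 100.
print(resolved_ts_new(1000, [990], tracked_large, cache.get))

# LOCK DELETION observed: stop tracking start_ts=100.
tracked_large.discard(100)
print(resolved_ts_new(1000, [990], tracked_large, cache.get))
```

As long as the coordinator keeps pushing min_commit_ts forward, the large transaction no longer pins resolved-ts at its start_ts.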
75+
76+
### Compatibility
77+
78+
The key difference is that services relying on resolved-ts can now observe locks, and they need to handle them.
79+
80+
#### Stale read
81+
82+
When a stale read meets a lock, it first queries the txn_status_cache. If the transaction is not found in the cache, it falls back to a leader read.
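
A hedged sketch of that decision is shown below. The names are hypothetical, and two points go beyond what this RFC specifies and are assumptions of the sketch: a lock is skipped when the cached min_commit_ts is greater than the read timestamp (the transaction cannot commit at or below it), and a cache hit that does not allow skipping is also sent to the leader.

```python
def handle_lock_in_stale_read(read_ts, lock_start_ts, min_commit_ts_of, leader_read):
    """Decide what a stale read does on meeting a lock (illustrative only)."""
    min_commit_ts = min_commit_ts_of(lock_start_ts)
    if min_commit_ts is None:
        # Not found in the cache: fall back to a leader read.
        return leader_read()
    if min_commit_ts > read_ts:
        # Assumption: the transaction cannot commit at or below read_ts,
        # so the lock does not affect this read.
        return "skip_lock"
    # Assumption: the cache cannot settle this case, so ask the leader.
    return leader_read()


cache = {100: 980}
print(handle_lock_in_stale_read(900, 100, cache.get, lambda: "leader_read"))
print(handle_lock_in_stale_read(900, 200, cache.get, lambda: "leader_read"))
```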
83+
84+
#### Flashback
85+
86+
*TBD*
87+
88+
#### EBS snapshot backups
89+
90+
*TBD*
91+
92+
#### CDC
93+
94+
This is already well documented in [Large Transactions Don't Block Watermark](https://github.com/pingcap/tiflow/blob/master/docs/design/2024-01-22-ticdc-large-txn-not-block-wm.md). In brief, some refactoring work is needed.
95+
96+
### Cost
97+
98+
Memory: each cache entry takes at least 8 (start_ts) + 8 (min_commit_ts) + 1 (status) + 8 (TTL) = 25 bytes. Any TiKV instance can easily hold millions of such entries.
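
As a quick check of the memory estimate, the four listed fields sum to 25 bytes when packed without padding. The sketch below assumes that packed layout and reuses the bound of 10000 concurrent transactions from the Assumptions section. Real in-memory entries would be larger because of alignment and map overhead.

```python
import struct

# Packed layout with no padding: u64 start_ts, u64 min_commit_ts, u8 status, u64 TTL.
ENTRY_BYTES = struct.calcsize("<QQBQ")
ASSUMED_CONCURRENT_TXNS = 10_000  # bound from the Assumptions section

print(ENTRY_BYTES)                            # 25
print(ENTRY_BYTES * ASSUMED_CONCURRENT_TXNS)  # 250000 bytes, about 244 KiB
```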
99+
100+
Latency: maintaining resolved-ts requires extra work, but it can be done asynchronously and therefore does not affect latency.
101+
102+
RPCs: each large transaction sends N more RPCs per second, where N is the number of TiKV stores.
103+
104+
CPU: the mechanism may consume more CPU, but the overhead should be negligible.
105+
