Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

txn: make the unique key lock behaviour consistent for all executors #22837

Open
cfzjywxk opened this issue Feb 20, 2021 · 18 comments
Open

txn: make the unique key lock behaviour consistent for all executors #22837

cfzjywxk opened this issue Feb 20, 2021 · 18 comments
Assignees
Labels
sig/transaction SIG:Transaction type/enhancement The issue or PR belongs to an enhancement.

Comments

@cfzjywxk
Copy link
Contributor

cfzjywxk commented Feb 20, 2021

Development Task

After #21483, we'll try to lock all the related unique keys for pessimistic transactions if the executor is pointGet or batchPointGet. For other coprocessor related executors, we should also implement this for them, and make the lock behaviours consistent.
Tasks are:

  • Try to lock all unique keys for existing data for all executors, make them behave the same.
  • Make a detailed and improved document about the pessimistic lock behaviours of TiDB.

Another possible solution is to push down the pessimistic lock operations into the kv requests doing data read, the difference behaviours could be resolved, check rfc for more details.

@cfzjywxk cfzjywxk added type/enhancement The issue or PR belongs to an enhancement. sig/transaction SIG:Transaction labels Feb 20, 2021
@zhaoxugang
Copy link
Contributor

I want to join to resolve this issue, what can i do for this issue?@cfzjywxk

@cfzjywxk
Copy link
Contributor Author

I want to join to resolve this issue, what can i do for this issue?@cfzjywxk

@zhaoxugang
I think maybe we need to to find out a way to lock the existing unique index keys for pessimistic transaction in selectLockExec executor, currently only row keys are locked.

@zhaoxugang
Copy link
Contributor

the query by unique index keys will be executed by PointGetExec ot BatchPointGetExec, so i don't know what can i do for resolve the issue@you06,@cfzjywxk

@you06
Copy link
Contributor

you06 commented May 8, 2021

the query by unique index keys will be executed by PointGetExec ot BatchPointGetExec, so i don't know what can i do for resolve the issue@you06,@cfzjywxk

In a short word, current PointGetExec and BatchPointGetExec will lock exist/unexist keys, however other executors will lock exist keys only. With the executor choice in planner module, it would be confusing for users.

eg.

-- id is a unique key, val is not
select * from t where id = 1 and val = 1; -- sql1, lock unexist key(id  = 1)
select * from t where val = 1; -- sql2, does not lock any keys if val = 1 not exist

Notice that sql2 takes a looser condition, it should cover the effects which sql1 makes. However, it does not lock any keys actually. Besides, the push-down requests scan need-lock keys in coprocessor in TiKV, there will be a great performance impact if we easily lock the query range for non-unique keys.

In a short word, this issue doesn't have a well design yet.

@zhaoxugang
Copy link
Contributor

got it,thx

@zhaoxugang
Copy link
Contributor

I have an idea , It is i choose which index to lock dependency access path of data source . If there not exists index that was choosing , i will lock the table .
@you06

@you06
Copy link
Contributor

you06 commented Jun 8, 2021

I have an idea , It is i choose which index to lock dependency access path of data source . If there not exists index that was choosing , i will lock the table .
@you06

Lock table sounds terrible, that will introduce a great performance impact.

@dwangxxx
Copy link
Contributor

the query by unique index keys will be executed by PointGetExec ot BatchPointGetExec, so i don't know what can i do for resolve the issue@you06,@cfzjywxk

In a short word, current PointGetExec and BatchPointGetExec will lock exist/unexist keys, however other executors will lock exist keys only. With the executor choice in planner module, it would be confusing for users.

eg.

-- id is a unique key, val is not
select * from t where id = 1 and val = 1; -- sql1, lock unexist key(id  = 1)
select * from t where val = 1; -- sql2, does not lock any keys if val = 1 not exist

Notice that sql2 takes a looser condition, it should cover the effects which sql1 makes. However, it does not lock any keys actually. Besides, the push-down requests scan need-lock keys in coprocessor in TiKV, there will be a great performance impact if we easily lock the query range for non-unique keys.

In a short word, this issue doesn't have a well design yet.

@you06
The execution plan of select * from t where id = 1 and v = 1 and val = 30 for update; is like this:
image

It will not lock any unexist keys. So in this situation, the task of this issue is to lock the unexist key(id = 1 and v = 1) in SelectLockExec or IndexLookUpExec to block other txn to update or insert at key(id = 1 and v = 1)?

@you06
Copy link
Contributor

you06 commented Jan 11, 2022

@dwangxxx It seems your case talks about the table like create table t(id int, v int, val int, unique index u(id ,v));.

I don't mean it needs to lock the unexist key when with predicate id = 1 and v = 1 and val = 30 with empty result. Because id = 1 and v = 1 is wider than id = 1 and v = 1 and val = 30, so id = 1 and v = 1 locks the unexist key and id = 1 and v = 1 and val = 30 does not lock the unexist key is ok to me(just my personal opinion).
The ideal model for me is supporting range lock, so that when using predicate v = 1 will lock more keys than id = 1 and v = 1.

@dwangxxx
Copy link
Contributor

@dwangxxx It seems your case talks about the table like create table t(id int, v int, val int, unique index u(id ,v));.

I don't mean it needs to lock the unexist key when with predicate id = 1 and v = 1 and val = 30 with empty result. Because id = 1 and v = 1 is wider than id = 1 and v = 1 and val = 30, so id = 1 and v = 1 locks the unexist key and id = 1 and v = 1 and val = 30 does not lock the unexist key is ok to me(just my personal opinion). The ideal model for me is supporting range lock, so that when using predicate v = 1 will lock more keys than id = 1 and v = 1.

@you06
So the ideal model you mentioned is like this?

/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 for update;  -- this will lock all keys id = 1
/* t2 */ insert into t values(1, 2, 3); -- this will be blocked until t1 commit

@you06
Copy link
Contributor

you06 commented Jan 11, 2022

@you06 So the ideal model you mentioned is like this?

/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 for update;  -- this will lock all keys id = 1
/* t2 */ insert into t values(1, 2, 3); -- this will be blocked until t1 commit

Yes.

@dwangxxx
Copy link
Contributor

@you06
I'm a little confused. How about the non-unique key? I list some situations, could you help figure out which should be blocked or not? Suppose table t is empty. Thanks~

  • test1
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 for update; 
/* t2 */ insert into t values(1, x, x); 
  • test2
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 and v = 1 for update; 
/* t2 */ insert into t values(1, 1, x); 
  • test3
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 and v = 1 and val = 3 for update; 
/* t2 */ insert into t values(1, 1, x); 
  • test4
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where val = 3 for update; 
/* t2 */ insert into t values(x, x, 3); 
  • test5
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 and val = 3 for update; 
/* t2 */ insert into t values(1, x, x); 

@you06
Copy link
Contributor

you06 commented Jan 11, 2022

@you06 I'm a little confused. How about the non-unique key? I list some situations, could you help figure out which should be blocked or not? Suppose table t is empty. Thanks~

Thanks for the cases. Suppose the only unique/primary key id (id, v) as you described before. There are some statements
request for cursors by predicate query and some request for item query(lookup one record by primary/unique index). Notice this is out of the isolation level discussion, the cursor is an assistant tool to avoid some anomalies.

You convince me to give up my standpoint.

The hypothesis from my personal perspective: If condition1 is stricter than condition2, it should take no more cursors(aka. locks) than condition2.

Here we go to the analysis:

  • test1
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 for update; 
/* t2 */ insert into t values(1, x, x); 

Block.

Because id = 1 and v = 1 takes the cursor on (1, 1), id = 1 certainly takes the cursor on (1, x). Unless it may lead to the lost update:

/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 for update; -- 0 rows
/* t2 */ insert into t values(1, 1, 1); -- success without block
/* t2 */ commit;
/* t1 */ update t set v = v + 1 where id = 1; -- 1 row affected
/* t1 */ commit;
  • test2

Block.

It should block by the cursor(lock) on the non-exist key.

  • test3
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 and v = 1 and val = 3 for update; 
/* t2 */ insert into t values(1, 1, x); 

Not block.

According to the hypothesis, the for-update statement in t1 can obtain fewer cursors than it in test2, so it's ok for this case not to lock (1, 1, x), but need to lock (1, 1, 3) to avoid lost update issue. We can also not lock (1, 1, 3) and claim that avoiding cursor lost update only works in item-type queries, don't discuss it here.

  • test4
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where val = 3 for update; 
/* t2 */ insert into t values(x, x, 3); 

Block.

test4 to test3 is similar with test1 to test2, val = 3 covers id = x and v = x and val = 3, which should fetch all cursors of (x, x, 3). Here the problem comes, it's almost impossible to implement it in a lock-based concurrency control system.

  • test5
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 and val = 3 for update; 
/* t2 */ insert into t values(1, x, x); 

Block.


If we take this model:

The cursor only works in index-related columns, i.e. there should be a unique index that is used to perform lock.

create table t(id int, v int, val int, unique index u(id ,v)); -- empty table
select * from t where id = 1 for update; -- lock index u with (1, x)
select * from t where v = 1 for update; -- lock index u with (x, 1)
select * from t where val = 1 for update; -- do not lock any index keys
  • test1
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 for update; 
/* t2 */ insert into t values(1, x, x); 

Block, id = 1 will lock (1, x);

  • test2

Block, same as TiDB's current behavior.

  • test3
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 and v = 1 and val = 3 for update; 
/* t2 */ insert into t values(1, 1, x); 

Block or not are both reasonable, same as you describe before, it can lock less data than id = 1 and v = 1.

  • test4
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where val = 3 for update; 
/* t2 */ insert into t values(x, x, 3); 

Not block, no index matches val = 3.

  • test5
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 and val = 3 for update; 
/* t2 */ insert into t values(1, x, x); 

Block, id = 1 can lock (1, x).

The difficulty of this strategy will be the implementation of range lock in the distributed systems.

Back to test3 which we discussed before, adding a lock for id = 1 and v = 1 will avoid this kind of lost update, will it fix any cases besides this?

/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 and v = 1 and val = 3 for update; -- empty
/* t2 */ insert into t values(1, 1, 1); -- success without block
/* t2 */ commit;
/* t1 */ update t set v = 10 where id = 1 and v = 1; -- 1 row affected
/* t1 */ commit;

@dwangxxx
Copy link
Contributor

@you06
Acctually, I can not think of any other situations we can avoid about test3 if we lock id = 1 and v = 1. Just in my opinion, if we block test5 in id = 1, we should also block test3. I think this way may make the lock behaviors more consistent.

And about the difficulty, it is because the need-lock keys may be distributed in different regions?

@you06
Copy link
Contributor

you06 commented Jan 11, 2022

@dwangxxx It's not need-lock keys, it's need-lock range. A range may be distributed in different regions which makes things complex and expensive.

@dwangxxx
Copy link
Contributor

dwangxxx commented Jan 11, 2022

@you06
Ok, I see.
But I have another question. If we lock range like we discussed before, I think the transaction concurrence may be affected. If txn1 lock a wide range(like id = 1), but it just needs to update one record, this may cause the bad affect that other transactions are blocked(suppose there are many transactions need to update about id = 1, like id = 1 and v = x, different transaction has different x).

@you06
Copy link
Contributor

you06 commented Jan 12, 2022

@dwangxxx You're right, 2 phase lock harms performance a lot. Do you mean we may not lock any keys or ranges with id = 1?

@dwangxxx
Copy link
Contributor

@dwangxxx You're right, 2 phase lock harms performance a lot. Do you mean we may not lock any keys or ranges with id = 1?

@you06
Maybe we should.
I run the above five tests on MySQL. All tests are blocked. And the following tests:

/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 for update; -- empty
/* t2 */ insert into t values(2, 2, 1); -- blocked
/* t1 */ begin pessimistic;
/* t2 */ begin pessimistic;
/* t1 */ select * from t where id = 1 and v = 1 for update; -- empty
/* t2 */ insert into t values(2, 2, 1); -- blocked

They are also blocked. So I don't know if we should be like MySQL?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/transaction SIG:Transaction type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

4 participants