Exceeded resource group quota limitation if request tokens exceeded 500ms #8349

Closed
nolouch opened this issue Jul 2, 2024 · 1 comment · Fixed by #8352 or #8368
Labels
affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.2 report/customer Customers have encountered this bug. severity/major type/bug The issue is confirmed as a bug.

Comments

@nolouch
Contributor

nolouch commented Jul 2, 2024

Bug Report

What did you do?

Used resource control.

What did you expect to see?

No error is reported.

What did you see instead?

Requests fail with the "exceeded resource group quota limitation" error, even though RU usage is below the configured RU setting.


| Event | Time | Details |
| --- | --- | --- |
| Event 1 | 17:40:19.765 | 17:40:19.765: A request arrives, finds that the local tokens are insufficient, and sends a notification to the thread that acquires tokens. |
| Event 2 | 17:40:19.765 ~ 17:40:20.198 | 17:40:20.198: The request keeps retrying, but the local tokens have not been refreshed yet, so the same event is logged repeatedly during the retry period. After the 500 ms retry timeout, the failure is reported to the application. |
| Event 3 | 17:40:20.263 ~ 17:40:20.265 | 17:40:20.263: The thread responsible for fetching tokens receives the notification and starts sending token requests. 17:40:20.265: New tokens are granted. |

See the table above. In theory, Event 1 should immediately trigger Event 3. Once Event 3 succeeds, enough tokens are obtained during the retry period of Event 2, and the request can continue. However, because the current event-driven system behaves like a single-threaded event loop, the processing of a message can in some cases be delayed by more than 500 ms. The tokens are then not obtained in time, and the request fails with an error.

What version of PD are you using (pd-server -V)?

7.5.2

@nolouch nolouch added the type/bug The issue is confirmed as a bug. label Jul 2, 2024
@nolouch nolouch added affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. and removed may-affects-5.4 may-affects-6.1 may-affects-6.5 may-affects-7.1 may-affects-7.5 may-affects-8.1 labels Jul 2, 2024
@ti-chi-bot ti-chi-bot bot closed this as completed in #8352 Jul 3, 2024
@ti-chi-bot ti-chi-bot bot closed this as completed in 6b25787 Jul 3, 2024
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Jul 4, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jul 4, 2024
…ucket (#8344) (#8355)

close #8343, ref #8349

client/controller: record context error and add slowlog about token bucket
- record low process start time, and log it if it's too slow
- record the context error

Signed-off-by: ti-chi-bot <[email protected]>
Signed-off-by: Shuning Chen <[email protected]>

Co-authored-by: ShuNing <[email protected]>
Co-authored-by: Shuning Chen <[email protected]>
@easonn7

easonn7 commented Jul 4, 2024

/approve

nolouch added a commit that referenced this issue Jul 4, 2024
…he local bucket (#8352)  (#8365)

* client/controller: record context error and add slowlog about token bucket (#8344) (#8355)

close #8343, ref #8349

client/controller: record context error and add slowlog about token bucket
- record low process start time, and log it if it's too slow
- record the context error

Signed-off-by: Shuning Chen <[email protected]>

* This is an automated cherry-pick of #8352

close #8349

Signed-off-by: nolouch <[email protected]>
Signed-off-by: Shuning Chen <[email protected]>

---------

Signed-off-by: Shuning Chen <[email protected]>
Signed-off-by: nolouch <[email protected]>
Co-authored-by: Ti Chi Robot <[email protected]>
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue Jul 4, 2024
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue Jul 5, 2024
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue Jul 5, 2024
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue Jul 5, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jul 8, 2024
close #8349

controller: fix the low_ru request missed 

The problem is that `c.run.currentRequests` is shared by all groups.
If one group triggers a token request that isn't handled by the response, the other group's requests will be discarded.
Here, we do not discard the low_ru triggers.

Signed-off-by: nolouch <[email protected]>
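
The effect described in this commit message can be sketched with a small toy model. This is not the PD implementation: only `c.run.currentRequests` comes from the commit message, and the `controller` type, its fields, and its methods below are hypothetical stand-ins. The sketch assumes a single in-flight token request shared by all groups and compares dropping a trigger that arrives while the slot is busy with remembering it.

```go
package main

// controller is a simplified, hypothetical model in which one in-flight
// token request is shared by all resource groups.
type controller struct {
	inFlight    bool     // stands in for the shared in-flight request state
	keepPending bool     // false: drop triggers while busy; true: remember them
	pending     []string // groups whose low-RU trigger arrived while busy
	sent        []string // groups for which a token request was sent, in order
}

func newController(keepPending bool) *controller {
	return &controller{keepPending: keepPending}
}

// onLowRU handles a low-RU trigger from one group.
func (c *controller) onLowRU(group string) {
	if !c.inFlight {
		c.inFlight = true
		c.sent = append(c.sent, group)
		return
	}
	if !c.keepPending {
		return // the trigger is discarded
	}
	for _, g := range c.pending {
		if g == group {
			return // already remembered
		}
	}
	c.pending = append(c.pending, group)
}

// onResponse handles the response to the in-flight request, then sends one
// follow-up request covering every remembered trigger, if there are any.
func (c *controller) onResponse() {
	c.inFlight = false
	if len(c.pending) == 0 {
		return
	}
	c.sent = append(c.sent, c.pending...)
	c.pending = nil
	c.inFlight = true
}
```

In this model, if group `g1` triggers a request and group `g2` triggers while it is still in flight, `newController(false)` never sends a request for `g2`, whereas `newController(true)` sends it as soon as the first response is handled.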
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Jul 8, 2024
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Jul 8, 2024
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Jul 8, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jul 8, 2024
close #8349

controller: fix the low_ru request missed 

The problem is that `c.run.currentRequests` is shared by all groups.
If one group triggers a token request that isn't handled by the response, the other group's requests will be discarded.
Here, we do not discard the low_ru triggers.

Signed-off-by: ti-chi-bot <[email protected]>
Signed-off-by: nolouch <[email protected]>

Co-authored-by: ShuNing <[email protected]>
Co-authored-by: nolouch <[email protected]>
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue Jul 12, 2024
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue Jul 12, 2024
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue Jul 14, 2024
@ti-chi-bot ti-chi-bot bot added the report/customer Customers have encountered this bug. label Oct 29, 2024
3 participants