Consumer Group: "kafka server: Request exceeded the user-specified time limit in the request." #3028

Open
meetgray opened this issue Dec 10, 2024 · 4 comments

meetgray commented Dec 10, 2024

Description

A huge number of messages are being published to my topic, and a consumer group with many consumers is consuming them. The consumers are tuned, via the consumer configs, to process messages fast enough. Still, a few consumers somehow get the error below:

[sarama] 2024/12/10 07:45:04 kafka: error while consuming my-topic/12: kafka server: Request exceeded the user-specified time limit in the request

I also noticed that when I receive these errors, the CPU usage of my consumer drops and it sits almost idle, despite a large number of messages waiting to be consumed in the partitions. As a result, consumer lag keeps increasing for the affected partitions.

I also took a CPU profile of the consumer while receiving these errors, and it confirms that the consumer's CPU is idling and messages are not being processed.

[Screenshot: 60s CPU profile]

Above is a 60s CPU profile, of which the CPU was busy for only 4s. That indicates an idle CPU.

I also took a goroutine blocking profile. It seems that even ConsumeClaim() is waiting for messages, and hence my message consumer goroutines are waiting on a channel to receive a message to process.

[Image: goroutine blocking profile]

Below is the network blocking profile (an SVG):

[Image: nw-blocking-profile]

Below is the synchronization blocking profile:

[Image: block]

Below is the syscall profile:

[Image: syscall]

All of the above profiles roughly indicate that sarama is waiting for messages from the broker.

I looked at the similar issues below:

#1758
#1562

But none of them has a satisfactory solution, and it seems the root cause was not identified either.

Versions
Sarama: v1.43.3
Kafka: 3.8.0
Go: 1.22.8
Configuration
	config.Consumer.Offsets.AutoCommit.Interval = 10 * time.Second
	config.Consumer.Fetch.Default = 1048576 * 2              // 2 MB for faster consuming (Default 1MB)
	config.Consumer.MaxProcessingTime = 10 * time.Second     // increasing max processing time per message, to prevent frequent partition rebalances
	config.Consumer.Group.Session.Timeout = 60 * time.Second // to prevent unnecessary partition rebalances
	config.Consumer.Group.Heartbeat.Interval = 12 * time.Second
Logs

[sarama] 2024/12/10 07:45:04 kafka: error while consuming my-topic/7: kafka server: Request exceeded the user-specified time limit in the request
[sarama] 2024/12/10 07:45:04 kafka: error while consuming  my-topic/11: kafka server: Request exceeded the user-specified time limit in the request
[sarama] 2024/12/10 07:45:04 kafka: error while consuming my-topic/12: kafka server: Request exceeded the user-specified time limit in the request

Additional Context
@meetgray (Author)

@dnwe since you are an active maintainer, just pinging you here. Thanks!

@meetgray meetgray changed the title Getting errors from sarama consumer: "kafka server: Request exceeded the user-specified time limit in the request." kafka server: Request exceeded the user-specified time limit in the request. Dec 10, 2024
@meetgray meetgray changed the title kafka server: Request exceeded the user-specified time limit in the request. Consumer Group: "kafka server: Request exceeded the user-specified time limit in the request." Dec 10, 2024
@meetgray (Author)

@puellanivis

@puellanivis (Contributor)

I don’t actually have any idea what’s going on here. I saw this the day you posted it, but because I didn’t have any idea what’s going on here, I didn’t reply with anything.

dnwe (Collaborator) commented Dec 23, 2024

@meetgray from the description of your issue, and particularly "I also noticed that when I receive this errors, the CPU usage of my consumer drops and it almost sits idle around this time", I wonder if either your Consume loop or your ConsumeClaim func isn't correctly handling errors and context cancellation. Can you post the code snippets from your application here and/or compare them to the consumergroup example in the repository's examples directory?
