khouloudsa changed the title from "Kakfa consumer stops without errors" to "Kafka consumer stops without errors" on Nov 9, 2021
Versions used
Scala: 2.12
Akka version: 2.5.31
Alpakka-kafka version: 2.0.7
Consumers are deployed with Kubernetes
ConnectionChecker is activated
Bug observed on Prod
Expected Behavior
The consumer either continues consuming or dies.
Actual Behavior
The consumer stays alive with CURRENT-OFFSET = 386844 but stops consuming.
The consumer continues sending heartbeats, and the consumer metrics show no lag.
Relevant logs
2021-11-08 09:24:09,324 TRACE o.a.k.c.consumer.internals.Fetcher - [Consumer clientId=consumer-user-event-ingestion-2, groupId=user-event-ingestion] Returning fetched records at offset FetchPosition{offset=386844, offsetEpoch=Optional[30]...
2021-11-08 09:24:09,324 DEBUG o.a.k.c.consumer.internals.Fetcher - [Consumer clientId=consumer-user-event-ingestion-2, groupId=user-event-ingestion] Added READ_UNCOMMITTED fetch request for partition user-interaction-0 at position FetchPosition{offset=387110, offsetEpoch=Optional[30],...
2021-11-08 09:24:09,324 DEBUG o.a.k.clients.FetchSessionHandler - [Consumer clientId=consumer-user-event-ingestion-2, groupId=user-event-ingestion] Built incremental fetch (sessionId=1079420081, epoch=1) for node 3. Added (), altered (user-interaction-0), removed () out of (user-interaction-0)
2021-11-08 09:24:09,324 DEBUG o.a.k.c.consumer.internals.Fetcher - [Consumer clientId=consumer-user-event-ingestion-2, groupId=user-event-ingestion] Sending READ_UNCOMMITTED IncrementalFetchRequest(toSend=(user-interaction-0), toForget=(), implied=()) to broker ...
2021-11-08 09:24:09,374 DEBUG o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-user-event-ingestion-2, groupId=user-event-ingestion] Pausing partitions [user-interaction-0]
--
GROUP                 TOPIC             PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
user-event-ingestion  user-interaction  0          386344          387574          1230
--
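As a quick cross-check of the consumer group listing above: the LAG column is LOG-END-OFFSET minus CURRENT-OFFSET. A minimal sketch in plain Scala, where `GroupRow` and `LagCheck` are hypothetical names made up for illustration and not part of any Kafka tooling:

```scala
// Hypothetical model of one row of the consumer group listing above.
final case class GroupRow(
    group: String,
    topic: String,
    partition: Int,
    currentOffset: Long,
    logEndOffset: Long
) {
  // Lag is the distance between the end of the log and the committed offset.
  def lag: Long = logEndOffset - currentOffset
}

object LagCheck {
  // Parses a whitespace-separated row:
  // GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET [LAG]
  // Assumes a well-formed row; the trailing LAG field is ignored and recomputed.
  def parse(line: String): GroupRow = {
    val f = line.trim.split("\\s+")
    GroupRow(f(0), f(1), f(2).toInt, f(3).toLong, f(4).toLong)
  }
}
```

For the row above, `LagCheck.parse(...).lag` gives 387574 - 386344 = 1230, matching the LAG column.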
The same consumer code is used by other services, where the problem has not been observed.
Event processing in this service takes longer than in the others, so I suspect the consumer may be unable to resume the partition after a long pause.
We also observed event loss after the consumer stopped.
I added application logs for this consumer and a supervision strategy to log exceptions, but nothing shows up.
Any hints, please?
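Since the expected behavior is "keeps consuming or dies", one way to at least make a silent stall visible is an application-level watchdog that tracks when the last record was processed. The sketch below is plain Scala and only an illustration under assumptions: `StallWatchdog` is a hypothetical helper, not part of Alpakka Kafka or the Kafka client, and it does not explain or fix the underlying cause.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper, not part of Alpakka Kafka or the Kafka client.
// The clock is injectable so the logic can be tested without sleeping.
final class StallWatchdog(
    maxIdleMillis: Long,
    clock: () => Long = () => System.currentTimeMillis()
) {
  // Timestamp of the last processed record, initialised at construction.
  private val lastProgress = new AtomicLong(clock())

  // Call after each record has been fully processed.
  def recordProgress(): Unit = lastProgress.set(clock())

  // True when no record has been processed for longer than maxIdleMillis.
  def isStalled: Boolean = clock() - lastProgress.get() > maxIdleMillis
}
```

The idea would be to call `recordProgress()` from the processing stage and expose `isStalled` through whatever liveness check the Kubernetes deployment already uses, so a stuck pod gets restarted instead of staying alive on heartbeats alone. Note that a topic with no new records also looks idle, so the threshold would need to be combined with the lag reported for the group.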
Reproducible Test Case
I have not yet been able to find a scenario that reproduces this.