Consumer crashes #5509

Open
CMDMichalKoval opened this issue Feb 7, 2024 · 2 comments

CMDMichalKoval commented Feb 7, 2024

Environment

What version are you running?
24.1.1

Steps to Reproduce

  • Run the consumer against a large topic.

Expected Result

The consumer should not crash.

Actual Result

The snuba-consumer crashes with this error:

2024-02-07 23:48:37,206 librdkafka log level: 6                                                                                                                                                                  
2024-02-07 23:48:38,289 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 10090059}                                                                                                      
2024-02-07 23:50:04,176 Caught exception, shutting down...                                                                                                                                                       
Traceback (most recent call last):                                                                                                                                                                               
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 319, in run                                                                                                                
    self._run_once()                                                                                                                                                                                             
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 393, in _run_once                                                                                                          
    self.__processing_strategy.poll()                                                                                                                                                                            
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll                                                                                                        
    self.__inner_strategy.poll()                                                                                                                                                                                 
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll                                                                                                      
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll                                                                                                         
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/reduce.py", line 168, in poll                                                                                                       
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 107, in poll                                                                                          
    result = future.result()                                                                                                                                                                                     
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result                                                                                                                              
    return self.__get_result()                                                                                                                                                                                   
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result                                                                                                                        
    raise self._exception                                                                                                                                                                                        
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run                                                                                                                                 
    result = self.fn(*self.args, **self.kwargs)                                                                                                                                                                  
  File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 137, in flush_batch                                                                                                                            
    message.payload.close()                                                                                                                                                                                      
  File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close                                                                                                                                          
    self.__insert_batch_writer.close()                                                                                                                                                                           
  File "/usr/src/snuba/snuba/consumers/consumer.py", line 166, in close                                                                                                                                          
    self.__writer.write(                                                                                                                                                                                         
  File "/usr/src/snuba/snuba/clickhouse/http.py", line 359, in write                                                                                                                                             
    batch.join(timeout=batch_join_timeout)                                                                                                                                                                       
  File "/usr/src/snuba/snuba/clickhouse/http.py", line 282, in join                                                                                                                                              
    raise ClickhouseWriterError(message, code=code, row=row)                                                                                                                                                     
snuba.clickhouse.errors.ClickhouseWriterError: Too large value for FixedString(32): (while reading the value of key primary_hash): (at row 1)                                                                    
2024-02-07 23:50:04,186 Closing <arroyo.backends.kafka.consumer.KafkaConsumer object at 0x7f622bfdeb30>...                                                                                                       
2024-02-07 23:50:04,186 Partitions to revoke: [Partition(topic=Topic(name='events'), index=0)]                                    

snuba-metrics-consumer:

2024-02-07 23:59:14,276 librdkafka log level: 6                                                                                                                                                                  
2024-02-07 23:59:14,304 New partitions assigned: {Partition(topic=Topic(name='snuba-metrics'), index=0): 0}                                                                                                      
2024-02-07 23:59:17,326 Caught exception, shutting down...                                                                                                                                                       
Traceback (most recent call last):                                                                                                                                                                               
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 319, in run                                                                                                                
    self._run_once()                                                                                                                                                                                             
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/processor.py", line 393, in _run_once                                                                                                          
    self.__processing_strategy.poll()                                                                                                                                                                            
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll                                                                                                        
    self.__inner_strategy.poll()                                                                                                                                                                                 
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll                                                                                                      
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll                                                                                                         
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/reduce.py", line 168, in poll                                                                                                       
    self.__next_step.poll()                                                                                                                                                                                      
  File "/usr/local/lib/python3.10/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 107, in poll                                                                                          
    result = future.result()                                                                                                                                                                                     
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result                                                                                                                              
    return self.__get_result()                                                                                                                                                                                   
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result                                                                                                                        
    raise self._exception                                                                                                                                                                                        
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run                                                                                                                                 
    result = self.fn(*self.args, **self.kwargs)                                                                                                                                                                  
  File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 137, in flush_batch                                                                                                                            
    message.payload.close()                                                                                                                                                                                      
  File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close                                                                                                                                          
    self.__insert_batch_writer.close()                                                                                                                                                                           
  File "/usr/src/snuba/snuba/consumers/consumer.py", line 166, in close                                                                                                                                          
    self.__writer.write(                                                                                                                                                                                         
  File "/usr/src/snuba/snuba/clickhouse/http.py", line 359, in write                                                                                                                                             
    batch.join(timeout=batch_join_timeout)                                                                                                                                                                       
  File "/usr/src/snuba/snuba/clickhouse/http.py", line 282, in join                                                                                                                                              
    raise ClickhouseWriterError(message, code=code, row=row)                                                                                                                                                     
snuba.clickhouse.errors.ClickhouseWriterError: Method write is not supported by storage Distributed with more than one shard and no sharding key provided (version 21.8.13.6 (official build))                   
2024-02-07 23:59:17,335 Closing <arroyo.backends.kafka.consumer.KafkaConsumer object at 0x7fd177741d50>...  

untitaker (Member) commented

2024-02-07 23:48:38,289 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 10090059}

This line tells you which message is bad: the consumer resumes at offset 10090059, and the error is reported at row 1 of the batch. Can you dump the message at that offset? It should be possible with kafkactl or kcat.
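
A minimal sketch of that lookup, assuming kcat is installed and the broker is reachable at `kafka:9092` (that address is an assumption, adjust it for your deployment). It parses the "New partitions assigned" log line, builds a kcat command that prints the single message at that offset, and includes a hypothetical helper, `oversized_values`, that searches a decoded JSON payload for a `primary_hash` longer than the 32 bytes that `FixedString(32)` allows. The exact shape of the `events` payload is not shown in this thread, so the helper searches at any depth.

```python
import re
from typing import Any, Iterator

# Matches the arroyo log line quoted above, e.g.
# New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 10090059}
ASSIGNMENT_RE = re.compile(
    r"Partition\(topic=Topic\(name='(?P<topic>[^']+)'\), "
    r"index=(?P<partition>\d+)\): (?P<offset>\d+)"
)


def parse_assignment(log_line: str) -> tuple[str, int, int]:
    """Extract (topic, partition, offset) from a 'New partitions assigned' line."""
    match = ASSIGNMENT_RE.search(log_line)
    if match is None:
        raise ValueError("no partition assignment found in log line")
    return match["topic"], int(match["partition"]), int(match["offset"])


def kcat_dump_command(log_line: str, broker: str = "kafka:9092") -> list[str]:
    """Build a kcat invocation that prints the one message at the assigned offset."""
    topic, partition, offset = parse_assignment(log_line)
    return [
        "kcat", "-C",        # consumer mode
        "-b", broker,        # bootstrap broker (assumed address)
        "-t", topic,
        "-p", str(partition),
        "-o", str(offset),   # start at this absolute offset
        "-c", "1",           # exit after one message
    ]


def oversized_values(
    payload: Any, key: str = "primary_hash", limit: int = 32
) -> Iterator[str]:
    """Yield each string stored under `key`, at any depth, longer than `limit` bytes."""
    if isinstance(payload, dict):
        for name, value in payload.items():
            if name == key and isinstance(value, str):
                if len(value.encode("utf-8")) > limit:
                    yield value
            else:
                yield from oversized_values(value, key, limit)
    elif isinstance(payload, list):
        for item in payload:
            yield from oversized_values(item, key, limit)


if __name__ == "__main__":
    line = (
        "2024-02-07 23:48:38,289 New partitions assigned: "
        "{Partition(topic=Topic(name='events'), index=0): 10090059}"
    )
    print(" ".join(kcat_dump_command(line)))
```

Running the printed command should output the raw message. If it is JSON, passing `json.loads(...)` of it to `oversized_values` would list any `primary_hash` that is too long for the column.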

untitaker (Member) commented

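As a hotfix, you can also delete the consumer group using kafkactl and run the consumer with --auto-offset-reset latest. This will basically "flush the queue": the consumer skips everything currently in the topic, including the bad message, so those events are not ingested.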
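
Why both steps are needed: in Kafka, the auto-offset-reset policy only applies when the consumer group has no usable committed offset, so the group has to be deleted before `latest` takes effect. The function below is a toy model of that rule, not Kafka client code. `starting_offset` and its parameters are hypothetical names, the high watermark in the example is made up, and the out-of-range case (which also triggers a reset) is ignored.

```python
from typing import Optional


def starting_offset(
    committed: Optional[int], policy: str, earliest: int, latest: int
) -> int:
    """Toy model of where a consumer starts reading a partition.

    committed: offset stored for the consumer group, or None when the group
               has no stored offset (for example after it was deleted).
    policy:    the auto-offset-reset setting, "earliest" or "latest".
    earliest, latest: the partition's current low and high watermarks.
    """
    if committed is not None:
        # A committed offset always wins; the reset policy is not consulted.
        return committed
    if policy == "earliest":
        return earliest
    if policy == "latest":
        return latest
    raise ValueError(f"unsupported policy: {policy!r}")


if __name__ == "__main__":
    # Group still exists: the consumer resumes at the bad message and crashes again.
    print(starting_offset(10090059, "latest", earliest=0, latest=10500000))
    # Group deleted: the consumer jumps to the end of the partition.
    print(starting_offset(None, "latest", earliest=0, latest=10500000))
```

With the group still in place, the first call returns the committed offset, which is why changing the flag alone does not get past the bad message.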
