dcpdrain: Increase default NOOP interval from 1 to 60s
dcpdrain currently sets the DCP noop-interval to 1s, so the producer
sends a NOOP request to dcpdrain every second, and dcpdrain must handle
each request correctly and send a response.

When connecting to clusters with high latency between client and server
nodes, it can take more than 1 second to finish setting up the DCP
connection and enter the main event loop. This means the server node may
start sending DCP noop requests before the DCP connection is set up and,
crucially, before dcpdrain's event loop is ready to process them. As a
result dcpdrain crashes, because it receives a DCP noop request while it
is expecting a control response:

    Process 43094 launched: '/Users/dave/repos/couchbase/server/source/build/kv_engine/dcpdrain' (arm64)
    Using DCP flow control with buffer size: 13421772
    Set DCP control message: set_priority=high
    Set DCP control message: supports_cursor_dropping_vulcan=true
    Set DCP control message: supports_hifi_MFU=true
    Set DCP control message: send_stream_end_on_client_close_stream=true
    Set DCP control message: enable_expiry_opcode=true
    Set DCP control message: set_noop_interval=1
    Set DCP control message: enable_noop=true
    Set DCP control message: enable_out_of_order_snapshots=true
    2023-05-03T12:11:28.705431+01:00 CRITICAL *** Fatal error encountered during exception handling ***
    2023-05-03T12:11:28.708626+01:00 CRITICAL Caught unhandled std::exception-derived exception. what(): Header::getResponse(): Header is not a response
    Target 0: (dcpdrain) stopped.
    (lldb) bt
    * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
      * frame #0: 0x00000001c24a2d98 libsystem_kernel.dylib` __pthread_kill + 8
        frame #1: 0x00000001c24d7ee0 libsystem_pthread.dylib` pthread_kill + 288
        frame #2: 0x00000001c2412340 libsystem_c.dylib` abort + 168
        frame #3: 0x00000001c2492b08 libc++abi.dylib` abort_message + 132
        frame #4: 0x00000001c2482938 libc++abi.dylib` demangling_terminate_handler() + 312
        frame #5: 0x00000001c2378330 libobjc.A.dylib` _objc_terminate() + 160
        frame #6: 0x000000010008ef30 dcpdrain` backtrace_terminate_handler() + 752 at terminate_handler.cc:88
        frame #7: 0x00000001c2491ea4 libc++abi.dylib` std::__terminate(void (*)()) + 20
        frame #8: 0x00000001c2494c1c libc++abi.dylib` __cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 36
        frame #9: 0x00000001c2494bc8 libc++abi.dylib` __cxa_throw + 140
        frame #10: 0x000000010002a77c dcpdrain` BinprotResponse::getTracingData() const [inlined] cb::mcbp::Header::getResponse(this=0x00006000002044a0) const + 48 at header.h:134
        frame #11: 0x000000010002a74c dcpdrain` BinprotResponse::getTracingData() const [inlined] BinprotResponse::getResponse(this=<unavailable>) const at client_mcbp_commands.cc:487
        frame #12: 0x000000010002a74c dcpdrain` BinprotResponse::getTracingData(this=0x000000016fdfef90) const + 188 at client_mcbp_commands.cc:373
        frame #13: 0x000000010002a638 dcpdrain` MemcachedConnection::recvResponse(this=0x0000000101604080, response=0x000000016fdfef90, opcode=<unavailable>, readTimeout=<unavailable>) + 84 at client_connection.cc:1043
        ...
        frame #21: 0x0000000100038f40 dcpdrain` MemcachedConnection::backoff_execute(..., context="DCP_CONTROL", ...) + 100 at client_connection.cc:2016
        frame #22: 0x000000010002bab4 dcpdrain` MemcachedConnection::execute(this=0x0000000101604080, command=0x000000016fdfefb0, readTimeout=(__rep_ = 0)) + 168 at client_connection.cc:1998
        frame #23: 0x000000010000d688 dcpdrain` main + 280 at dcpdrain.cc:451
        frame #24: 0x000000010000d570 dcpdrain` main(argc=<unavailable>, argv=<unavailable>) + 8488 at dcpdrain.cc:929
        frame #25: 0x00000001005d508c dyld` start + 520

Ideally dcpdrain should be robust to receiving DCP NOOP messages while
setting up the control flags, but that is not simple: we use common code
in MemcachedConnection which performs a request and expects a response
(of type DCP_CONTROL) in-order.

To work around this problem, simply increase the default DCP noop
interval from 1 to 60 seconds - 60s /should/ be sufficient to complete
the handshake...

Change-Id: I0f846956d6499ea54d74f781cb14d7982387c9f4
Reviewed-on: https://review.couchbase.org/c/kv_engine/+/190418
Tested-by: Build Bot <[email protected]>
Reviewed-by: Trond Norbye <[email protected]>
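
The failure mode described above, and the more robust handling the message calls ideal but not simple, can be illustrated with a minimal C++ sketch. Everything in it is a hypothetical stand-in and not kv_engine code: `Frame`, `FakeConnection`, `recvResponseStrict` and `recvResponseTolerant` are invented for illustration, and the frame model ignores the real wire format. The strict variant mirrors the in-order "send a request, expect a response" assumption; the tolerant variant answers any NOOP request that arrives during the handshake and keeps waiting.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified frame model (NOT kv_engine's real types).
enum class FrameKind { Request, Response };
enum class Opcode : std::uint8_t { DcpNoop, DcpControl };

struct Frame {
    FrameKind kind;
    Opcode opcode;
    std::uint32_t opaque; // correlates a response with its request
};

// Stand-in for a socket: frames queued by the "server", frames we sent back.
struct FakeConnection {
    std::deque<Frame> incoming;
    std::vector<Frame> sent;

    Frame recv() {
        if (incoming.empty()) {
            throw std::runtime_error("connection closed");
        }
        Frame f = incoming.front();
        incoming.pop_front();
        return f;
    }

    void send(const Frame& f) {
        sent.push_back(f);
    }
};

// Strict variant: assumes the next frame is always the awaited response.
// A NOOP request arriving first triggers the same kind of error as in the
// log above.
Frame recvResponseStrict(FakeConnection& conn) {
    Frame f = conn.recv();
    if (f.kind != FrameKind::Response) {
        throw std::logic_error("Header is not a response");
    }
    return f;
}

// Tolerant variant: replies to interleaved NOOP requests, then keeps
// waiting for the response that was actually requested.
Frame recvResponseTolerant(FakeConnection& conn) {
    for (;;) {
        Frame f = conn.recv();
        if (f.kind == FrameKind::Response) {
            return f;
        }
        if (f.opcode == Opcode::DcpNoop) {
            // Echo the opaque so the peer can match the reply.
            conn.send(Frame{FrameKind::Response, Opcode::DcpNoop, f.opaque});
            continue;
        }
        throw std::logic_error("unexpected request during handshake");
    }
}
```

With a NOOP request queued ahead of the DCP_CONTROL response, `recvResponseStrict` throws, while `recvResponseTolerant` sends one NOOP reply and then returns the control response. The commit does not take this route; it only raises the default interval so that a NOOP is unlikely to arrive before the handshake completes.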