MB-48925 1/3: Don't extend VBucket lifetime via bg Tasks

During bucket shutdown we intermittently see an exception thrown during
task scheduling on a background NonIO thread, which crashes the
memcached process.

+Analysis+

The bug is as follows. Starting at the main thread which is deleting the
Bucket (Thread 1):

    (gdb) bt
    ...
    #10 0x0000000000649bf3 in FollyExecutorPool::schedule(std::shared_ptr<GlobalTask>) () at /c++/10.2.0/new:175
    #11 0x000000000084271b in EPVBucket::scheduleDeferredDeletion(EventuallyPersistentEngine&) () at /c++/10.2.0/ext/atomicity.h:100
    #12 0x00000000006dfe7a in VBucket::DeferredDeleter::operator()(VBucket*) const () at kv_engine/engines/ep/src/vbucket.cc:3990
    #13 0x000000000086f874 in std::_Sp_counted_deleter<EPVBucket*, VBucket::DeferredDeleter, ...>::_M_dispose () at /c++/10.2.0/bits/shared_ptr_base.h:453
    ...
    #18 std::shared_ptr<VBucket>::~shared_ptr (this=0x7b44000515d0, __in_chrg=<optimized out>) at /c++/10.2.0/bits/shared_ptr.h:121
    #19 PagingVisitor::~PagingVisitor (this=0x7b4400051540, __in_chrg=<optimized out>) at kv_engine/engines/ep/src/paging_visitor.h:39
    ...
    #31 std::__shared_ptr<GlobalTask, (__gnu_cxx::_Lock_policy)2>::reset () at /c++/10.2.0/bits/shared_ptr_base.h:1301
    #32 EventuallyPersistentEngine::waitForTasks(std::vector<std::shared_ptr<GlobalTask>, std::allocator<std::shared_ptr<GlobalTask> > >&) () at kv_engine/engines/ep/src/ep_engine.cc:6752
    #33 0x000000000082396f in EventuallyPersistentEngine::destroyInner(bool) () at kv_engine/engines/ep/src/ep_engine.cc:2135

1. PagingVisitor is still in existence while
   `EventuallyPersistentEngine::destroyInner` is running - see frame
   #19. This is because all tasks belonging to the bucket were returned
   from unregisterTaskable() just before.

2. When PagingVisitor (via VBCBAdaptor) is destroyed, it decrements the
   refcount on the shared_ptr<VBucket> it owns - see frame #18.

3. That is the last reference to the VBucket, which results in
   VBucket::DeferredDeleter being invoked, which in turn schedules a
   task to delete the VBucket (disk and memory) in the background - see
   frame #11.

We see schedule()'s lambda run on the SchedulerPool0 thread (T:35):

    Thread 35 "SchedulerPool0" hit Catchpoint 1 (exception thrown), __cxxabiv1::__cxa_throw (..., tinfo=0x10c4ec8 <typeinfo for std::out_of_range@@GLIBCXX_3.4>, ...) at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:80
    (gdb) bt
    #1 0x00007ffff4cad7d2 in std::__throw_out_of_range (__s=__s@entry=0xcc68e6 "_Map_base::at") at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/functexcept.cc:82
    ...
    #3 0x00000000005504ee in std::unordered_map<...>::at (__k=@0x7fffe83a8f88: 0x7b7400000848, this=0x7b1000005580) at /c++/10.2.0/bits/unordered_map.h:1000
    #4 FollyExecutorPool::State::scheduleTask (this=..., executor=..., pool=..., task=...) at kv_engine/executor/folly_executorpool.cc:415
    ...
    #8 folly::EventBase::runInEventBaseThreadAndWait(...) at folly/io/async/EventBase.cpp:671
    ...

In FollyExecutorPool::State::scheduleTask (frame #4) we attempt to look
up the Taskable (Bucket) in the ExecutorPool's map (the at() call in
frame #3). However, given it has already been unregistered, the
Taskable is not found and the std::out_of_range exception is thrown.

This is a lifetime issue. We have VBucket objects potentially being kept
alive longer than their expected lifetime by virtue of background tasks
having shared ownership of them - and those background tasks outlive
their parent object (KVBucket), and crucially live past the point when
the owning Bucket is unregistered from the ExecutorPool and can no
longer schedule tasks. When it then _does_ attempt to schedule a task
against an unregistered (and deleted) Taskable, we see the crash.

+Solution+

There are arguably two problems which should be addressed (although
technically only one of the two is required to encounter this crash):

1. Background tasks owning VBuckets when they are not executing.

2. Background tasks outliving their associated Taskable (aka Bucket).

This patch addresses the critical issue of (1) - we remove the (shared)
ownership of VBucket from the background tasks which previously had it:
both PagingVisitor, which is the problematic class in this scenario,
and the other background Tasks which potentially have the same problem.

The 2nd patch will tighten up the API for visiting VBuckets, so visitors
are not passed a VBucketPtr but instead a VBucket&, which reduces the
chance of similar problems happening in future.

The 3rd patch will address background Tasks outliving their Taskable.

Change-Id: I340a3e4dc3d9234c4a34866b410fb8295a1c98d1
Reviewed-on: http://review.couchbase.org/c/kv_engine/+/163783
Tested-by: Dave Rigby <[email protected]>
Reviewed-by: Richard de Mellow <[email protected]>
Reviewed-by: James H <[email protected]>
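
Illustrative note (not part of the patch): the ownership problem described under problem (1) can be sketched in isolation. Everything below is a hypothetical stand-in, not kv_engine code: `VB`, `VBMap`, `OwningVisitor`, `IdVisitor` and `makeVB` are invented names, and the custom deleter only counts invocations where the real VBucket::DeferredDeleter would schedule a background task. The sketch assumes the fix amounts to a task remembering only a vbucket id and taking a reference for the duration of a run; the actual change may differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for a VBucket.
struct VB {
    std::uint16_t id;
};
using VBPtr = std::shared_ptr<VB>;

// Hypothetical stand-in for the owner of all vbuckets (the KVBucket's map).
struct VBMap {
    std::unordered_map<std::uint16_t, VBPtr> vbs;

    VBPtr get(std::uint16_t id) const {
        auto it = vbs.find(id);
        if (it == vbs.end()) {
            return nullptr;
        }
        return it->second;
    }
};

// Problematic shape: the task shares ownership even while it is idle,
// so it can end up holding the last reference.
struct OwningVisitor {
    VBPtr vb;
};

// Safer shape: the task stores only the id and takes a reference
// for the duration of a single run.
struct IdVisitor {
    std::uint16_t id;

    bool run(const VBMap& map) const {
        VBPtr vb = map.get(id);
        if (!vb) {
            return false; // vbucket already gone; nothing to visit
        }
        return vb->id == id; // placeholder for the real visit
    }
};

// Creates a VB whose deleter bumps a counter, standing in for a deleter
// with side effects (such as scheduling a deferred-deletion task).
VBPtr makeVB(std::uint16_t id, int* deletions) {
    return VBPtr(new VB{id}, [deletions](VB* p) {
        ++*deletions;
        delete p;
    });
}

// Deleter invocations seen once the owner has dropped its reference
// while an idle owning task still exists.
int deletionsAfterOwnerDropsWithOwningTask() {
    int deletions = 0;
    VBMap map;
    map.vbs[0] = makeVB(0, &deletions);
    OwningVisitor task{map.get(0)};
    map.vbs.clear();
    return deletions; // deleter is deferred until `task` is destroyed
}

// Same scenario with the id-only task; returns -1 if run() misbehaves.
int deletionsAfterOwnerDropsWithIdTask() {
    int deletions = 0;
    VBMap map;
    map.vbs[0] = makeVB(0, &deletions);
    IdVisitor task{0};
    bool ranBefore = task.run(map);
    map.vbs.clear();
    bool ranAfter = task.run(map);
    return (ranBefore && !ranAfter) ? deletions : -1;
}
```

With the owning task, the deleter runs whenever the task happens to be destroyed, which in the crash above was after the Bucket had been unregistered. With the id-only task, the deleter runs at the point the owner releases the vbucket, and a later run simply finds nothing to visit.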