Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-60335

Wait for all user operations to be killed before executing WT Rollback To Stable

    • Type: Icon: Improvement Improvement
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Storage Execution
    • Execution Team 2021-11-15, Execution Team 2021-11-29, Execution Team 2021-12-13
    • 6

      SERVER-58311 surfaced a race condition whereby we call into WiredTiger Rollback To Stable (RTS) while some operations that were intended to be killed are found to be still running. This is not expected as WiredTiger usually requires RTS to have exclusive access to a dhandle; a running user operation can't guarantee that. This is observed when building the server with these non-default flags.

      E  STORAGE  22435   [BackgroundSync] "WiredTiger error","attr":{"error":16,"message":"[1625743985:510865][6205:0x7f6a41a84700], file:collection-XXX.wt, txn rollback_to_stable: __rollback_to_stable_btree_apply, 1522: file:sizeStorer.wt: unable to open handle, error indicates handle is unavailable due to concurrent use: Device or resource busy"}
      

      The problematic operation surviving the killOp seems to be a yielded oplog getmore that waits for more oplog inserts via CappedInsertNotifier. Stack at the time of failure:

      #0  0x00007f3adfe33f85 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
      #1  0x00007f3ae16e6705 in __gthread_cond_timedwait (__cond=0x7f3aad53f7b0, __mutex=0x7f3ab54de0b0, __abs_timeout=0x7f3a8da22ce0) at /opt/mongodbtoolchain/revisions/d4e86835db48288718d967ff002f9945c7243aac/stow/gcc-v3.oGe/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../in
      clude/c++/8.2.0/x86_64-mongodb-linux/bits/gthr-default.h:871
      #2  0x00007f3ae17125e6 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f3aad53f7b0, __lock=..., __atime=...) at /opt/mongodbtoolchain/revisions/d4e86835db48288718d967ff002f9945c7243aac/stow/gcc-v3.oGe/li
      b/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/condition_variable:178
      #3  0x00007f3ae1712535 in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f3aad53f7b0, __lock=..., __atime=...) at /opt/mongodbtoolchain/revisions/d4e86835db48288718d967ff002f9945c7243aac/stow/gcc-v3.oGe/lib/gcc/x
      86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/condition_variable:106
      #4  0x00007f3ade9d55fc in std::_V2::condition_variable_any::wait_until<std::unique_lock<mongo::latch_detail::Latch>, std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f3aad53f7b0, __lock=..., __atime=...) at /opt/mongod
      btoolchain/revisions/d4e86835db48288718d967ff002f9945c7243aac/stow/gcc-v3.oGe/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/condition_variable:286
      #5  0x00007f3acac8a2bf in mongo::CappedInsertNotifier::waitUntil (this=0x7f3aad53f7b0, prevVersion=18, deadline=...) at src/mongo/db/catalog/collection.cpp:53
      #6  0x00007f3ad436699a in mongo::insert_listener::waitForInserts(mongo::OperationContext*, mongo::PlanYieldPolicy*, mongo::insert_listener::CappedInsertNotifierData*)::$_1::operator()() const (this=0x7f3a8da230c0) at src/mongo/db/query/plan_insert_listener.cpp:108
      #7  0x00007f3ad43667fd in std::_Function_handler<void (), mongo::insert_listener::waitForInserts(mongo::OperationContext*, mongo::PlanYieldPolicy*, mongo::insert_listener::CappedInsertNotifierData*)::$_1>::_M_invoke(std::_Any_data const&) (__functor=...) at /opt/mongodbtoolchain/revisions/d4e86835db48288718d967ff002f9945c7243aac/stow/gcc-v3.oGe/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/std_function.h:297
      #8  0x00007f3adf2c491e in std::function<void ()>::operator()() const (this=0x7f3a8da230c0) at /opt/mongodbtoolchain/revisions/d4e86835db48288718d967ff002f9945c7243aac/stow/gcc-v3.oGe/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/std_function.h:687
      #9  0x00007f3ad084066e in mongo::PlanYieldPolicy::performYield(mongo::OperationContext*, mongo::Yieldable const*, std::function<void ()>) (this=0x7f3a9ee0e380, opCtx=0x7f3aacbbc220, yieldable=0x7f3a8da24a10, whileYieldingFn=...) at src/mongo/db/query/plan_yield_policy.cpp:180
      #10 0x00007f3ad0840134 in mongo::PlanYieldPolicy::yieldOrInterrupt(mongo::OperationContext*, std::function<void ()>) (this=0x7f3a9ee0e380, opCtx=0x7f3aacbbc220, whileYieldingFn=...) at src/mongo/db/query/plan_yield_policy.cpp:108
      #11 0x00007f3ad4366450 in mongo::insert_listener::waitForInserts (opCtx=0x7f3aacbbc220, yieldPolicy=0x7f3a9ee0e380, notifierData=0x7f3a8da234b0) at src/mongo/db/query/plan_insert_listener.cpp:106
      #12 0x00007f3ad433d3b8 in mongo::PlanExecutorImpl::_getNextImpl (this=0x7f3a94d37020, objOut=0x7f3a8da23578, dlOut=0x0) at src/mongo/db/query/plan_executor_impl.cpp:451
      #13 0x00007f3ad433c598 in mongo::PlanExecutorImpl::getNextDocument (this=0x7f3a94d37020, objOut=0x7f3a94d370f0, dlOut=0x0) at src/mongo/db/query/plan_executor_impl.cpp:311
      #14 0x00007f3ad433c37a in mongo::PlanExecutorImpl::getNext (this=0x7f3a94d37020, objOut=0x7f3a8da23900, dlOut=0x0) at src/mongo/db/query/plan_executor_impl.cpp:298
      #15 0x00007f3abf2e9b85 in mongo::(anonymous namespace)::GetMoreCmd::Invocation::generateBatch (this=0x7f3aaee32480, opCtx=0x7f3aacbbc220, cursor=0x7f3a94d2d620, cmd=..., isTailable=true, nextBatch=0x7f3a8da243d0, numResults=0x7f3a8da23ff0, docUnitsReturned=0x7f3a8da23fd8) at src/mongo/db/commands/getmore_cmd.cpp:335
      #16 0x00007f3abf2e7cf2 in mongo::(anonymous namespace)::GetMoreCmd::Invocation::acquireLocksAndIterateCursor (this=0x7f3aaee32480, opCtx=0x7f3aacbbc220, reply=0x7f3ab5446ca0, cursorManager=0x7f3ab54053a0, cursorPin=..., curOp=0x7f3a94d1a7b8) at src/mongo/db/commands/getmore_cmd.cpp:641
      #17 0x00007f3abf2e5f2f in mongo::(anonymous namespace)::GetMoreCmd::Invocation::run (this=0x7f3aaee32480, opCtx=0x7f3aacbbc220, reply=0x7f3ab5446ca0) at src/mongo/db/commands/getmore_cmd.cpp:745
      

      Leads of investigation:

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            josef.ahmad@mongodb.com Josef Ahmad
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated: