  1. Core Server
  2. SERVER-60685

TransactionCoordinator may interrupt locally executing update with non-Interruption error category, leading to server crash

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v5.1, v5.0, v4.4, v4.2
    • Sprint:
      Sharding 2021-11-15, Sharding 2021-11-29
    • Story Points:
      2

      Description

      The TransactionCoordinator schedules a task that interrupts the OperationContext with the TransactionCoordinatorReachedAbortDecision error code if the coordinator has not reached its decision within transactionLifetimeLimitSeconds. That error code is not part of the Interruption error category. As a result, if the interrupt arrives while the OperationContext is running the update that writes the participant list, ReplClientInfo::setLastOpToSystemLastOpTimeIgnoringInterrupt() propagates the exception instead of swallowing it. The exception then escapes the LastOpFixer::~LastOpFixer() destructor uncaught, which terminates the server:
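
The failure mode described above can be illustrated with a minimal, standalone sketch. It is not the server's code: `Code`, `OpError`, `isInterruption`, `checkForInterrupt`, `doWorkIgnoringInterrupt`, `escapes` and `runSketch` are all hypothetical stand-ins, and the sketch assumes (as the description implies) that the "IgnoringInterrupt" helper swallows only errors in the Interruption category and rethrows everything else.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for two error codes; NOT the real mongo::ErrorCodes.
enum class Code { Interrupted, TransactionCoordinatorReachedAbortDecision };

// Hypothetical category check: in this sketch only Interrupted is an
// "interruption"; the coordinator's abort-decision code is not.
bool isInterruption(Code c) {
    return c == Code::Interrupted;
}

// Hypothetical exception type carrying the code the operation was killed with.
struct OpError : std::runtime_error {
    Code code;
    OpError(Code c, const std::string& what) : std::runtime_error(what), code(c) {}
};

// Stand-in for work that notices the operation was interrupted and throws.
void checkForInterrupt(Code killCode) {
    throw OpError(killCode, "operation was interrupted");
}

// Assumed shape of an "IgnoringInterrupt" helper: swallow interruption-category
// errors, rethrow anything else.
void doWorkIgnoringInterrupt(Code killCode) {
    try {
        checkForInterrupt(killCode);
    } catch (const OpError& ex) {
        if (!isInterruption(ex.code)) {
            throw;  // not an interruption, so the caller sees the exception
        }
    }
}

// Reports whether an exception escapes the helper for a given kill code.
bool escapes(Code killCode) {
    try {
        doWorkIgnoringInterrupt(killCode);
        return false;
    } catch (const OpError&) {
        return true;
    }
}

// Checks both cases: the interruption is swallowed, the other code escapes.
void runSketch() {
    assert(!escapes(Code::Interrupted));
    assert(escapes(Code::TransactionCoordinatorReachedAbortDecision));
}
```

Here `escapes` catches the rethrown error so the sketch can be run safely. If `doWorkIgnoringInterrupt` were instead called from a destructor, which is implicitly `noexcept` since C++11, the escaping exception would call `std::terminate`. That matches the `__cxa_call_terminate` and `myTerminate` frames in the backtrace.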

      Thread 74 "Transac.dinator" received signal SIGTRAP, Trace/breakpoint trap.
      [Switching to Thread 0x7f1c76785700 (LWP 32167)]
      0x00007f1ca5538817 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
      (gdb) bt
      #0  0x00007f1ca5538817 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
      #1  0x00007f1ca749ea0b in mongo::breakpoint () at src/mongo/util/debugger.cpp:72
      #2  0x00007f1ca7472b33 in mongo::(anonymous namespace)::myTerminate () at src/mongo/util/signal_handlers_synchronous.cpp:226
      #3  0x00007f1ca7566d66 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../src/combined/libstdc++-v3/libsupc++/eh_terminate.cc:47
      #4  0x00007f1ca7568639 in __cxa_call_terminate (ue_header=ue_header@entry=0x55d314320000) at ../../../../src/combined/libstdc++-v3/libsupc++/eh_call.cc:54
      #5  0x00007f1ca75669f5 in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=6, exception_class=5138137972254386944, ue_header=0x55d314320000, context=<optimized out>) at ../../../../src/combined/libstdc++-v3/libsupc++/eh_personality.cc:676
      #6  0x00007f1ca5755573 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
      #7  0x00007f1ca5755df5 in _Unwind_Resume () from /lib/x86_64-linux-gnu/libgcc_s.so.1
      #8  0x00007f1ca115ec03 in boost::intrusive_ptr<mongo::Status::ErrorInfo const>::~intrusive_ptr (this=<optimized out>, __in_chrg=<optimized out>) at src/third_party/boost/boost/smart_ptr/intrusive_ptr.hpp:96
      #9  mongo::Status::~Status (this=<optimized out>, __in_chrg=<optimized out>) at src/mongo/base/status.h:60
      #10 mongo::StatusWith<mongo::repl::OpTime>::~StatusWith (this=<optimized out>, __in_chrg=<optimized out>) at src/mongo/base/status_with.h:81
      #11 mongo::repl::ReplClientInfo::setLastOpToSystemLastOpTime (this=0x55d31400dc98, opCtx=0x55d314315200) at src/mongo/db/repl/repl_client_info.cpp:72
      #12 0x00007f1ca1160b4f in mongo::repl::ReplClientInfo::setLastOpToSystemLastOpTimeIgnoringInterrupt (this=<optimized out>, opCtx=<optimized out>) at /opt/mongodbtoolchain/revisions/ba5f698948588cb5da922d3cadee990f5f9f48cd/stow/gcc-v3.pPo/include/c++/8.5.0/bits/unique_ptr.h:597
      #13 0x00007f1ca304aca0 in mongo::write_ops_exec::(anonymous namespace)::LastOpFixer::~LastOpFixer (this=0x7f1c7677f2e0, __in_chrg=<optimized out>) at /opt/mongodbtoolchain/revisions/ba5f698948588cb5da922d3cadee990f5f9f48cd/stow/gcc-v3.pPo/include/c++/8.5.0/bits/unique_ptr.h:597
      #14 0x00007f1ca30515fb in mongo::write_ops_exec::performUpdates (opCtx=opCtx@entry=0x55d314315200, wholeOp=..., source=<optimized out>) at src/mongo/db/ops/write_ops_exec.cpp:978
      #15 0x00007f1c9d85cb99 in mongo::(anonymous namespace)::CmdUpdate::Invocation::typedRun (this=0x55d31448ab00, opCtx=0x55d314315200) at src/mongo/db/commands/write_commands.cpp:1412
      #16 0x00007f1c9d85d7be in mongo::TypedCommand<mongo::(anonymous namespace)::CmdUpdate>::InvocationBase::_callTypedRun (opCtx=<optimized out>, this=<optimized out>) at src/mongo/db/commands.h:1256
      #17 mongo::TypedCommand<mongo::(anonymous namespace)::CmdUpdate>::InvocationBase::_runImpl (reply=0x55d3142f87c0, opCtx=<optimized out>, this=<optimized out>) at src/mongo/db/commands.h:1257
      #18 mongo::TypedCommand<mongo::(anonymous namespace)::CmdUpdate>::InvocationBase::run (this=<optimized out>, opCtx=<optimized out>, reply=0x55d3142f87c0) at src/mongo/db/commands.h:1262
      #19 0x00007f1ca07eb22f in mongo::CommandHelpers::runCommandInvocation (opCtx=0x55d314315200, request=..., invocation=0x55d31448ab00, response=0x55d3142f87c0) at src/mongo/db/commands.cpp:199
      #20 0x00007f1ca07ef09e in mongo::CommandHelpers::<lambda()>::operator() (__closure=0x7f1c76780d50) at src/mongo/db/commands.cpp:183
      #21 mongo::makeReadyFutureWith<mongo::CommandHelpers::runCommandInvocation(std::shared_ptr<mongo::RequestExecutionContext>, std::shared_ptr<mongo::CommandInvocation>, mongo::transport::ServiceExecutor::ThreadingModel)::<lambda()> > (func=...) at src/mongo/util/future.h:1222
      #22 mongo::CommandHelpers::runCommandInvocation (rec=std::shared_ptr<mongo::RequestExecutionContext> (use count 11, weak count 0) = {...}, invocation=std::shared_ptr<mongo::CommandInvocation> (use count 3, weak count 0) = {...}, threadingModel=<optimized out>) at src/mongo/db/commands.cpp:184
      ...
      #88 0x00007f1ca201d2b4 in mongo::DBDirectClient::call (this=<optimized out>, toSend=..., response=..., assertOk=<optimized out>, actualServer=<optimized out>) at src/mongo/db/dbdirectclient.cpp:141
      #89 0x00007f1ca04d2668 in mongo::DBClientBase::runCommandWithTarget (this=0x7f1c767832c0, request=...) at src/mongo/client/dbclient_base.cpp:240
      #90 0x00007f1ca47f5fa3 in mongo::DBClientBase::runCommand (this=<optimized out>, request=...) at /opt/mongodbtoolchain/revisions/ba5f698948588cb5da922d3cadee990f5f9f48cd/stow/gcc-v3.pPo/include/c++/8.5.0/bits/move.h:182
      #91 0x00007f1ca435984e in mongo::txn::(anonymous namespace)::persistParticipantListBlocking (opCtx=0x55d314315200, lsid=..., txnNumberAndRetryCounter=..., participantList=std::vector of length 2, capacity 2 = {...}) at src/third_party/boost/boost/smart_ptr/intrusive_ptr.hpp:96
      

              People

              Assignee:
              luis.osta Luis Osta
              Reporter:
              max.hirschhorn Max Hirschhorn