Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-22011

Direct writes to the local database can cause deadlock involving the WiredTiger write throttle

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.0.8, 3.2.0
    • Fix Version/s: 3.2.3, 3.3.2
    • Component/s: Storage
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:
    • Sprint:
      Integration F (02/01/16), Integration 10 (02/22/16)

      Description

      In MongoDB 3.0 we introduced the WiredTiger write throttle parameter, which limits the number of concurrent threads running inside the WT storage engine. The write throttle works by acquiring a 'ticket' just before entering in the WT storage engine and releasing the ticket when the WT transaction is done.

      The write throttle tickets are in most cases acquired before the 'local' database is locked for replication writes. However, if an operation writes to the local database directly, the order of ticket acquisition reverses and the local database lock is acquired first.

      Because of this reversal of the locking order, if there is sufficiently high write load and the number of available tickets is approaching zero, it is possible for a deadlock to occur between a thread which directly writes to the local database and a thread, which just acquired the last ticket. The default value for the WT write throttle is 128, so this would happen when the number of concurrent threads exceeds this value.

      Once the server enters into this state, there is no way to get out of it without restarting it.

      A preventative workaround (in the sense that it should be done before the issue occurs) is to increase the write throttle to a value, which is higher than the maximum projected write load. For example:

      db.adminCommand({ setParameter: 1, wiredTigerConcurrentWriteTransactions: 4096 });
      

      For more detail, here is a sequence of events which resulted in deadlocked threads:

      Thread1 (already holds a ticket and waits on the local DB lock):

      Thread 418 (Thread 0x7efc8ba6e700 (LWP 23482)):
      #0  0x000000327fe0b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x00000000009cbe38 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
      #2  0x00000000009d3133 in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
      #3  0x00000000009c55c7 in mongo::Lock::DBLock::DBLock(mongo::Locker*, mongo::StringData const&, mongo::LockMode) ()
      #4  0x0000000000c9dc9b in ?? ()
      #5  0x0000000000c9c477 in ?? ()
      #6  0x0000000000c9c737 in mongo::repl::logOp(mongo::OperationContext*, char const*, char const*, mongo::BSONObj const&, mongo::BSONObj*, bool*, bool) ()
      #7  0x00000000009bcd5f in mongo::WriteBatchExecutor::execOneInsert(mongo::WriteBatchExecutor::ExecInsertsState*, mongo::WriteErrorDetail**) ()
      #8  0x00000000009bd9dd in mongo::WriteBatchExecutor::execInserts(mongo::BatchedCommandRequest const&, mongo::WriteConcernOptions const&, std::vector<mongo::WriteErrorDetail*, std::allocator<mongo::WriteErrorDetail*> >*) ()
      #9  0x00000000009bf959 in mongo::WriteBatchExecutor::bulkExecute(mongo::BatchedCommandRequest const&, mongo::WriteConcernOptions const&, std::vector<mongo::BatchedUpsertDetail*, std::allocator<mongo::BatchedUpsertDetail*> >*, std::vector<mongo::WriteErr
      orDetail*, std::allocator<mongo::WriteErrorDetail*> >*) ()
      #10 0x00000000009c0185 in mongo::WriteBatchExecutor::executeBatch(mongo::BatchedCommandRequest const&, mongo::BatchedCommandResponse*) ()
      #11 0x00000000009c2e6c in mongo::WriteCmd::run(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo:
      :BSONObjBuilder&, bool) ()
      #12 0x00000000009e5e81 in mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<
      char> >&, mongo::BSONObjBuilder&, bool) ()
      #13 0x00000000009e6e9c in mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) ()
      #14 0x00000000009e7b6b in mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) ()
      #15 0x0000000000bfebcb in mongo::runQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::NamespaceString const&, mongo::CurOp&, mongo::Message&) ()
      #16 0x0000000000ad0752 in mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) ()
      #17 0x0000000000809935 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
      #18 0x0000000000f93a19 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
      #19 0x000000327fe07851 in start_thread () from /lib64/libpthread.so.0
      

      Thread2 (holds the local DB lock and waits on a ticket):

      Thread 417 (Thread 0x7efc8b96d700 (LWP 23525)):
      #0  0x000000327fe0d720 in sem_wait () from /lib64/libpthread.so.0
      #1  0x0000000000f75b24 in mongo::TicketHolder::waitForTicket() ()
      #2  0x0000000000def990 in mongo::WiredTigerRecoveryUnit::_getTicket(mongo::OperationContext*) ()
      #3  0x00000000009bcbff in mongo::WriteBatchExecutor::execOneInsert(mongo::WriteBatchExecutor::ExecInsertsState*, mongo::WriteErrorDetail**) ()
      #4  0x00000000009bd9dd in mongo::WriteBatchExecutor::execInserts(mongo::BatchedCommandRequest const&, mongo::WriteConcernOptions const&, std::vector<mongo::WriteErrorDetail*, std::allocator<mongo::WriteErrorDetail*> >*) ()
      #5  0x00000000009bf959 in mongo::WriteBatchExecutor::bulkExecute(mongo::BatchedCommandRequest const&, mongo::WriteConcernOptions const&, std::vector<mongo::BatchedUpsertDetail*, std::allocator<mongo::BatchedUpsertDetail*> >*, std::vector<mongo::WriteErr
      orDetail*, std::allocator<mongo::WriteErrorDetail*> >*) ()
      #6  0x00000000009c0185 in mongo::WriteBatchExecutor::executeBatch(mongo::BatchedCommandRequest const&, mongo::BatchedCommandResponse*) ()
      #7  0x00000000009c2e6c in mongo::WriteCmd::run(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo:
      :BSONObjBuilder&, bool) ()
      #8  0x00000000009e5e81 in mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<
      char> >&, mongo::BSONObjBuilder&, bool) ()
      #9  0x00000000009e6e9c in mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) ()
      #10 0x00000000009e7b6b in mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) ()
      #11 0x0000000000bfebcb in mongo::runQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::NamespaceString const&, mongo::CurOp&, mongo::Message&) ()
      #12 0x0000000000ad0752 in mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) ()
      #13 0x0000000000809935 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
      #14 0x0000000000f93a19 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
      #15 0x000000327fe07851 in start_thread () from /lib64/libpthread.so.0
      

      Thread3 (blocked behind Thread1, because it acquires the database lock in S mode):

      Thread 1383 (Thread 0x7f0d432ed700 (LWP 54124)):
      #0  0x000000327fe0b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x00000000009cbe38 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
      #2  0x00000000009d3133 in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
      #3  0x00000000009c55c7 in mongo::Lock::DBLock::DBLock(mongo::Locker*, mongo::StringData const&, mongo::LockMode) ()
      #4  0x0000000000935681 in mongo::AutoGetDb::AutoGetDb(mongo::OperationContext*, mongo::StringData const&, mongo::LockMode) ()
      #5  0x00000000009ea511 in mongo::DBStats::run(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
      #6  0x00000000009e5e81 in mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
      #7  0x00000000009e6e9c in mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) ()
      #8  0x00000000009e7b6b in mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) ()
      #9  0x0000000000bfebcb in mongo::runQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::NamespaceString const&, mongo::CurOp&, mongo::Message&) ()
      #10 0x0000000000ad0752 in mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) ()
      #11 0x0000000000809935 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
      #12 0x0000000000f93a19 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
      #13 0x000000327fe07851 in start_thread () from /lib64/libpthread.so.0
      

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                20 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: