Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-16065

Long flush pauses in MMAPv1

    • Fully Compatible
    • ALL

      Reducing contention on GroupCommit mutex (SERVER-15950) exposed a problem in flushing code.
      The attached VTune lock profiler analysis shows that most of the actual lock wait contention (time-wise) is because of long flush times, as visible from the following callstacks:

      Waiting Call Stack
      1 of 7: 92.8% (1258.626s of 1355.788s)
      
      mongod!boost::condition_variable::timed_wait+0xf4 - condition_variable.hpp:74
      mongod!timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration,
      1000ll> >+0x296 - condition_variable_fwd.hpp:72
      mongod!mongo::CondVarLockGrantNotification::wait+0xa8 - lock_state.cpp:270
      mongod!mongo::LockerImpl<(bool)1>::lock+0x210 - lock_state.cpp:454
      mongod!mongo::LockerImpl<(bool)1>::endWriteUnitOfWork+0xfc - lock_state.cpp:407
      mongod!mongo::WriteUnitOfWork::commit+0x3b - operation_context.h:176
      mongod!singleInsert+0x1bf - batch_executor.cpp:1110
      mongod!insertOne+0x1a0 - batch_executor.cpp:1038
      mongod!mongo::WriteBatchExecutor::execOneInsert+0x5f - batch_executor.cpp:1063
      mongod!mongo::WriteBatchExecutor::execInserts+0x22c - batch_executor.cpp:860
      mongod!mongo::WriteBatchExecutor::bulkExecute+0x33 - batch_executor.cpp:754
      mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
      mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
      mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1160
      mongod!mongo::Command::execCommand+0xc50 - dbcommands.cpp:1374
      mongod!mongo::_runCommands+0x1ef - dbcommands.cpp:1450
      mongod!runCommands+0x23 - new_find.cpp:131
      mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
      mongod!receivedQuery+0x1f2 - instance.cpp:220
      mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
      mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
      mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 -
      message_server_port.cpp:234
      libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
      libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]
      
      Waiting Call Stack
      2 of 7: 7.1% (95.964s of 1355.788s)
      
      mongod!boost::condition_variable::timed_wait+0xf4 - condition_variable.hpp:74
      mongod!timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration,
      1000ll> >+0x296 - condition_variable_fwd.hpp:72
      mongod!mongo::CondVarLockGrantNotification::wait+0xa8 - lock_state.cpp:270
      mongod!mongo::LockerImpl<(bool)1>::lock+0x210 - lock_state.cpp:454
      mongod!mongo::LockerImpl<(bool)1>::lockGlobal+0x1ad - lock_state.cpp:338
      mongod!mongo::Lock::DBLock::lockDB+0x2c - d_concurrency.cpp:227
      mongod!mongo::Lock::DBLock::DBLock+0xcd - d_concurrency.cpp:217
      mongod!mongo::WriteBatchExecutor::ExecInsertsState::_lockAndCheckImpl+0x16d
      - batch_executor.cpp:954
      mongod!lockAndCheck+0x101 - batch_executor.cpp:1009
      mongod!insertOne+0x8f - batch_executor.cpp:1037
      mongod!mongo::WriteBatchExecutor::execOneInsert+0x5f - batch_executor.cpp:1063
      mongod!mongo::WriteBatchExecutor::execInserts+0x22c - batch_executor.cpp:860
      mongod!mongo::WriteBatchExecutor::bulkExecute+0x33 - batch_executor.cpp:754
      mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
      mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
      mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1160
      mongod!mongo::Command::execCommand+0xc50 - dbcommands.cpp:1374
      mongod!mongo::_runCommands+0x1ef - dbcommands.cpp:1450
      mongod!runCommands+0x23 - new_find.cpp:131
      mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
      mongod!receivedQuery+0x1f2 - instance.cpp:220
      mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
      mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
      mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 -
      message_server_port.cpp:234
      libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
      libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]
      

      This is not an actual lock contention, but just long flush times. kaloian.manassiev suggests this could be made better by flushing more often and some other small optimizations.

        1. flushlongpause.png
          466 kB
          Davide Italiano

            Assignee:
            kaloian.manassiev@mongodb.com Kaloian Manassiev
            Reporter:
            davide.italiano Davide Italiano
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: