- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Concurrency, Storage
- Backwards Compatibility: Fully Compatible
- Operating System: ALL
Reducing contention on the GroupCommit mutex (SERVER-15950) exposed a problem in the flushing code.
The attached VTune lock profiler analysis shows that most of the actual lock wait contention (by time) is caused by long flush times, as visible in the following call stacks:
Waiting Call Stack 1 of 7: 92.8% (1258.626s of 1355.788s)
    mongod!boost::condition_variable::timed_wait+0xf4 - condition_variable.hpp:74
    mongod!timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000ll> >+0x296 - condition_variable_fwd.hpp:72
    mongod!mongo::CondVarLockGrantNotification::wait+0xa8 - lock_state.cpp:270
    mongod!mongo::LockerImpl<(bool)1>::lock+0x210 - lock_state.cpp:454
    mongod!mongo::LockerImpl<(bool)1>::endWriteUnitOfWork+0xfc - lock_state.cpp:407
    mongod!mongo::WriteUnitOfWork::commit+0x3b - operation_context.h:176
    mongod!singleInsert+0x1bf - batch_executor.cpp:1110
    mongod!insertOne+0x1a0 - batch_executor.cpp:1038
    mongod!mongo::WriteBatchExecutor::execOneInsert+0x5f - batch_executor.cpp:1063
    mongod!mongo::WriteBatchExecutor::execInserts+0x22c - batch_executor.cpp:860
    mongod!mongo::WriteBatchExecutor::bulkExecute+0x33 - batch_executor.cpp:754
    mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
    mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
    mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1160
    mongod!mongo::Command::execCommand+0xc50 - dbcommands.cpp:1374
    mongod!mongo::_runCommands+0x1ef - dbcommands.cpp:1450
    mongod!runCommands+0x23 - new_find.cpp:131
    mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
    mongod!receivedQuery+0x1f2 - instance.cpp:220
    mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
    mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
    mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 - message_server_port.cpp:234
    libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
    libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]
Waiting Call Stack 2 of 7: 7.1% (95.964s of 1355.788s)
    mongod!boost::condition_variable::timed_wait+0xf4 - condition_variable.hpp:74
    mongod!timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000ll> >+0x296 - condition_variable_fwd.hpp:72
    mongod!mongo::CondVarLockGrantNotification::wait+0xa8 - lock_state.cpp:270
    mongod!mongo::LockerImpl<(bool)1>::lock+0x210 - lock_state.cpp:454
    mongod!mongo::LockerImpl<(bool)1>::lockGlobal+0x1ad - lock_state.cpp:338
    mongod!mongo::Lock::DBLock::lockDB+0x2c - d_concurrency.cpp:227
    mongod!mongo::Lock::DBLock::DBLock+0xcd - d_concurrency.cpp:217
    mongod!mongo::WriteBatchExecutor::ExecInsertsState::_lockAndCheckImpl+0x16d - batch_executor.cpp:954
    mongod!lockAndCheck+0x101 - batch_executor.cpp:1009
    mongod!insertOne+0x8f - batch_executor.cpp:1037
    mongod!mongo::WriteBatchExecutor::execOneInsert+0x5f - batch_executor.cpp:1063
    mongod!mongo::WriteBatchExecutor::execInserts+0x22c - batch_executor.cpp:860
    mongod!mongo::WriteBatchExecutor::bulkExecute+0x33 - batch_executor.cpp:754
    mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
    mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
    mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1160
    mongod!mongo::Command::execCommand+0xc50 - dbcommands.cpp:1374
    mongod!mongo::_runCommands+0x1ef - dbcommands.cpp:1450
    mongod!runCommands+0x23 - new_find.cpp:131
    mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
    mongod!receivedQuery+0x1f2 - instance.cpp:220
    mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
    mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
    mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 - message_server_port.cpp:234
    libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
    libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]
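The top frames in both stacks are a timed condition-variable wait inside the lock grant path. As a rough illustration only (a minimal standalone sketch, not the actual lock_state.cpp code), a lock grant notification along these lines parks the requesting thread on a condition variable until the lock manager grants the lock, so any interval during which the lock is held elsewhere is attributed to the condition-variable wait by the profiler:

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    // Simplified model of a lock grant notification: the requesting thread
    // waits on a condition variable until the lock manager grants the lock.
    // Time spent here while another thread holds the lock (e.g. during a
    // long flush) shows up as condition-variable wait time in the profiler.
    class SimpleLockGrantNotification {
    public:
        // Returns true if the lock was granted before the timeout expired.
        bool wait(std::chrono::milliseconds timeout) {
            std::unique_lock<std::mutex> lk(_mutex);
            return _cond.wait_for(lk, timeout, [this] { return _granted; });
        }

        // Called by the lock manager when the requested lock becomes available.
        void notify() {
            {
                std::lock_guard<std::mutex> lk(_mutex);
                _granted = true;
            }
            _cond.notify_one();
        }

    private:
        std::mutex _mutex;
        std::condition_variable _cond;
        bool _granted = false;
    };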
The time spent in these waits is not actual lock contention, but rather long flush times. kaloian.manassiev suggests this could be improved by flushing more often, along with some other small optimizations (see the sketch below).
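As a rough sketch of that direction (the DirtyRegion type and the helper functions below are hypothetical placeholders, not MongoDB's API), flushing in smaller batches and releasing the flush lock between batches would shorten each lock hold so queued writers can make progress:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Hypothetical placeholders for the pieces involved in a flush pass.
    struct DirtyRegion {};                   // a dirty memory-mapped range
    void flushToDisk(const DirtyRegion&) {}  // assumed helper: write one range
    void acquireFlushLock() {}               // assumed helper: blocks writers
    void releaseFlushLock() {}               // assumed helper: lets writers run

    // Instead of flushing every dirty region under one long lock hold (which
    // makes all writers queue behind the flush, as in the stacks above),
    // flush a small batch at a time and release the lock between batches.
    void flushInSmallBatches(const std::vector<DirtyRegion>& dirty,
                             std::size_t batchSize) {
        for (std::size_t i = 0; i < dirty.size(); i += batchSize) {
            acquireFlushLock();
            const std::size_t end = std::min(dirty.size(), i + batchSize);
            for (std::size_t j = i; j < end; ++j) {
                flushToDisk(dirty[j]);
            }
            releaseFlushLock();  // queued writers run before the next batch
        }
    }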
- is duplicated by: SERVER-16482 Decreased throughput due to changes in scheduling algorithm? (mms-prod-pings1) (Closed)
- is related to: SERVER-16067 Contention on lockmgr buckets in MMAPv1 (Closed)