Details
- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Fix Version/s: None
- Backwards Compatibility: Fully Compatible
- Operating System: ALL
Description
Reducing contention on the GroupCommit mutex (SERVER-15950) exposed a problem in the flushing code.
The attached VTune lock profiler analysis shows that most of the actual lock wait contention (time-wise) is due to long flush times, as shown by the following call stacks:
Waiting Call Stack 1 of 7: 92.8% (1258.626s of 1355.788s)
mongod!boost::condition_variable::timed_wait+0xf4 - condition_variable.hpp:74
mongod!timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000ll> >+0x296 - condition_variable_fwd.hpp:72
mongod!mongo::CondVarLockGrantNotification::wait+0xa8 - lock_state.cpp:270
mongod!mongo::LockerImpl<(bool)1>::lock+0x210 - lock_state.cpp:454
mongod!mongo::LockerImpl<(bool)1>::endWriteUnitOfWork+0xfc - lock_state.cpp:407
mongod!mongo::WriteUnitOfWork::commit+0x3b - operation_context.h:176
mongod!singleInsert+0x1bf - batch_executor.cpp:1110
mongod!insertOne+0x1a0 - batch_executor.cpp:1038
mongod!mongo::WriteBatchExecutor::execOneInsert+0x5f - batch_executor.cpp:1063
mongod!mongo::WriteBatchExecutor::execInserts+0x22c - batch_executor.cpp:860
mongod!mongo::WriteBatchExecutor::bulkExecute+0x33 - batch_executor.cpp:754
mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1160
mongod!mongo::Command::execCommand+0xc50 - dbcommands.cpp:1374
mongod!mongo::_runCommands+0x1ef - dbcommands.cpp:1450
mongod!runCommands+0x23 - new_find.cpp:131
mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
mongod!receivedQuery+0x1f2 - instance.cpp:220
mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 - message_server_port.cpp:234
libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]
Waiting Call Stack 2 of 7: 7.1% (95.964s of 1355.788s)
mongod!boost::condition_variable::timed_wait+0xf4 - condition_variable.hpp:74
mongod!timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000ll> >+0x296 - condition_variable_fwd.hpp:72
mongod!mongo::CondVarLockGrantNotification::wait+0xa8 - lock_state.cpp:270
mongod!mongo::LockerImpl<(bool)1>::lock+0x210 - lock_state.cpp:454
mongod!mongo::LockerImpl<(bool)1>::lockGlobal+0x1ad - lock_state.cpp:338
mongod!mongo::Lock::DBLock::lockDB+0x2c - d_concurrency.cpp:227
mongod!mongo::Lock::DBLock::DBLock+0xcd - d_concurrency.cpp:217
mongod!mongo::WriteBatchExecutor::ExecInsertsState::_lockAndCheckImpl+0x16d - batch_executor.cpp:954
mongod!lockAndCheck+0x101 - batch_executor.cpp:1009
mongod!insertOne+0x8f - batch_executor.cpp:1037
mongod!mongo::WriteBatchExecutor::execOneInsert+0x5f - batch_executor.cpp:1063
mongod!mongo::WriteBatchExecutor::execInserts+0x22c - batch_executor.cpp:860
mongod!mongo::WriteBatchExecutor::bulkExecute+0x33 - batch_executor.cpp:754
mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1160
mongod!mongo::Command::execCommand+0xc50 - dbcommands.cpp:1374
mongod!mongo::_runCommands+0x1ef - dbcommands.cpp:1450
mongod!runCommands+0x23 - new_find.cpp:131
mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
mongod!receivedQuery+0x1f2 - instance.cpp:220
mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 - message_server_port.cpp:234
libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]
This is not actual lock contention, but rather long flush times. kaloian.manassiev suggests this could be improved by flushing more often, along with some other small optimizations.
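To illustrate the suggestion (this is not the actual mongod flushing code), here is a minimal C++ sketch of a background flusher that wakes on a short, configurable interval: flushing more often means each flush covers less accumulated dirty data, so writers blocked behind the flush/commit lock wait for less time. All names here (BackgroundFlusher, flushDirtyPages, the 100 ms interval) are hypothetical and chosen only for illustration.

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical background flusher: wakes every `interval` and flushes dirty
// data. A shorter interval means less work per flush, so the time writers
// spend waiting behind the flush/commit lock stays short.
class BackgroundFlusher {
public:
    explicit BackgroundFlusher(std::chrono::milliseconds interval)
        : _interval(interval), _thread([this] { run(); }) {}

    ~BackgroundFlusher() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _cv.notify_one();
        _thread.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(_mutex);
        while (!_shutdown) {
            // Sleep for the (short) flush interval, or until shutdown.
            _cv.wait_for(lk, _interval, [this] { return _shutdown; });
            if (_shutdown)
                break;
            lk.unlock();
            flushDirtyPages();  // placeholder for the real msync/fsync work
            lk.lock();
        }
    }

    void flushDirtyPages() {
        // In a real engine this would write out the dirty extents/journal.
    }

    std::chrono::milliseconds _interval;
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _shutdown = false;
    std::thread _thread;  // declared last so it starts after the other members
};

int main() {
    // Flush every 100 ms instead of, say, once per second.
    BackgroundFlusher flusher(std::chrono::milliseconds(100));
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return 0;  // destructor stops the flusher thread cleanly
}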
Attachments
Issue Links
- is duplicated by
  - SERVER-16482 Decreased throughput due to changes in scheduling algorithm? (mms-prod-pings1) - Closed
- is related to
  - SERVER-16067 Contention on lockmgr buckets in MMAPv1 - Closed