[SERVER-16102] Contention on mutex protecting canAcceptWritesForDatabase access with WT Created: 11/Nov/14  Updated: 15/Dec/14  Resolved: 19/Nov/14

Status: Closed
Project: Core Server
Component/s: Concurrency, Replication
Affects Version/s: None
Fix Version/s: 2.8.0-rc1

Type: Bug Priority: Major - P3
Reporter: Davide Italiano Assignee: Andy Schwerin
Resolution: Done Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File replication_contention.png    
Issue Links:
Related
Operating System: ALL
Participants:

 Description   

Running sysbench and profiling with VTune, the mutex protecting canAcceptWritesForDatabase turned out to be heavily contended, even when mongod is
run standalone.

    bool ReplicationCoordinatorImpl::canAcceptWritesForDatabase(const StringData& dbName) {
        // we must check _settings since getReplicationMode() isn't aware of modeReplSet
        // until a valid replica set config has been loaded
        boost::lock_guard<boost::mutex> lk(_mutex);
        if (_settings.usingReplSets()) {
            if (_getReplicationMode_inlock() == modeReplSet &&
                    _getCurrentMemberState_inlock().primary() &&
                    !_isWaitingForDrainToComplete) {
                return true;
            }
            return dbName == "local";
        }
        // ... (non-replica-set handling elided) ...

Code inspection shows that some functions, e.g. newRunQuery(), call into replication code unconditionally, for example:

        // uassert if we are not on a primary, and not a secondary with SlaveOk query parameter set.
        bool slaveOK = pq.getOptions().slaveOk || pq.hasReadPref();
        status = repl::getGlobalReplicationCoordinator()->checkCanServeReadsFor(
                txn,
                NamespaceString(cq->ns()),
                slaveOK);
        uassertStatusOK(status);

Hence, during high load that mutex might become a bottleneck.
VTune stack traces, together with wait-time percentages:

Waiting Call Stack
1 of 7: 43.4% (493.236s of 1135.355s)
 
mongod!boost::mutex::lock+0x17 - mutex.hpp:59
mongod!lock_guard+0xb - locks.hpp:257
mongod!mongo::repl::ReplicationCoordinatorImpl::canAcceptWritesForDatabase+0xe - repl_coordinator_impl.cpp:1195
mongod!mongo::repl::ReplicationCoordinatorImpl::checkCanServeReadsFor+0x4a - repl_coordinator_impl.cpp:1227
mongod!mongo::newRunQuery+0x74d - new_find.cpp:672
mongod!receivedQuery+0x1f2 - instance.cpp:220
mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 - message_server_port.cpp:234
libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]

Waiting Call Stack
2 of 7: 23.0% (261.345s of 1135.355s)
 
mongod!boost::mutex::lock+0x17 - mutex.hpp:59
mongod!lock_guard+0xb - locks.hpp:257
mongod!mongo::repl::ReplicationCoordinatorImpl::canAcceptWritesForDatabase+0xe - repl_coordinator_impl.cpp:1195
mongod!mongo::Command::execCommand+0x5da - dbcommands.cpp:1319
mongod!mongo::_runCommands+0x221 - dbcommands.cpp:1464
mongod!runCommands+0x23 - new_find.cpp:131
mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
mongod!receivedQuery+0x1f2 - instance.cpp:220
mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 - message_server_port.cpp:234
libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]

Waiting Call Stack
3 of 7: 12.4% (140.897s of 1135.355s)
 
mongod!boost::mutex::lock+0x17 - mutex.hpp:59
mongod!lock_guard+0xb - locks.hpp:257
mongod!mongo::repl::ReplicationCoordinatorImpl::canAcceptWritesForDatabase+0xe - repl_coordinator_impl.cpp:1195
mongod!mongo::UpdateStage::restoreUpdateState+0xb4 - update.cpp:831
mongod!mongo::UpdateStage::restoreState+0x2f - update.cpp:858
mongod!mongo::UpdateStage::work+0x2b9 - update.cpp:786
mongod!mongo::PlanExecutor::getNext+0x99 - plan_executor.cpp:288
mongod!mongo::PlanExecutor::executePlan+0x3c - plan_executor.cpp:400
mongod!mongo::UpdateExecutor::execute+0x130 - update_executor.cpp:214
mongod!multiUpdate+0x498 - batch_executor.cpp:1273
mongod!mongo::WriteBatchExecutor::execUpdate+0x79 - batch_executor.cpp:880
mongod!mongo::WriteBatchExecutor::bulkExecute+0x1f1 - batch_executor.cpp:761
mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1174
mongod!mongo::Command::execCommand+0xca0 - dbcommands.cpp:1388
mongod!mongo::_runCommands+0x221 - dbcommands.cpp:1464
mongod!runCommands+0x23 - new_find.cpp:131
mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
mongod!receivedQuery+0x1f2 - instance.cpp:220
mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 - message_server_port.cpp:234
libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]

Waiting Call Stack
4 of 7: 7.5% (85.389s of 1135.355s)
 
mongod!boost::mutex::lock+0x17 - mutex.hpp:59
mongod!lock_guard+0xb - locks.hpp:257
mongod!mongo::repl::ReplicationCoordinatorImpl::canAcceptWritesForDatabase+0xe - repl_coordinator_impl.cpp:1195
mongod!mongo::UpdateExecutor::prepareInLock+0x2df - update_executor.cpp:154
mongod!mongo::UpdateExecutor::execute+0xfe - update_executor.cpp:206
mongod!multiUpdate+0x498 - batch_executor.cpp:1273
mongod!mongo::WriteBatchExecutor::execUpdate+0x79 - batch_executor.cpp:880
mongod!mongo::WriteBatchExecutor::bulkExecute+0x1f1 - batch_executor.cpp:761
mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1174
mongod!mongo::Command::execCommand+0xca0 - dbcommands.cpp:1388
mongod!mongo::_runCommands+0x221 - dbcommands.cpp:1464
mongod!runCommands+0x23 - new_find.cpp:131
mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
mongod!receivedQuery+0x1f2 - instance.cpp:220
mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 - message_server_port.cpp:234
libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]

Waiting Call Stack
5 of 7: 5.8% (66.230s of 1135.355s)
 
mongod!boost::mutex::lock+0x17 - mutex.hpp:59
mongod!lock_guard+0xb - locks.hpp:257
mongod!mongo::repl::ReplicationCoordinatorImpl::canAcceptWritesForDatabase+0xe - repl_coordinator_impl.cpp:1195
mongod!mongo::DeleteStage::restoreState+0x7c - delete.cpp:172
mongod!mongo::DeleteStage::work+0x28e - delete.cpp:131
mongod!mongo::PlanExecutor::getNext+0x99 - plan_executor.cpp:288
mongod!mongo::PlanExecutor::executePlan+0x3c - plan_executor.cpp:400
mongod!mongo::DeleteExecutor::execute+0x59 - delete_executor.cpp:185
mongod!multiRemove+0x267 - batch_executor.cpp:1358
mongod!mongo::WriteBatchExecutor::execRemove+0x74 - batch_executor.cpp:909
mongod!mongo::WriteBatchExecutor::bulkExecute+0xf4 - batch_executor.cpp:782
mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1174
mongod!mongo::Command::execCommand+0xca0 - dbcommands.cpp:1388
mongod!mongo::_runCommands+0x221 - dbcommands.cpp:1464
mongod!runCommands+0x23 - new_find.cpp:131
mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
mongod!receivedQuery+0x1f2 - instance.cpp:220
mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 - message_server_port.cpp:234
libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]

Waiting Call Stack
6 of 7: 3.9% (44.155s of 1135.355s)
 
mongod!boost::mutex::lock+0x17 - mutex.hpp:59
mongod!lock_guard+0xb - locks.hpp:257
mongod!mongo::repl::ReplicationCoordinatorImpl::canAcceptWritesForDatabase+0xe - repl_coordinator_impl.cpp:1195
mongod!checkIsMasterForDatabase+0x7e - batch_executor.cpp:461
mongod!mongo::WriteBatchExecutor::ExecInsertsState::_lockAndCheckImpl+0x2f4 - batch_executor.cpp:966
mongod!lockAndCheck+0x119 - batch_executor.cpp:1009
mongod!insertOne+0x8f - batch_executor.cpp:1037
mongod!mongo::WriteBatchExecutor::execOneInsert+0x5f - batch_executor.cpp:1065
mongod!mongo::WriteBatchExecutor::execInserts+0x22c - batch_executor.cpp:860
mongod!mongo::WriteBatchExecutor::bulkExecute+0x33 - batch_executor.cpp:754
mongod!mongo::WriteBatchExecutor::executeBatch+0x3a4 - batch_executor.cpp:263
mongod!mongo::WriteCmd::run+0x168 - write_commands.cpp:143
mongod!mongo::_execCommand+0x33 - dbcommands.cpp:1174
mongod!mongo::Command::execCommand+0xca0 - dbcommands.cpp:1388
mongod!mongo::_runCommands+0x221 - dbcommands.cpp:1464
mongod!runCommands+0x23 - new_find.cpp:131
mongod!mongo::newRunQuery+0xff9 - new_find.cpp:551
mongod!receivedQuery+0x1f2 - instance.cpp:220
mongod!mongo::assembleResponse+0x9d0 - instance.cpp:393
mongod!mongo::MyMessageHandler::process+0xdf - db.cpp:185
mongod!mongo::PortMessageServer::handleIncomingMsg+0x420 - message_server_port.cpp:234
libpthread-2.18.so!start_thread+0xc2 - [Unknown]:[Unknown]
libc-2.18.so!__clone+0x6c - [Unknown]:[Unknown]
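
For illustration only (not part of the original report): a minimal, self-contained sketch of the kind of contention shown above, where many threads serialize on a single mutex just to read a read-mostly flag, which is roughly what every query and write does when it consults the replication coordinator. All names, thread counts and iteration counts below are made up for the sketch:

    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    int main() {
        std::mutex mtx;
        bool canAcceptWrites = true;            // read-mostly state, as in the coordinator
        std::atomic<long long> total{0};

        auto worker = [&]() {
            long long hits = 0;
            for (int i = 0; i < 1000000; ++i) {
                std::lock_guard<std::mutex> lk(mtx);  // every operation serializes here
                if (canAcceptWrites) ++hits;
            }
            total += hits;
        };

        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> threads;
        for (int t = 0; t < 16; ++t) threads.emplace_back(worker);
        for (auto& th : threads) th.join();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();

        std::cout << total << " checks in " << ms << " ms\n";
        return 0;
    }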



 Comments   
Comment by Githook User [ 19/Nov/14 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-16102 Get repl mutex out of critical path for secondary slaveok reads.
Branch: master
https://github.com/mongodb/mongo/commit/c6653bd908a98c3b98ca14a6769cb52da1ff0d19

Comment by Githook User [ 13/Nov/14 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-16102 Correct comment about meaning of (GX) in repl_coordinator_impl.h
Branch: master
https://github.com/mongodb/mongo/commit/056b9a0e411e83e19577ff814ac65eb677b05ae0

Comment by Githook User [ 13/Nov/14 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-16102 Adjust ReplCoordinatorImpl::canAcceptWritesForDatabase() to not require a mutex.

Because callers of canAcceptWritesForDatabase must already hold the global lock
in at least an intent mode, and because a node cannot transition between
write-accepting and not-write-accepting without holding the global lock in
MODE_S or MODE_X, we can arrange to set a flag indicating the answer to the
canAcceptWritesForDatabase question that may be consulted without taking the
coordinator's mutex or running in the executor.

This removes synchronization on the replication system from the write critical
path, and from the read path on primaries. Future work will remove it from the
read critical path on secondaries.
Branch: master
https://github.com/mongodb/mongo/commit/c93641756a1d8444876b9b79c3aa9a9635626891
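
A minimal, self-contained sketch of the flag-based approach described in the commit message above (not the actual patch; the class, member and method names here are hypothetical, and the real change relies on the global lock rather than an atomic for ordering):

    #include <atomic>
    #include <mutex>
    #include <string>

    // Sketch only: in the server these members live in ReplicationCoordinatorImpl,
    // and the flag's transitions are serialized by the global lock held in
    // MODE_S/MODE_X; an atomic is used here just to keep the sketch self-contained.
    class ReplCoordSketch {
    public:
        // Fast path, consulted on every write and on primary reads:
        // no coordinator mutex is taken.
        bool canAcceptWritesForDatabase(const std::string& dbName) const {
            if (_canAcceptNonLocalWrites.load(std::memory_order_acquire)) {
                return true;
            }
            return dbName == "local";  // writes to "local" are always allowed
        }

        // Slow path, reached only on state transitions (election, stepdown,
        // end of drain), which in the server happen while holding the global
        // lock exclusively as well as the coordinator mutex.
        void setCanAcceptNonLocalWrites(bool canAccept) {
            std::lock_guard<std::mutex> lk(_mutex);
            _canAcceptNonLocalWrites.store(canAccept, std::memory_order_release);
        }

    private:
        std::mutex _mutex;  // still guards the rest of the coordinator state
        std::atomic<bool> _canAcceptNonLocalWrites{false};
    };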
