-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Replication
The goal of this idea is to reduce ReplicationCoordinator mutex contention.
ReplicationCoordinatorImpl (RCI) wraps all calls to its TopologyCoordinator (TC) in RCI._mutex. This is necessary today because TC lacks its own mutex. Therefore, all members in TC are synchronized by RCI.
In a simple workload (PSS: write 1k documents one-by-one, then read them back one-by-one), the following are the top 10 functions in RCI that take RCI._mutex on the primary:
$ egrep "^src" node1.mutex.log | cut -d":" -f1,2 | sort | uniq -c | sort -nr | grep replication_coordinator_impl.cpp | head 13828 src/mongo/db/repl/replication_coordinator_impl.cpp:getMyLastAppliedOpTime 10455 src/mongo/db/repl/replication_coordinator_impl.cpp:prepareReplMetadata 7483 src/mongo/db/repl/replication_coordinator_impl.cpp:getMemberState 6086 src/mongo/db/repl/replication_coordinator_impl.cpp:processReplSetUpdatePosition 4127 src/mongo/db/repl/replication_coordinator_impl.cpp:setMyLastDurableAndLastWrittenOpTimeAndWallTimeForward 2189 src/mongo/db/repl/replication_coordinator_impl.cpp:canAcceptNonLocalWrites 1972 src/mongo/db/repl/replication_coordinator_impl.cpp:populateUnsetWriteConcernOptionsSyncMode 1960 src/mongo/db/repl/replication_coordinator_impl.cpp:operator() 1038 src/mongo/db/repl/replication_coordinator_impl.cpp:setMyLastAppliedAndLastWrittenOpTimeAndWallTimeForward 1024 src/mongo/db/repl/replication_coordinator_impl.cpp:getHelloResponseFuture
Here is the same data from a secondary:
$ egrep "^src" node2.mutex.log | cut -d":" -f1,2 | sort | uniq -c | sort -nr | grep replication_coordinator_impl.cpp | head 4482 src/mongo/db/repl/replication_coordinator_impl.cpp:getMemberState 3396 src/mongo/db/repl/replication_coordinator_impl.cpp:getMyLastWrittenOpTimeAndWallTime 3395 src/mongo/db/repl/replication_coordinator_impl.cpp:setMyLastDurableOpTimeAndWallTimeForward 3076 src/mongo/db/repl/replication_coordinator_impl.cpp:prepareReplSetUpdatePositionCommand 3075 src/mongo/db/repl/replication_coordinator_impl.cpp:getMyLastAppliedOpTime 2472 src/mongo/db/repl/replication_coordinator_impl.cpp:getOplogSyncState 2068 src/mongo/db/repl/replication_coordinator_impl.cpp:advanceCommitPoint 2067 src/mongo/db/repl/replication_coordinator_impl.cpp:shouldChangeSyncSource 2067 src/mongo/db/repl/replication_coordinator_impl.cpp:cancelAndRescheduleElectionTimeout 2032 src/mongo/db/repl/replication_coordinator_impl.cpp:setMyLastWrittenOpTimeAndWallTimeForward
Some of these, like getMyLastAppliedOpTime() and getMemberState(), only call the analogous accessors in TC. If TC had its own mutex, then it would not be necessary to take RCI._mutex in those RCI accessors. Taking RCI._mutex in fewer places should reduce contention and improve performance.
Some concerns:
RCI::doThing() { // 0 TC.updateFoo(); // 1 TC.updateBar(TC.getFoo()); // 2 } // 3
This code might race: between lines 1 and 2, there might be another call to updateFoo(), so we might call updateBar() with an unexpected value. We can prevent this by holding RCI._mutex when calling any function that writes to any member of TC. An even better solution would be to have RCI::doThing() delegate (without taking RCI._mutex) to an analogous function in TC that took TC._mutex.
RCI::getFooAndBar() { // 0 auto foo = TC.getFoo(); // 1 auto bar = TC.getBar(); // 2 std::pair<Foo, Bar> p(foo, bar); // 3 return p; // 4 } // 5
This code might race: between lines 1 and 2, there might be a call to either updateFoo() or updateBar(). If there is an assumed relationship between foo and bar - that is, if they are assumed to be modified together - then p would contain mismatched values. We can prevent this by holding RCI._mutex in getThing() in conjunction with holding RCI._mutex when calling any function that writes to any member of TC. An even better solution would be to have RCI.getFooAndBar() delegate (without taking RCI._mutex) to an analogous function in TC that took TC._mutex.
A POC of this idea shows 10-20% performance improvements in several perf tests and a single 20% regression.
- related to
-
SERVER-27083 Remove ReplicationCoordinatorImpl::_topoMutex
- Closed