Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-100187

Give TopologyCoordinator its own mutex

    • Type: Icon: Improvement Improvement
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • 8.2 Required
    • Affects Version/s: None
    • Component/s: None
    • Replication

      The goal of this idea is to reduce ReplicationCoordinator mutex contention.

      ReplicationCoordinatorImpl (RCI) wraps all calls to its TopologyCoordinator (TC) in RCI._mutex. This is necessary today because TC lacks its own mutex. Therefore, all members in TC are synchronized by RCI.

      In a simple workload (PSS: write 1k documents one-by-one, then read them back one-by-one), the following are the top 10 functions in RCI that take RCI._mutex on the primary:

      $ egrep "^src" node1.mutex.log | cut -d":" -f1,2 | sort | uniq -c | sort -nr | grep replication_coordinator_impl.cpp | head
        13828 src/mongo/db/repl/replication_coordinator_impl.cpp:getMyLastAppliedOpTime
        10455 src/mongo/db/repl/replication_coordinator_impl.cpp:prepareReplMetadata
         7483 src/mongo/db/repl/replication_coordinator_impl.cpp:getMemberState
         6086 src/mongo/db/repl/replication_coordinator_impl.cpp:processReplSetUpdatePosition
         4127 src/mongo/db/repl/replication_coordinator_impl.cpp:setMyLastDurableAndLastWrittenOpTimeAndWallTimeForward
         2189 src/mongo/db/repl/replication_coordinator_impl.cpp:canAcceptNonLocalWrites
         1972 src/mongo/db/repl/replication_coordinator_impl.cpp:populateUnsetWriteConcernOptionsSyncMode
         1960 src/mongo/db/repl/replication_coordinator_impl.cpp:operator()
         1038 src/mongo/db/repl/replication_coordinator_impl.cpp:setMyLastAppliedAndLastWrittenOpTimeAndWallTimeForward
         1024 src/mongo/db/repl/replication_coordinator_impl.cpp:getHelloResponseFuture

      Here is the same data from a secondary:

      $ egrep "^src" node2.mutex.log | cut -d":" -f1,2 | sort | uniq -c | sort -nr | grep replication_coordinator_impl.cpp | head
         4482 src/mongo/db/repl/replication_coordinator_impl.cpp:getMemberState
         3396 src/mongo/db/repl/replication_coordinator_impl.cpp:getMyLastWrittenOpTimeAndWallTime
         3395 src/mongo/db/repl/replication_coordinator_impl.cpp:setMyLastDurableOpTimeAndWallTimeForward
         3076 src/mongo/db/repl/replication_coordinator_impl.cpp:prepareReplSetUpdatePositionCommand
         3075 src/mongo/db/repl/replication_coordinator_impl.cpp:getMyLastAppliedOpTime
         2472 src/mongo/db/repl/replication_coordinator_impl.cpp:getOplogSyncState
         2068 src/mongo/db/repl/replication_coordinator_impl.cpp:advanceCommitPoint
         2067 src/mongo/db/repl/replication_coordinator_impl.cpp:shouldChangeSyncSource
         2067 src/mongo/db/repl/replication_coordinator_impl.cpp:cancelAndRescheduleElectionTimeout
         2032 src/mongo/db/repl/replication_coordinator_impl.cpp:setMyLastWrittenOpTimeAndWallTimeForward

      Some of these, like getMyLastAppliedOpTime() and getMemberState(), only call the analogous accessors in TC. If TC had its own mutex, then it would not be necessary to take RCI._mutex in those RCI accessors. Taking RCI._mutex in fewer places should reduce contention and improve performance.

      Some concerns:

      RCI::doThing() { // 0
        TC.updateFoo(); // 1
        TC.updateBar(TC.getFoo()); // 2
      } // 3

      This code might race: between lines 1 and 2, there might be another call to updateFoo(), so we might call updateBar() with an unexpected value. We can prevent this by holding RCI._mutex when calling any function that writes to any member of TC. An even better solution would be to have RCI::doThing() delegate (without taking RCI._mutex) to an analogous function in TC that took TC._mutex.

      RCI::getFooAndBar() { // 0
        auto foo = TC.getFoo(); // 1
        auto bar = TC.getBar(); // 2
        std::pair<Foo, Bar> p(foo, bar); // 3
        return p; // 4
      } // 5

      This code might race: between lines 1 and 2, there might be a call to either updateFoo() or updateBar(). If there is an assumed relationship between foo and bar - that is, if they are assumed to be modified together - then p would contain mismatched values. We can prevent this by holding RCI._mutex in getThing() in conjunction with holding RCI._mutex when calling any function that writes to any member of TC. An even better solution would be to have RCI.getFooAndBar() delegate (without taking RCI._mutex) to an analogous function in TC that took TC._mutex.

      A POC of this idea shows 10-20% performance improvements in several perf tests and a single 20% regression.

            Assignee:
            Unassigned Unassigned
            Reporter:
            brad.cater@mongodb.com Brad Cater
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated: