  1. Core Server
  2. SERVER-51041

Throttle starting transactions for secondary reads

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.8.0, 4.2.10, 4.4.2
    • Affects Version/s: 4.2.9, 4.4.1, 4.7.0
    • Component/s: None
    • Labels:
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.4, v4.2
    • Sprint: Execution Team 2020-10-05
    • 0

      This performance regression affects readConcern "local" and "available" reads on secondary nodes.

      SERVER-46721 removed a mutex around a critical section that effectively synchronized every external secondary reader that reads at lastApplied. I deemed this mutex unnecessary, but removing it pushed a synchronization problem down to a lower level.

      For high volumes of short-lived secondary reads, it appears as though the WT reader-writer lock for the global read timestamp queue does not handle excessive contention as well as the mutex before it.

      The problem I see is that the WT read timestamp queue leaves old entries from inactive transactions in place. New readers, which hold the write lock on the read timestamp queue, are responsible for cleaning up those entries, even when the queue holds hundreds of thousands of them. This blocks out other readers, which spin-wait for a moment and then start context switching heavily. Once the queue shrinks, thousands of new read requests come in and the problem repeats. The result is very unpredictable latency and poor CPU utilization.
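
To make the cost pattern concrete, here is a toy model of a queue in which each new reader takes the exclusive lock and sweeps every inactive entry before adding its own. This is only a sketch of the behavior described above, not WiredTiger's implementation; `ToyReadTimestampQueue`, `ToyEntry`, and `sweepCostAfterBacklog` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <mutex>
#include <shared_mutex>

// Toy model only: NOT WiredTiger code. All names here are hypothetical.
struct ToyEntry {
    std::uint64_t readTimestamp;
    bool active;
};

class ToyReadTimestampQueue {
public:
    // Registers a new reader. The caller takes the exclusive (write) side of
    // the lock and, while holding it, sweeps every inactive entry.
    // Returns the number of entries swept, i.e. the work done under the lock.
    std::size_t addReader(std::uint64_t readTimestamp) {
        std::unique_lock<std::shared_mutex> writeLock(_lock);
        std::size_t swept = 0;
        for (auto it = _entries.begin(); it != _entries.end();) {
            if (!it->active) {
                it = _entries.erase(it);
                ++swept;
            } else {
                ++it;
            }
        }
        _entries.push_back(ToyEntry{readTimestamp, true});
        return swept;
    }

    // Marks every reader as finished but leaves its entry in the queue,
    // modelling inactive transactions that nobody has cleaned up yet.
    void finishAll() {
        std::unique_lock<std::shared_mutex> writeLock(_lock);
        for (auto& entry : _entries) {
            entry.active = false;
        }
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> readLock(_lock);
        return _entries.size();
    }

private:
    mutable std::shared_mutex _lock;
    std::list<ToyEntry> _entries;
};

// Builds a backlog of finished readers, then reports how many entries the
// next arriving reader has to sweep while it holds the write lock.
std::size_t sweepCostAfterBacklog(std::size_t backlog) {
    ToyReadTimestampQueue queue;
    for (std::size_t i = 0; i < backlog; ++i) {
        queue.addReader(i);
    }
    queue.finishAll();
    std::size_t swept = queue.addReader(backlog);
    assert(queue.size() == 1);  // only the newest reader remains
    return swept;
}
```

In this model the work done under the exclusive lock grows linearly with the backlog of inactive entries, which is the shape of the stall described above: one unlucky reader pays for the whole backlog while everyone else waits.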

      I was able to fix the performance problem by re-introducing a mutex around the area where we start transactions for secondary reads (at lastApplied):

      diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp
      index 3f07c244c5..bcdef7f70c 100644
      --- a/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp
      +++ b/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp
      @@ -590,6 +590,7 @@ Timestamp WiredTigerRecoveryUnit::_beginTransactionAtAllDurableTimestamp(WT_SESS
           return readTimestamp;
      +Mutex _lastAppliedTxnMutex = MONGO_MAKE_LATCH("_lastAppliedTxnMutex");
       Timestamp WiredTigerRecoveryUnit::_beginTransactionAtLastAppliedTimestamp(WT_SESSION* session) {
           auto lastApplied = _sessionCache->snapshotManager().getLastApplied();
           if (!lastApplied) {
      @@ -609,6 +610,8 @@ Timestamp WiredTigerRecoveryUnit::_beginTransactionAtLastAppliedTimestamp(WT_SES
               return Timestamp();
      +    stdx::lock_guard<Latch> lock(_lastAppliedTxnMutex);
           WiredTigerBeginTxnBlock txnOpen(session,
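
The throttling idea in the diff can be sketched in isolation with standard C++ primitives. This is a minimal illustration under stated assumptions, not the MongoDB code: `std::mutex` stands in for the `Mutex`/`Latch` types, `beginTransactionAtLastApplied` is a hypothetical placeholder for the real function, and the counters exist only to show that at most one thread is inside the guarded section at a time.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins; none of this is MongoDB or WiredTiger code.
std::mutex lastAppliedTxnMutex;        // plays the role of _lastAppliedTxnMutex
std::atomic<int> threadsInBegin{0};    // threads currently inside the section
std::atomic<int> maxThreadsInBegin{0}; // high-water mark of the above

// Placeholder for starting a storage-engine transaction at lastApplied.
void beginTransactionAtLastApplied() {
    // Serialize all callers before they reach the lower-level lock.
    std::lock_guard<std::mutex> lock(lastAppliedTxnMutex);

    int now = ++threadsInBegin;
    // Record the largest number of threads ever observed in this section.
    int seen = maxThreadsInBegin.load();
    while (now > seen && !maxThreadsInBegin.compare_exchange_weak(seen, now)) {
    }

    // ... the real code would open the transaction here ...

    --threadsInBegin;
}

// Runs many short-lived "reads" concurrently and returns the maximum number
// of threads that were ever inside the guarded section at the same time.
int runReaders(int numThreads, int readsPerThread) {
    maxThreadsInBegin = 0;
    std::vector<std::thread> readers;
    for (int t = 0; t < numThreads; ++t) {
        readers.emplace_back([readsPerThread] {
            for (int i = 0; i < readsPerThread; ++i) {
                beginTransactionAtLastApplied();
            }
        });
    }
    for (auto& reader : readers) {
        reader.join();
    }
    assert(threadsInBegin.load() == 0);
    return maxThreadsInBegin.load();
}
```

Because every caller queues on the outer mutex first, the lower-level lock sees at most one contender from this path at a time. The sketch shows only that mutual exclusion; it does not reproduce the latency improvement reported above.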

            louis.williams@mongodb.com Louis Williams