[SERVER-51041] Throttle starting transactions for secondary reads Created: 18/Sep/20  Updated: 10/Jan/24  Resolved: 21/Sep/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.9, 4.4.1, 4.7.0
Fix Version/s: 4.8.0, 4.2.10, 4.4.2

Type: Bug
Priority: Major - P3
Reporter: Louis Williams
Assignee: Louis Williams
Resolution: Fixed
Votes: 0
Labels: KP44
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to WT-6709 Remove timestamp queues that used to ... Closed
is related to SERVER-55030 Remove mutexes that serialize seconda... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4, v4.2
Sprint: Execution Team 2020-10-05
Participants:
Case:
Linked BF Score: 0

 Description   

This performance regression affects readConcern "local" and "available" reads on secondary nodes.

SERVER-46721 removed a mutex around a critical section that effectively synchronized every external secondary reader that reads at lastApplied. I deemed this mutex unnecessary, but removing it pushed a synchronization problem down to a lower level.

For high volumes of short-lived secondary reads, the WT reader-writer lock that protects the global read timestamp queue appears to handle heavy contention worse than the mutex that preceded it.

The problem I see is that the WT read timestamp queue leaves around old entries from inactive transactions. New readers (holding write locks on the read timestamp queue) are responsible for cleaning up old entries, even if the queue has hundreds of thousands of inactive entries. This blocks out other readers, which spin-wait for a moment and then start context switching wildly. Once the queue shrinks down, thousands of new read requests come in and the problem repeats itself. This leads to very unpredictable latencies and poor CPU utilization.
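
To make the cleanup cost concrete, here is a minimal single-threaded sketch of the behavior described above. It is a toy model under stated assumptions, not WiredTiger's actual data structure: ToyEntry, ToyReadTimestampQueue, publish, retire, and sweepCostAfter are all hypothetical names, and a plain std::mutex stands in for the write side of the reader-writer lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <mutex>
#include <vector>

// Hypothetical toy model; none of these names come from WiredTiger.
struct ToyEntry {
    std::uint64_t readTimestamp;
    bool active;
};

class ToyReadTimestampQueue {
public:
    using Handle = std::list<ToyEntry>::iterator;

    // Adds a reader. While holding the exclusive lock, the caller also discards
    // every inactive entry left behind by finished readers. `swept` reports how
    // many stale entries this one caller had to remove.
    Handle publish(std::uint64_t ts, std::size_t& swept) {
        std::lock_guard<std::mutex> lk(_writeLock);
        swept = 0;
        for (auto it = _entries.begin(); it != _entries.end();) {
            if (it->active) {
                ++it;
            } else {
                it = _entries.erase(it);
                ++swept;
            }
        }
        _entries.push_back(ToyEntry{ts, true});
        return std::prev(_entries.end());
    }

    // Finishing a read only flags the entry; the entry stays in the queue.
    void retire(Handle h) {
        std::lock_guard<std::mutex> lk(_writeLock);
        h->active = false;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lk(_writeLock);
        return _entries.size();
    }

private:
    std::mutex _writeLock;
    std::list<ToyEntry> _entries;
};

// Publishes and retires `n` readers, then returns how many stale entries the
// next reader has to sweep while every other reader would be locked out.
std::size_t sweepCostAfter(std::size_t n) {
    ToyReadTimestampQueue q;
    std::size_t swept = 0;
    std::vector<ToyReadTimestampQueue::Handle> handles;
    for (std::size_t i = 0; i < n; ++i) {
        handles.push_back(q.publish(i + 1, swept));
    }
    assert(q.size() == n);
    for (auto h : handles) {
        q.retire(h);
    }
    q.publish(n + 1, swept);
    return swept;
}
```

In this model the one reader that arrives after n others have finished removes all n stale entries itself, so the time it holds the exclusive lock grows with the backlog rather than with its own work.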

I was able to fix the performance problem by re-introducing a mutex around the area where we start transactions for secondary reads (at lastApplied):

diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp
index 3f07c244c5..bcdef7f70c 100644
--- a/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp
+++ b/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp
@@ -590,6 +590,7 @@ Timestamp WiredTigerRecoveryUnit::_beginTransactionAtAllDurableTimestamp(WT_SESS
     return readTimestamp;
 }
 
+Mutex _lastAppliedTxnMutex = MONGO_MAKE_LATCH("_lastAppliedTxnMutex");
 Timestamp WiredTigerRecoveryUnit::_beginTransactionAtLastAppliedTimestamp(WT_SESSION* session) {
     auto lastApplied = _sessionCache->snapshotManager().getLastApplied();
     if (!lastApplied) {
@@ -609,6 +610,8 @@ Timestamp WiredTigerRecoveryUnit::_beginTransactionAtLastAppliedTimestamp(WT_SES
         return Timestamp();
     }
 
+
+    stdx::lock_guard<Latch> lock(_lastAppliedTxnMutex);
     WiredTigerBeginTxnBlock txnOpen(session,
                                     _prepareConflictBehavior,
                                     _roundUpPreparedTimestamps,
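
The effect of the lock in the diff above can be illustrated outside the server with a small standalone sketch. It assumes a hypothetical ToyEngine whose beginTransaction() stands in for the storage engine call, and uses std::mutex in place of the server's Latch type; lastAppliedTxnMutexSketch, beginThrottled, and maxConcurrentStarts are illustrative names, not MongoDB or WiredTiger APIs.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the engine's "begin transaction" call. It records
// the largest number of callers that were ever inside it at the same time.
class ToyEngine {
public:
    void beginTransaction() {
        int now = ++_inside;
        int prev = _maxInside.load();
        while (now > prev && !_maxInside.compare_exchange_weak(prev, now)) {
        }
        std::this_thread::yield();  // Widen the window for overlapping callers.
        --_inside;
    }

    int maxInside() const {
        return _maxInside.load();
    }

private:
    std::atomic<int> _inside{0};
    std::atomic<int> _maxInside{0};
};

// One process-wide mutex, analogous in spirit to _lastAppliedTxnMutex.
std::mutex lastAppliedTxnMutexSketch;

// Every reader takes the mutex before starting its transaction, so at most one
// caller at a time reaches the contended code underneath.
void beginThrottled(ToyEngine& engine) {
    std::lock_guard<std::mutex> lock(lastAppliedTxnMutexSketch);
    engine.beginTransaction();
}

// Runs `threads` readers that each start `startsPerThread` transactions and
// returns the peak number of simultaneous callers seen by the engine.
int maxConcurrentStarts(int threads, int startsPerThread) {
    assert(threads > 0 && startsPerThread > 0);
    ToyEngine engine;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&engine, startsPerThread] {
            for (int i = 0; i < startsPerThread; ++i) {
                beginThrottled(engine);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return engine.maxInside();
}
```

With the mutex in place the peak is always 1, so contention moves from the lower-level lock to a single mutex that waiters can block on. The sketch does not model the latency or CPU effects reported in this ticket.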



 Comments   
Comment by Githook User [ 21/Sep/20 ]

Author: Louis Williams (louiswilliams) <louis.williams@mongodb.com>

Message: SERVER-51041 Throttle starting transactions for secondary reads
Branch: v4.2
https://github.com/mongodb/mongo/commit/6c3fffb3e82f3208414a747df4fb9ae9ab4e2f52

Comment by Githook User [ 21/Sep/20 ]

Author: Louis Williams (louiswilliams) <louis.williams@mongodb.com>

Message: SERVER-51041 Throttle starting transactions for secondary reads
Branch: v4.4
https://github.com/mongodb/mongo/commit/63a7f9c895de00f6bd440480f943608dfb2670f6

Comment by Louis Williams [ 21/Sep/20 ]

I don't believe there are any other areas of immediate concern regarding other operations that use point-in-time reads:

  • readConcern "snapshot": this is designed for multi-document transactions and long-running snapshot reads. By nature, these operations should not be starting new transactions as frequently as local secondary reads
  • readConcern "atClusterTime": this is used for sharded PIT reads and could also be a victim of the underlying performance issue, but the performance is likely dominated by other bottlenecks in the network and sharding machinery
  • readConcern "linearizable": very likely dominated by other bottlenecks including waiting for replication
  • readConcern "majority": see my previous comment

 

Comment by Githook User [ 21/Sep/20 ]

Author: Louis Williams (louiswilliams) <louis.williams@mongodb.com>

Message: SERVER-51041 Throttle starting transactions for secondary reads
Branch: master
https://github.com/mongodb/mongo/commit/1f18dc5ce618f61e54d2ac203cbf16b8d388c862

Comment by Louis Williams [ 18/Sep/20 ]

We don't see this issue for majority-committed reads because they still take a mutex when starting new transactions.
