[SERVER-51041] Throttle starting transactions for secondary reads
Created: 18/Sep/20 Updated: 10/Jan/24 Resolved: 21/Sep/20
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.2.9, 4.4.1, 4.7.0 |
| Fix Version/s: | 4.8.0, 4.2.10, 4.4.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | KP44 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4, v4.2 |
| Sprint: | Execution Team 2020-10-05 |
| Case: | (copied to CRM) |
| Linked BF Score: | 0 |
| Description |
This performance regression affects readConcern "local" and "available" reads on secondary nodes.
For high volumes of short-lived secondary reads, the WiredTiger (WT) reader-writer lock protecting the global read timestamp queue appears to handle heavy contention worse than the mutex that preceded it. The problem I see is that the WT read timestamp queue leaves behind old entries from inactive transactions. New readers, which hold the queue's lock in write mode, are responsible for cleaning up those old entries, even when the queue contains hundreds of thousands of inactive ones. This blocks out other readers, which spin-wait briefly and then start context switching heavily. Once the queue shrinks, thousands of new read requests arrive and the problem repeats. The result is highly unpredictable latency and poor CPU utilization. I was able to fix the performance problem by reintroducing a mutex around the code that starts transactions for secondary reads (at lastApplied).
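The mitigation described above (funneling transaction starts for secondary reads through a single mutex) can be sketched roughly as follows. This is a minimal C++ illustration under stated assumptions, not the code committed for this ticket: `SecondaryReadTxnThrottle`, `startTransaction`, and `maxConcurrentStarts` are hypothetical names, and the `startFn` callback merely stands in for opening a storage-engine transaction at lastApplied. The sketch only shows that at most one thread at a time proceeds past the mutex, so contention on the read timestamp queue from this code path is bounded to a single caller.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical throttle: every secondary-read transaction start goes through
// one mutex. startFn is an assumed stand-in for "open a storage transaction
// at lastApplied"; it is not a real MongoDB or WiredTiger API.
class SecondaryReadTxnThrottle {
public:
    template <typename StartFn>
    void startTransaction(StartFn&& startFn) {
        // Callers wait here, so only one thread at a time runs startFn.
        std::lock_guard<std::mutex> lk(_startMutex);
        startFn();
    }

private:
    std::mutex _startMutex;
};

// Runs `threads` workers that each start `iters` transactions through the
// throttle, and returns the highest number of starts observed in flight at once.
int maxConcurrentStarts(int threads, int iters) {
    SecondaryReadTxnThrottle throttle;
    std::atomic<int> inFlight{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                throttle.startTransaction([&] {
                    int now = ++inFlight;
                    // Record the peak concurrency seen inside the critical section.
                    int prev = peak.load();
                    while (now > prev && !peak.compare_exchange_weak(prev, now)) {
                    }
                    --inFlight;
                });
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return peak.load();
}
```

Calling `maxConcurrentStarts(8, 1000)` returns 1, because the mutex admits one transaction start at a time. The trade-off is that starts on this path are fully serialized; the description suggests this is preferable to many readers contending for the queue's write lock while one of them cleans up stale entries.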
| Comments |
| Comment by Githook User [ 21/Sep/20 ] |
Author: Louis Williams <louis.williams@mongodb.com> (louiswilliams)
Message:
| Comment by Githook User [ 21/Sep/20 ] |
Author: Louis Williams <louis.williams@mongodb.com> (louiswilliams)
Message:
| Comment by Louis Williams [ 21/Sep/20 ] |
I don't believe any other operations that use point-in-time reads are an immediate concern.
| Comment by Githook User [ 21/Sep/20 ] |
Author: Louis Williams <louis.williams@mongodb.com> (louiswilliams)
Message:
| Comment by Louis Williams [ 18/Sep/20 ] |
We don't see this issue for majority-committed reads because they still take a mutex when starting new transactions.