This performance regression affects readConcern "local" and "available" reads on secondary nodes.
SERVER-46721 removed a mutex around a critical section that effectively synchronized every external secondary reader that reads at lastApplied. I deemed this mutex unnecessary, but removing it pushed a synchronization problem down to a lower level.
For high volumes of short-lived secondary reads, it appears that the WT reader-writer lock on the global read timestamp queue does not handle heavy contention as well as the mutex it replaced.
The problem I see is that the WT read timestamp queue leaves around old entries from inactive transactions. New readers (which must hold the write lock on the read timestamp queue) are responsible for cleaning up those old entries, even when the queue has accumulated hundreds of thousands of inactive ones. While one reader performs this cleanup, all other readers are blocked: they spin-wait briefly, then start context switching wildly. Once the queue shrinks, thousands of new read requests come in and the problem repeats itself. This leads to very unpredictable latencies and poor CPU utilization.
I was able to fix the performance problem by re-introducing a mutex around the area where we start transactions for secondary reads (at lastApplied):
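A rough sketch of the shape of that fix, with hypothetical names (the real server types and call sites differ; `Session`, `beginTransactionAtTimestamp`, and `lastAppliedReadMutex` are all illustrative). The idea is simply that a single process-wide mutex serializes the begin-transaction path for secondary reads at lastApplied, so only one reader at a time touches the underlying WT read timestamp queue:

```cpp
#include <cstdint>
#include <mutex>

// Minimal stand-in for a storage-engine session; the real types and names
// in the server are different. This only illustrates the locking shape.
struct Session {
    uint64_t readTimestamp = 0;
    void beginTransactionAtTimestamp(uint64_t ts) { readTimestamp = ts; }
};

// Process-wide mutex re-introduced around the "open a transaction at
// lastApplied" path, restoring the coarse synchronization that the
// SERVER-46721 change removed.
std::mutex lastAppliedReadMutex;

void beginSecondaryReadTransaction(Session& session, uint64_t lastApplied) {
    std::lock_guard<std::mutex> guard(lastAppliedReadMutex);
    session.beginTransactionAtTimestamp(lastApplied);
}
```

Serializing at this level keeps contention on one well-behaved mutex instead of funneling a thundering herd into the timestamp queue's reader-writer lock.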