[SERVER-36495] Cache pressure issues during recovery oplog application Created: 07/Aug/18 Updated: 29/Oct/23 Resolved: 29/Apr/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Fixed | Votes: | 6 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Assigned Teams: | Replication |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
During recovery oplog application we don't advance the oldest timestamp. This can pin a lot of data in the cache. Resulting symptoms include
|
| Comments |
| Comment by Kelsey Schubert [ 29/Apr/22 ] |
|
This ticket describes a number of symptoms associated with cache pressure during oplog recovery. It does not describe a particular fix, which is why it remained open for a while as we worked on a number of changes to improve performance. There are two elements at play here. The first is whether we can advance the oldest timestamp during recovery; this is very challenging to do safely and would be a significant change to the system architecture. Instead, we chose to focus on the second element, which is how we perform when the oldest timestamp lags. This change has significant benefits, including during steady-state operation. We invested around 30 engineering years between 4.2 and 4.4 to tackle this problem, resulting in a significant improvement to throughput. However, given the scale of the change, it is not safe or feasible to backport it to MongoDB 4.2. In MongoDB 4.2, we need to maintain the history of updates since the oldest timestamp in memory. If this history exceeds our cache, we use a cache overflow mechanism that functions similarly to swap. As a consequence, performance may degrade; this is the expected behavior. In MongoDB 4.4, we introduced durable history, which allows us to write out the historical information associated with pages in cache more efficiently, reducing cache pressure and providing a significant performance boost. In some tests, we see a 10x improvement in write throughput under cache pressure. As the symptoms described by this ticket have been significantly improved by architectural changes in MongoDB 4.4 and later, I'm going to resolve this ticket. |
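
To illustrate why a non-advancing oldest timestamp pins history, here is a toy Python model. It is only a sketch and not how WiredTiger is implemented: the function name `retained_versions`, its parameters, and the pruning rule (keep the newest version at or before the oldest timestamp, plus everything newer) are assumptions made for illustration.

```python
from collections import defaultdict


def retained_versions(num_updates, num_keys, lag=None):
    """Toy model: count update versions that must be kept around.

    lag=None models recovery, where the oldest timestamp never advances.
    lag=k models steady state, where the oldest timestamp trails the
    newest timestamp by k. All names here are hypothetical.
    """
    history = defaultdict(list)  # key -> ascending list of update timestamps
    oldest = 0
    for ts in range(1, num_updates + 1):
        history[ts % num_keys].append(ts)
        if lag is None:
            continue  # oldest timestamp is pinned, so nothing can be discarded
        oldest = max(oldest, ts - lag)
        for key, versions in history.items():
            # Keep the newest version at or before `oldest` (a reader at
            # `oldest` still needs it) and every version newer than that.
            keep_from = 0
            for i, v in enumerate(versions):
                if v <= oldest:
                    keep_from = i
            history[key] = versions[keep_from:]
    return sum(len(v) for v in history.values())


pinned = retained_versions(1000, 10)            # oldest timestamp never moves
lagging = retained_versions(1000, 10, lag=50)   # oldest timestamp trails by 50
```

With these toy parameters, the pinned case retains all 1000 versions, while the lagging case retains 60. In the pinned case the retained history grows with every update, which is the situation where 4.2 falls back to cache overflow and 4.4 relies on durable history.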