[SERVER-35103] Checkpoint creates unevictable clean content  Created: 19/May/18  Updated: 23/Aug/18  Resolved: 23/Aug/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 3.6.3, 3.6.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Bruce Lucas (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | nonnyc, storage-engines |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Operating System: | ALL |
| Sprint: | Storage Non-NYC 2018-06-04 |
| Participants: | |
| Description |
The attached repro script does the following:

The above was run on a machine with 24 CPUs and a lot of memory, but the script uses numactl to limit each instance to two CPUs and caps the cache at 4 GB (during setup) and 1 GB (during recovery), so this should be runnable on a more modest machine.
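A minimal sketch of the kind of invocation the description refers to; the attached script's actual flags, paths, and CPU ids are not reproduced in this export, so all specific values below are illustrative assumptions:

```
# Illustrative sketch only: pin a mongod instance to two CPUs with numactl
# and cap the WiredTiger cache, as the description says the repro script does.
# CPU ids, dbpath, and port are assumed values, not taken from the script.
numactl --physcpubind=0,1 --membind=0 \
    mongod --dbpath /data/repro0 --port 27017 \
           --wiredTigerCacheSizeGB 4      # use 1 for the recovery phase
```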
| Comments |
| Comment by Gregory McKeon (Inactive) [ 23/Aug/18 ] |
bruce.lucas, was there a commit on this ticket? If so, can you link it? I don't see one in the codebase, so I'm setting the resolution to "Done".
| Comment by Alexander Gorrod [ 29/May/18 ] |
I ran the workload attached to this ticket against the 3.6.3 and 3.6.5 versions of MongoDB, and I can reproduce the behavior you describe with the earlier version but not the later one. The attached picture demonstrates the difference: you can see that the newer version shows the same sawtooth pattern of cache usage related to checkpoints, but total cache usage never goes above 82%, which is the threshold at which the eviction server starts working to make more space available. The dirty fill ratio is steady at its 20% maximum for almost the entire run.

I tried an experiment with the latest version of the 3.6 code in which I allowed a higher proportion of the cache to be dirty via --wiredTigerEngineConfigString=eviction_dirty_trigger=75,eviction_dirty_target=50, on the theory that if clean content were actually pinned we would see the cache get stuck. I didn't observe that: the workload continued effectively even when there was less space for clean content in the cache.

I also added a new statistic during this investigation that tracks the case michael.cahill was concerned we had wrong, namely that WiredTiger might not consider clean pages with history as candidates for eviction (even though it can use the lookaside mechanism to reclaim the space). That statistic is growing in all of these workloads, which indicates that those pages would be chosen for eviction if there were enough cache pressure.

Given that the behavior is improved in more recent versions and the particular code that was of concern is working properly, I don't think there is anything more to do here. I'm going to assign this back to you, bruce.lucas, rather than closing outright, in case you'd like more information or a different follow-up.
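For reference, a minimal sketch of one way to watch the two ratios discussed above (overall cache fill against the 82% eviction trigger, and dirty fill against its 20% maximum) while the workload runs. The statistic names are standard WiredTiger cache fields in serverStatus; the port and polling loop are illustrative assumptions:

```
# Illustrative sketch: poll the WiredTiger cache statistics that the cache
# fill and dirty fill ratios above are computed from. Field names are
# standard serverStatus output; port and interval are assumptions.
mongo --quiet --port 27017 --eval '
    while (true) {
        var c = db.serverStatus().wiredTiger.cache;
        var max   = c["maximum bytes configured"];
        var used  = c["bytes currently in the cache"];
        var dirty = c["tracked dirty bytes in the cache"];
        print((100 * used  / max).toFixed(1) + "% cache fill, " +
              (100 * dirty / max).toFixed(1) + "% dirty fill");
        sleep(1000);  // mongo shell built-in, milliseconds
    }'
```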
| Comment by Alexander Gorrod [ 25/May/18 ] |
I have done some investigation into the behavior described and can't see any changes made to WiredTiger since 3.6.4 that are likely to fix this issue. I am running the reproducer and will do more analysis into how we can mitigate the behavior.