[SERVER-35103] Checkpoint creates unevictable clean content Created: 19/May/18  Updated: 23/Aug/18  Resolved: 23/Aug/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 3.6.3, 3.6.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Bruce Lucas (Inactive)
Resolution: Done Votes: 0
Labels: nonnyc, storage-engines
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File metrics.2018-05-19T13-40-35Z-00000, PNG File repro.png, File repro.sh, PNG File s35103.png, PNG File s35103_high_dirty.png
Issue Links:
Related
is related to SERVER-33191 Cache-full hangs on 3.6 Closed
is related to SERVER-34938 Secondary slowdown or hang due to con... Closed
Operating System: ALL
Sprint: Storage Non-NYC 2018-06-04
Participants:

 Description   

The attached repro script does the following (a rough sketch of the workload shape follows the list):

  • Inserts 1M documents with 10 indexes over fields containing random strings.
  • Stops one secondary, then removes all documents by iterating one of the indexes, so deletes occur in random order.
  • Restarts that secondary. While it is catching up (after the script completes), batches are large due to SERVER-34938 and take several minutes to complete, so multiple checkpoints run while timestamps are pinning content in cache. Additional unevictable clean content appears to be generated at each checkpoint, exacerbating the effects of SERVER-34938.
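
For reference, a minimal sketch of the workload shape described above (illustrative only; this is not the attached repro.sh, and the collection name, field names, and port are placeholders):

    // Hypothetical workload sketch -- run against the primary with, e.g.:
    //   mongo --port 27017 workload_sketch.js
    var coll = db.getSiblingDB("test").c;

    // 10 indexes over random-string fields f0..f9
    for (var f = 0; f < 10; f++) {
        var spec = {};
        spec["f" + f] = 1;
        coll.createIndex(spec);
    }

    // insert 1M documents in batches of 1000
    var bulk = [];
    for (var i = 0; i < 1000000; i++) {
        var doc = {_id: i};
        for (var f = 0; f < 10; f++) {
            doc["f" + f] = Math.random().toString(36).substring(2);
        }
        bulk.push(doc);
        if (bulk.length == 1000) {
            coll.insertMany(bulk);
            bulk = [];
        }
    }

    // stop one secondary here, then delete by walking one of the random-string
    // indexes so documents are removed in effectively random insertion order
    coll.find({}, {f0: 1}).sort({f0: 1}).forEach(function(d) {
        coll.deleteOne({_id: d._id});
    });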

The attached graph of cache content during the catch-up shows the following:

  • batch boundaries are at A, E, F
  • at each checkpoint (e.g. B, C, D) the amount of clean content increases; this is presumably unevictable because the oldest timestamp is only advanced between batches
  • at batch boundaries, when the oldest timestamp is advanced, cache content drops, presumably because it becomes evictable

The above was run on a machine with 24 CPUs and plenty of memory, but the script uses numactl to limit each instance to two CPUs and caps the WiredTiger cache at 4 GB (during setup) and 1 GB (during recovery), so it should be runnable on a more modest machine.
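
A hedged sketch of the kind of invocation that applies those limits (the replica set name, dbpath, port, and exact numactl flags here are placeholders, not taken from repro.sh):

    # pin one mongod instance to two CPUs and cap its WiredTiger cache at 4 GB (setup phase)
    numactl --physcpubind=0,1 \
        mongod --replSet rs0 --dbpath /data/rs0-0 --port 27017 \
               --fork --logpath /data/rs0-0/mongod.log \
               --wiredTigerCacheSizeGB 4
    # for the recovery phase, restart the node with --wiredTigerCacheSizeGB 1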



 Comments   
Comment by Gregory McKeon (Inactive) [ 23/Aug/18 ]

bruce.lucas, was there a commit on this ticket? If so, can you link it? I don't see one in the codebase, so I'm setting the resolution to "Done".

Comment by Alexander Gorrod [ 29/May/18 ]

I ran the workload attached to this ticket against the 3.6.3 and 3.6.5 versions of MongoDB, and I can reproduce the behavior you describe with the earlier version but not the later one. The following picture demonstrates the difference:

You can see that the newer version shows the same sawtooth pattern of cache usage related to checkpoints, but total cache usage never goes above 82%, the threshold at which the eviction server starts working to make more space available. The dirty fill ratio holds steady at the 20% maximum for almost the entire run.
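
(As an aside for anyone reproducing this: a rough way to watch those two ratios is to poll serverStatus(). The sketch below is illustrative, not the tooling used to produce the plots here, and the port is a placeholder.)

    mongo --port 27017 --eval '
    while (true) {
        var c = db.serverStatus().wiredTiger.cache;
        var max = c["maximum bytes configured"];
        print(new Date().toISOString(),
              "fill", (100 * c["bytes currently in the cache"] / max).toFixed(1) + "%",
              "dirty", (100 * c["tracked dirty bytes in the cache"] / max).toFixed(1) + "%");
        sleep(1000);
    }'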

I tried an experiment with the latest version of the 3.6 code in which I allowed a higher proportion of the cache to be dirty via --wiredTigerEngineConfigString=eviction_dirty_trigger=75,eviction_dirty_target=50, on the theory that if clean content were actually pinned we would see the cache get stuck. I did not observe that; the workload effectively continued even when there was less space for clean content in the cache:
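
(For reference, a sketch of how that setting can be passed on the command line; the dbpath and port are placeholders, and the engine config string is the one quoted above:)

    mongod --replSet rs0 --dbpath /data/rs0-1 --port 27018 \
           --wiredTigerCacheSizeGB 1 \
           --wiredTigerEngineConfigString="eviction_dirty_trigger=75,eviction_dirty_target=50"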

During this investigation I also added a new statistic that tracks the case michael.cahill was concerned we might be getting wrong: that WiredTiger might not consider clean pages with history for eviction (even though it can use the lookaside mechanism to reclaim the space). That statistic grows in all of these workloads, which indicates that those pages would be chosen for eviction if there were enough cache pressure.

Given that the behavior is improved in more recent versions and the particular code of concern is working properly, I don't think there is anything more to do here. I'm going to assign this back to you, bruce.lucas, rather than closing it outright, in case you'd like more information or a different follow-up.

Comment by Alexander Gorrod [ 25/May/18 ]

I have investigated the behavior described and can't see any changes made to WiredTiger since 3.6.4 that are likely to fix this issue. I am running the reproducer and will do more analysis of how we can mitigate the behavior.
