[SERVER-31180] inMemory spins CPU when cache fills with dirty data Created: 20/Sep/17  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Backlog - Storage Engines Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-29005 Review cache full handling for in-mem... Closed
Assigned Teams:
Storage Engines
Operating System: ALL
Steps To Reproduce:

1. Start Enterprise mongod:
./mongod --dbpath=/tmp --storageEngine=inMemory --replSet rs
2. Run initiate in a shell:
./mongo
rs.initiate()
3. Run a benchrun load to trigger lots of inserts:
3a. Clone the test repo: git clone git@github.com:mongodb/mongo-perf.git
3b. Run benchrun:
python benchrun.py --includeFilter 'Insert.JustID' -f ./testcases/insert.js --trialTime 60 --trialCount 1 --writeCmd=true -w 1 -t 24 -s /path/to/mongo/shell/mongo
4. After about 30 seconds, observe that a few threads in mongod are spinning in eviction. Even after the benchrun completes, the threads continue to spin.
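A per-thread CPU view makes the spin easy to see (a sketch, assuming Linux and a single mongod process on the machine):

top -H -p $(pgrep mongod)
# look for WiredTiger eviction threads pinned near 100% CPU, still spinning after benchrun exits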


 Description   

If you set up a one-node replica set with the inMemory storage engine and continuously write data, eventually the eviction threads start spinning at 100% CPU.
Part of this bug is that oldest_timestamp is never advanced, so the cache eventually fills with update data that cannot be discarded. However, whenever the cache fills, the inMemory storage engine should either start using a LAS (lookaside) file or abort. (I don't think blocking is logistically possible, as it would probably deadlock.)
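To watch the cache fill while the load runs, you can poll the WiredTiger cache statistics that serverStatus already exposes (a minimal sketch; the 1-second poll interval is arbitrary):

./mongo --eval '
while (true) {
    var c = db.serverStatus().wiredTiger.cache;
    print(c["tracked dirty bytes in the cache"] + " dirty / " +
          c["bytes currently in the cache"] + " in cache / " +
          c["maximum bytes configured"] + " max");
    sleep(1000);  // re-sample once per second
}'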



 Comments   
Comment by Alexander Gorrod [ 22/Sep/17 ]

whenever the cache fills, the inMemory storage engine should either start using a LAS file, or it should abort.

Both of those options are difficult with the inMemory storage engine.

use the LAS file

The inMemory storage engine doesn't write any content to disk and can run on a diskless machine. The lookaside file is WiredTiger's cache overflow mechanism, and it involves writing content to disk - i.e., the storage engine would stop being in-memory if we went down this route. It is also quite difficult from an implementation point of view: the lookaside mechanism is paired with the reconciliation process (i.e., the process of writing content to disk), and since inMemory doesn't reconcile content to disk, we would need to change how content makes it into the lookaside file.
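As a concrete check of the non-persistence point, serverStatus reports whether the active engine is persistent (the exact output shape may vary by version):

./mongo --eval 'printjson(db.serverStatus().storageEngine)'
# expect something like: { "name" : "inMemory", "supportsCommittedReads" : true, "persistent" : false }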

it should abort

It's also difficult for the storage engine to determine when to abort. WiredTiger is limited by the state of things outside the storage engine, so it can't know whether the application will change something that allows progress to be made. When running in diagnostic mode we have a check that aborts the process if no progress has been made for 5 minutes, but I don't think that is necessarily what end users want. For example, suppose a user has a replica set where a secondary node goes down for 10 minutes; the commit point lags and the cache on the primary becomes full. It's not clear which behavior the end user prefers, but my guess is that it's generally for the primary to stall until the secondary comes back online - especially with the inMemory storage engine, since bringing a node back online requires a resync.

I feel like I've thrown out a set of problems here without suggesting a solution. If this functionality were user configurable, I'd be much happier adding an "abort if stuck for X minutes" configuration option to WiredTiger - would that be a reasonable path forward?
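For concreteness, here is the shape of the knob I have in mind, expressed as a mongod invocation - note that both the engine config plumbing and the option name below are hypothetical, purely illustrative:

./mongod --dbpath=/tmp --storageEngine=inMemory --replSet rs \
    --inMemoryEngineConfigString='cache_stuck_timeout=300'
# hypothetical option: abort the process if eviction makes no progress for 300 seconds;
# unset = stall indefinitely (the current behavior)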
