[SERVER-41492] Disable WiredTiger cursor caching and introduce more aggressive file handle sweeps in testing Created: 04/Jun/19  Updated: 29/Oct/23  Resolved: 15/Aug/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.2.1, 4.3.1

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
is duplicated by SERVER-41494 Introduce more aggressive WiredTiger ... Closed
Related
related to SERVER-41020 Tweak or fuzz storage engine tunable ... Closed
is related to SERVER-42011 Create concurrency suites to enable W... Closed
is related to SERVER-41494 Introduce more aggressive WiredTiger ... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2
Sprint: Execution Team 2019-08-12, Execution Team 2019-08-26
Participants:

 Description   

The objective is to disable WiredTiger cursor caching in some builds by setting the cursor cache size parameter (wiredTigerCursorCacheSize) above 0.

The default is -100. Because that value is less than or equal to 0, it keeps cursor caching enabled in WiredTiger, and it also uses MongoDB cursor caching with a size of 100.
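A minimal sketch of how the parameter value described above could be interpreted. The names `CursorCachePolicy` and `interpret_cursor_cache_size` are hypothetical and are not part of the server code. The handling of positive values (MongoDB-side cache sized by the absolute value) is an assumption extrapolated from the -100 case; the description only states that values above 0 disable WiredTiger caching.

```python
from typing import NamedTuple


class CursorCachePolicy(NamedTuple):
    """Hypothetical summary of what a cursor cache size value implies."""
    wiredtiger_caching: bool   # True when WiredTiger's own cursor cache stays on
    mongodb_cache_size: int    # size of the MongoDB-side cursor cache


def interpret_cursor_cache_size(value: int) -> CursorCachePolicy:
    # Values <= 0 leave WiredTiger cursor caching enabled (per the description).
    # Assumption: the MongoDB-side cache size is the absolute value in all cases.
    return CursorCachePolicy(
        wiredtiger_caching=value <= 0,
        mongodb_cache_size=abs(value),
    )


default_policy = interpret_cursor_cache_size(-100)  # the default setting
testing_policy = interpret_cursor_cache_size(1)     # any value above 0
print(default_policy)
print(testing_policy)
```

Under these assumptions, the default keeps both caches active, while a value of 1 turns WiredTiger caching off and leaves only a one-entry MongoDB-side cache.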

Copied from SERVER-41494:

Lower the following WiredTiger file_manager parameters passed to wiredtiger_open:

  • close_handle_minimum: number of handles open before the file manager will look for handles to close. The WT default is 250. Lower to a more reasonable number like 4.
  • close_idle_time: amount of time in seconds a file handle needs to be idle before attempting to close it. The MongoDB default is 100000 seconds (about 28 hours). Lower to 10 seconds.
  • close_scan_interval: interval in seconds at which to check for files that are inactive and close them. The WT default is 10. Lower to 5 seconds.
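The three settings above can be combined into a single file_manager clause of a WiredTiger configuration string. The sketch below only builds that string; the helper name `file_manager_config` is hypothetical, and how the string reaches wiredtiger_open in a given test suite is not shown here.

```python
def file_manager_config(close_handle_minimum: int,
                        close_idle_time: int,
                        close_scan_interval: int) -> str:
    """Build a WiredTiger file_manager=(...) configuration clause."""
    settings = {
        "close_handle_minimum": close_handle_minimum,  # handles open before sweeping starts
        "close_idle_time": close_idle_time,            # seconds idle before a handle may close
        "close_scan_interval": close_scan_interval,    # seconds between sweep scans
    }
    body = ",".join(f"{key}={value}" for key, value in settings.items())
    return f"file_manager=({body})"


# Values proposed in this ticket for more aggressive sweeping in testing.
aggressive = file_manager_config(4, 10, 5)
print(aggressive)
# prints: file_manager=(close_handle_minimum=4,close_idle_time=10,close_scan_interval=5)
```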


 Comments   
Comment by Eric Milkie [ 14/May/21 ]

Hi Chan Lewis,
Please ask such questions in the MongoDB Developer Community Forums here: https://developer.mongodb.com/community/forums/
Thank you!

Comment by Chan Lewis [ 14/May/21 ]

May I ask why the default value of close_idle_time is 100000s? That makes sweeping much less aggressive in production. If I create many collections and don't access them for a long period of time, wouldn't it be a good idea to sweep them from memory and make room for other active collections? Personally, I think 28h is too long, and far from the default value (30) in wiredtiger (config_def.c).

And why can't wiredTigerFileHandleCloseIdleTime be changed at runtime?

Comment by Githook User [ 27/Aug/19 ]

Author:

{'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}

Message: SERVER-41492 Create concurrency suites to disable WiredTiger cursor caching
and enable more aggressive file handle sweeps

(cherry picked from commit 407bf9be594278ac505fb089ec17dfef62ac0e25)

Add tunable parameters for WiredTiger file manager options

(cherry picked from commit 70a987f5efd85c3162823e8a07f49566b10d2020)

Add configurable WiredTiger close_scan_interval setting

(cherry picked from commit 0359ce62e90df692b4fb0a2bf68755a6988d9ede)

Fix yaml lint

(cherry picked from commit 4a1d30014c12bae57caf423b695378fdc6fea6c9)
Branch: v4.2
https://github.com/mongodb/mongo/commit/539368137060c637f5e373a341a1d09813c1d403

Comment by Githook User [ 16/Aug/19 ]

Author:

{'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}

Message: SERVER-41492 Fix yaml lint
Branch: master
https://github.com/mongodb/mongo/commit/4a1d30014c12bae57caf423b695378fdc6fea6c9

Comment by Louis Williams [ 15/Aug/19 ]

Requesting a backport to 4.2 to expand our test coverage.

Comment by Githook User [ 15/Aug/19 ]

Author:

{'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}

Message: SERVER-41492 Create concurrency suites to disable WiredTiger cursor caching
and enable more aggressive file handle sweeps
Branch: master
https://github.com/mongodb/mongo/commit/407bf9be594278ac505fb089ec17dfef62ac0e25

Comment by Githook User [ 15/Aug/19 ]

Author:

{'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com', 'name': 'Louis Williams'}

Message: SERVER-41492 Add configurable WiredTiger close_scan_interval setting
Branch: master
https://github.com/mongodb/mongo/commit/0359ce62e90df692b4fb0a2bf68755a6988d9ede

Comment by Githook User [ 08/Aug/19 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-41492 Add tunable parameters for WiredTiger file manager options
Branch: master
https://github.com/mongodb/mongo/commit/70a987f5efd85c3162823e8a07f49566b10d2020

Comment by Maria van Keulen [ 02/Aug/19 ]

This patch should also include the configurations described in SERVER-41494, since simultaneously enabling them provides the coverage we're most interested in.

Comment by Maria van Keulen [ 01/Aug/19 ]

It seems like the jstestfuzz_concurrent and jstestfuzz_concurrent_replication tasks make good templates for these new configurations. I suggest we add them to the enterprise rhel 6.2 and enterprise rhel 6.2 (majority read concern off) builders to start.

Comment by Maria van Keulen [ 01/Aug/19 ]

I think introducing jstestfuzz tasks with the parameters configured as described in SERVER-41492 (and similarly SERVER-41494 and SERVER-42011) provides a good balance between adding test coverage and not incurring too much extra maintenance cost. I will spend some time thinking about which jstestfuzz tasks make the most sense to use.

Comment by Alexander Gorrod [ 12/Jun/19 ]

milkie, we talked about it this morning. I'm concerned that enabling the set of options required to trigger race conditions with cache overflow and handle sweep will alter the behavior of MongoDB and WiredTiger enough to mask other potential failure modes.

We assigned this to the Execution backlog for now, since it's about figuring out where to do the testing, which has long-term maintenance costs for the Execution team. The same applies to SERVER-41494.

Comment by Eric Milkie [ 07/Jun/19 ]

I'm not sure why someone would first try to reproduce a BF failure from a debug builder with code compiled as non-debug; debug and non-debug code already diverge in several significant ways, so it's already best practice to reproduce with the same compiler flags as the failure. Adding a new variant does incur some extra continuing costs, and I'm not sure the benefit would be worth it.

Comment by Alexander Gorrod [ 07/Jun/19 ]

Disabling WiredTiger cursor caching in all MongoDB debug builds concerns me - it's a potentially significant implicit divergence between debug/non-debug builds and I don't think it would be obvious to figure out that you can't reproduce a bug because cursor caching is disabled.

Could we instead add a new variant to Evergreen that tests a debug build with cursor caching disabled?
