[SERVER-24949] Lower WiredTiger idle handle timeout to 10 minutes Created: 08/Jul/16  Updated: 21/Nov/23  Resolved: 27/Apr/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Alexander Gorrod Assignee: Louis Williams
Resolution: Done Votes: 0
Labels: 3.7BackgroundTask
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
Related
related to SERVER-47855 Change default value for minSnapshotH... Closed
related to WT-4458 Only sweep btree handles that have be... Closed
is related to SERVER-56661 Evaluate increasing default close_han... Closed
Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2021-05-03
Participants:
Case:
Linked BF Score: 0

 Description   

The MongoDB storage layer currently configures WiredTiger to keep idle collection and index handles open for 28 hours after the last use. We've seen cases where that leads to the WiredTiger handle list growing very large unnecessarily, which can introduce performance problems.

We should consider the consequences of reducing that timeout to something closer to WiredTiger's default of 30 seconds.
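
For reference, these knobs live in WiredTiger's file_manager configuration group. A minimal sketch of an open call carrying the current values (illustrative only; this is not the exact configuration string mongod builds, and the 250/10 values are WiredTiger's documented defaults rather than anything MongoDB sets explicitly):

#include <wiredtiger.h>

// close_idle_time=100000 (~28 hours) is the value MongoDB passes today;
// close_handle_minimum=250 and close_scan_interval=10 are WiredTiger defaults.
WT_CONNECTION* conn = nullptr;
int ret = wiredtiger_open(
    "/data/db", nullptr,
    "create,"
    "file_manager=(close_idle_time=100000,"
    "close_handle_minimum=250,close_scan_interval=10)",
    &conn);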



 Comments   
Comment by Louis Williams [ 05/May/21 ]

This makes sense to me. I opened SERVER-56661 to evaluate increasing the close_handle_minimum threshold.

Comment by Alexander Gorrod [ 03/May/21 ]

"if the issue is the number of file descriptors would it be possible to just close the file descriptors and not evict the data from cache"

I don't think that's the issue. Most of the performance issues we've encountered are due to the WiredTiger handle structures.

"Another thing that might lessen the impact is to preferentially close the handles with the smallest amount of data in cache."

Yep - we could do that, or stage flushing content from the cache. Both of those changes would require work in WiredTiger, but neither is particularly daunting. In other words: I think we can solve this if it's still an issue in the field.

"Agree it seems to make sense to increase close_handle_minimum"

That's OK with me. I'm not sure what the right number of handles is. ~80 collections with an additional index each seems reasonable to me, but others here have more experience with what sort of distribution of collections is likely in workloads that could be sensitive to this.
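
For scale, the arithmetic behind those numbers, assuming each collection and each index maps to its own WiredTiger table and therefore its own data handle:

// Back-of-the-envelope only; the 3-handles-per-collection figure assumes the
// default _id index plus one additional index.
constexpr int kCloseHandleMinimum = 250;  // WiredTiger's current sweep floor
constexpr int kHandlesPerCollection = 3;  // collection table + _id index + one extra index
static_assert(80 * kHandlesPerCollection == 240,
              "~80 such collections sit just under the 250-handle floor");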

Comment by Bruce Lucas (Inactive) [ 01/May/21 ]

Agree it seems to make sense to increase close_handle_minimum.

Another thing that might lessen the impact is to preferentially close the handles with the smallest amount of data in cache. Not sure how expensive it would be to do this as it maybe requires sorting the handle list, or maybe there is some heuristic way to do it that would be good enough?

However if the issue is the number of file descriptors would it be possible to just close the file descriptors and not evict the data from cache (i.e. keep the btree and/or handle)? I would think aggressively closing file descriptors would have less performance impact than aggressively removing btrees as re-opening file descriptors should be quick.
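
A sketch of what that preferential close could look like (illustrative only; this is not WiredTiger's sweep code, and the type and partial-sort choice are assumptions):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct IdleHandle {
    uint64_t bytesInCache;  // cached data that would be evicted if this handle were closed
    // ... file descriptor, dhandle pointer, etc.
};

// Close up to 'budget' idle handles, cheapest-to-evict first. A partial sort
// avoids ordering the whole handle list when only a few closes are needed.
void closeCheapestFirst(std::vector<IdleHandle>& idle, std::size_t budget) {
    std::size_t n = std::min(budget, idle.size());
    std::partial_sort(idle.begin(), idle.begin() + n, idle.end(),
                      [](const IdleHandle& a, const IdleHandle& b) {
                          return a.bytesInCache < b.bytesInCache;
                      });
    // handles idle[0..n) would be closed here
}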

Comment by Eric Milkie [ 30/Apr/21 ]

I think we should still investigate increasing close_handle_minimum in conjunction with this change, as it has the potential to lessen the undesirable effect of evicting tables with periodic workloads while still reducing high numbers of open file handles in general.

Comment by Louis Williams [ 30/Apr/21 ]

It seems like there are two competing interests here:

  • The default high timeout causes WiredTiger to accumulate too many file descriptors in workloads that drop and recreate collections often. This matters if we want to expand the default history window past the current default of 10 seconds.
  • A low timeout unnecessarily evicts data from the cache in workloads that have long idle periods.

The problem here is that we don't really have any insight into which workload is more common. I've discussed this with Alex and we think it's worth trying out this change to better support durable history by default. In the event that this causes problems for customers, we have a way out, either by reverting or by manually changing the parameter on a per-customer basis. Keep in mind that we already do this for customers where the default timeout is problematic.

Comment by Daniel Pasette (Inactive) [ 28/Apr/21 ]

If we're going to make this change, I think it would be wise to increase the close_handle_minimum (is that the correct setting?) significantly as well. I don't see the harm of keeping idle collections in cache if there's no other cache pressure. I'm talking about the case where you have lots of collections (or collections with many indexes) and your workload quiesces at night, but then you have to pay to page all the data back in the next day. I agree it's not the most common case, but I do recall support issues for this case, and it seems to me that this change will impact them. If they're on Atlas, I don't think there's any way they can tweak a knob to change this behavior.

Comment by Alexander Gorrod [ 28/Apr/21 ]

pasette we have not done any work to make it cheaper to close out handles that hold a lot of pages in cache. On the other hand - getting into that situation I think takes some careful construction:

  • There is a minimum number of handles (250, corresponding to about 120 collections) before sweeps kick in.
  • A collection must be clean before it can be closed out, which means that a checkpoint must have been completed since the last update finished.
  • There must not be any other cache pressure in the system - if there were, then the content associated with this idle tree would have been evicted via the LRU algorithm.

In short, an application needs to have a significant number of active collections, but not be generating meaningful cache pressure due to operations on those collections. It then must have a collection that was being actively used (hence content in cache) but went idle for 10 minutes - just after the 10 minutes pass, the application wants to use the collection again and needs to wait for the cache to re-warm.

Most of the reports associated with SERVER-17907 were associated with dropping collections - the behavior of that has changed since 3.0.

It is possible for users to encounter this behavior, but it doesn't seem likely. If we notice it in the field we can review how the sweep works and ensure that the blocking period for closing out idle handles isn't too long.
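
To make those conditions concrete, a simplified eligibility check (a sketch of the rules as described in this thread, not WiredTiger's actual sweep implementation):

#include <cstddef>

struct HandleState {
    bool clean;          // a checkpoint has completed since the last update finished
    double idleSeconds;  // time since the handle was last used
};

// closeHandleMinimum defaults to 250; closeIdleTime becomes 600 seconds
// (10 minutes) with this change.
bool sweepCanClose(const HandleState& h, std::size_t openHandles,
                   std::size_t closeHandleMinimum, double closeIdleTime) {
    if (openHandles <= closeHandleMinimum)
        return false;                      // below the floor, nothing is swept
    if (!h.clean)
        return false;                      // must be clean before it can be closed out
    return h.idleSeconds > closeIdleTime;  // and idle for long enough
}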

Comment by Louis Williams [ 27/Apr/21 ]

The default WiredTiger idle handle timeout has been lowered to 10 minutes from 27 hours. This may result in performance changes in applications with many collections and workloads where collections are idle for longer than 10 minutes. This parameter is still configurable with the setParameter "wiredTigerFileHandleCloseIdleTime".
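
For example, restarting mongod with the previous value restores behavior close to the old ~28-hour default (the parameter can only be set at startup):

mongod --setParameter wiredTigerFileHandleCloseIdleTime=100000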

Comment by Githook User [ 27/Apr/21 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-24949 Lower WiredTiger idle handle timeout to 10 minutes
Branch: master
https://github.com/mongodb/mongo/commit/7edd66e0a235f420f65eda1d5dc338f30d5fdcd0

Comment by Daniel Pasette (Inactive) [ 22/Apr/21 ]

Yes, I do remember, and you've captured the issue. I believe it tracks back to around this issue (and the issues linked from it): SERVER-17907

Comment by Bruce Lucas (Inactive) [ 22/Apr/21 ]

I think any change like this is likely to produce some very surprising results for some applications. The original choice of just over 24 hours was partly motivated, IIRC, by customers who encountered nasty performance surprises when a load with a strong daily cycle, largely idle overnight, came back online at 8 AM and suddenly had to re-warm the cache even though the working set fit in cache and was consistent from day to day. pasette I think you were involved - do you recall the cases I'm talking about?

Comment by Eric Milkie [ 22/Apr/21 ]

Because this will result in flushing all cached pages for files that are idle longer than 10 minutes, we should be prepared for some workloads to have performance changes due to this. In particular, very idle databases might see a negative impact on the latency of all read queries.
I wonder if we could eventually add another knob that would be a "minimum" number of file handles to keep open regardless of idleness, in an attempt to keep the cache efficient.

Comment by Alexander Gorrod [ 15/Mar/21 ]

We came across a case where the default chosen here is harmful to applications - in SERVER-47855 we experimented with different values for minSnapshotHistoryWindowInSeconds. Snapshot windows of 13 minutes or longer, combined with a workload that creates and drops many collections, lead to accumulating enough cached data handles that systems run out of available file descriptors.

The reason this happens is that MongoDB doesn't drop collections until they are no longer required for the snapshot history window, so extending that window means collections aren't dropped for longer. That, in combination with keeping idle handles cached for at least 27 hours, means that a lot of active handles can now accumulate in such a workload.
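
Purely to illustrate the scale of the first factor - the drop rate below is an assumption for illustration, not a measurement from SERVER-47855:

// With a 13-minute history window, collections dropped during that window
// cannot be fully removed yet, so their handles stay live.
constexpr int kWindowSeconds = 13 * 60;   // snapshot history window under test
constexpr int kDropsPerSecond = 2;        // assumed workload, for illustration only
constexpr int kHandlesPerCollection = 2;  // collection table + _id index
constexpr int kPinnedHandles =
    kWindowSeconds * kDropsPerSecond * kHandlesPerCollection;  // 3,120 handles pinned at any time

On top of that, any handle that goes idle without being dropped stays cached for the full ~27-hour close_idle_time.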

We should reduce the idle timeout for handles. My recommendation would be to reduce it to 10 minutes; the change would look something like:

--- a/src/mongo/db/storage/wiredtiger/wiredtiger_parameters.idl
+++ b/src/mongo/db/storage/wiredtiger/wiredtiger_parameters.idl
@@ -122,7 +122,7 @@ server_parameters:
       set_at: startup
       cpp_vartype: 'std::int32_t'
       cpp_varname: gWiredTigerFileHandleCloseIdleTime
-      default: 100000
+      default: 600
       validator:
         gte: 1

The value was originally set so high because there was a performance test that sat idle for an extended period of time (multiple hours) between phases. A consequence of closing an idle handle is that its content is flushed from the cache, so keeping handles open across those idle periods meant less cache warming was required and better performance was observed.

That's not a common pattern, and we have seen issues in MongoDB deployments with many live (though inactive) handles over a number of years now.

Further information about the behavior can be seen in SERVER-47855 and the associated analysis.

Comment by Alexander Gorrod [ 08/Jul/16 ]

The current source code has the following comment:

../mongo/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp
219:        ss << "file_manager=(close_idle_time=100000),";  //~28 hours, will put better fix in 3.1.x

The original change was made in SERVER-18286
