[SERVER-40625] Open File Descriptor Regression Created: 12/Apr/19  Updated: 27/Oct/23  Resolved: 26/Sep/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.10, 3.6.11
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jackie Chu Assignee: Dmitry Agranat
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 3.4.png     PNG File 3.6.png     PNG File CacheUsage.png     PNG File LatencyTest.png     PNG File OpenFD.png     PNG File Screen Shot 2019-04-12 at 11.21.01 AM.png     PNG File Screen Shot 2019-04-19 at 4.51.20 PM.png     PNG File Screen Shot 2019-04-19 at 4.51.50 PM.png     PNG File Screen Shot 2019-04-30 at 4.55.25 PM.png
Issue Links:
Duplicate
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

~200 DBs in each mongodb cluster

For each database, there are ~40 collections, performing a mix of all the ops below:

  • query
  • update
  • delete
  • insert
  • drop collections
  • create collections
Participants:

 Description   

Comparing v3.4.17 and v3.6.11

As our team is planning to upgrade mongodb from v3.4 to v3.6, we ran performance tests to compare the behavior of the two versions. We have 2 mongodb clusters on v3.4.17 and 2 mongodb clusters on v3.6.11, running the same workload across all 4 clusters. (Our workload consists only of background jobs, processing the same amount of data and operations.)

 

Problem

From our comparisons, we see a significant regression in the number of open file descriptors. There is a clear trend: a rapid file descriptor leak rate in v3.6.11. (See the comparison below of the number of open file descriptors on the mongodb primaries.)

Background

We are paying extra attention to file descriptors because they have been one of the main issues in our mongodb clusters. We had a slow file descriptor leak in v3.4, which resulted in a memory leak, causing the cache to go over 80% after a long period of time and performance to degrade over time. With help from mongodb onsite support, we found that the file descriptor leak occurred because our workload drops and creates collections often, causing a leak in the session cache. We requested https://jira.mongodb.org/browse/SERVER-38779 to be backported to v3.6.11. However, even with this fix, we are still seeing a significantly higher rate of file descriptor leakage in v3.6.11 compared to v3.4.17 on the same workload. We suspect that other bugs are causing the file descriptor leak in mongodb v3.6.



 Comments   
Comment by Dmitry Agranat [ 26/Sep/19 ]

Hi,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Dima

Comment by Dmitry Agranat [ 10/Sep/19 ]

Hi chujacky1128

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide:

For both 3.4 and 3.6: diagnostic.data, time (+timezone) for both tests and the exact time (+timezone) when the issue occurred.

Thanks,
Dima

Comment by Dmitry Agranat [ 23/Jul/19 ]

Hi chujacky1128, yes, in order to diagnose this issue, we'll need to review all the requested data. For both 3.4 and 3.6: diagnostic.data, time (+timezone) for both tests and the exact time (+timezone) when the issue occurred.

Comment by Jackie Chu [ 22/Jul/19 ]

Hey Dima,

The diagnostic.data files shared earlier were captured after 3 days of the comparison run.

The leak continued and the cache usage went above 80% after 1 week. This comparison test was done a couple of months back, so the data is no longer available. If you need to look into the diagnostic.data files for this issue, we will have to restart the test and collect the diagnostic.data files after the cache usage exceeds the 80% level.

 

It will probably take us some time to re-set up the test, reproduce the issue and capture the files. I will update this ticket later.

Comment by Dmitry Agranat [ 18/Jul/19 ]

Hi renato.riccio,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the requested information?

As for the second issue you've mentioned "only v3.6 goes above 80% cache usage", could you please point me to the exact time/timezone this issue occurs in 3.6 and when this issue, under the same workload, does not occur in 3.4?

Thanks,
Dima

Comment by Dmitry Agranat [ 15/May/19 ]

chujacky1128, thank you for your reply.

After analyzing all the data so far, we do not think there is an "Open File descriptors regression" between 3.4 and 3.6. Improvements in this area are being tracked under WT-4513. That issue affects both the 3.4 and 3.6 versions.

I would have expected that after this fix (SERVER-38779), even if the workload drops & re-create a collection, the expired FD would be purged from cache every 5 mins (wiredTigerSessionCloseIdleTimeSecs), so in theory, table creation & dropping shouldn't cause a leak, unless something else is wrong.

This is our expectation as well. In fact, SERVER-38779 was supposed to help with addressing WT-4513. A parameter, wiredTigerSessionCloseIdleTimeSecs, was introduced together with the session cleanup job; setting it to 0 disables the session cleanup. This should be okay to do on a running system. I suggest we disable the session cleanup and monitor the system's performance for a couple of days, or for as long as it usually takes this workload to hit the issue.
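For example, a minimal sketch in the mongo shell, assuming the parameter is runtime-settable on your 3.6.11 build (please verify before relying on it):

// Sketch only: a value of 0 disables the WT session cache cleanup job.
db.adminCommand({ setParameter: 1, wiredTigerSessionCloseIdleTimeSecs: 0 });

// Confirm the value now in effect:
db.adminCommand({ getParameter: 1, wiredTigerSessionCloseIdleTimeSecs: 1 });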

As for the second issue you've mentioned "only v3.6 goes above 80% cache usage", could you please point me to the exact time/timezone this issue occurs in 3.6 and when this issue, under the same workload, does not occur in 3.4?

Thank you,
Dima

Comment by Jackie Chu [ 07/May/19 ]

Thank you for looking into it, Dmitry.

For our workload, if we look at a longer time span, the rate of table creation should be the same [on the scale of days and weeks]. Over a short time span the rate might vary a bit, so I don't think we should focus the analysis on a small period of time [30 mins or so].

Consider that our workload has 200 DBs per cluster, and all processes are background jobs. The application workers are limited to concurrently processing at most 15 jobs (15 DBs) per cluster. [All the jobs and DBs are the same in our synthetic workload.] Inside a job there are different phases performing different db ops (some phases re-create tables, while some do not), so the rate of table creation fluctuates depending on how many jobs happen to be processing a phase that involves table creation at a given time. Our workload might drop & re-create a collection, but the total # of collections in a DB doesn't increase over time.

Problem#1: there is a leak even when the FD cache is periodically cleaned

From what we observe, the FD/memory leak starts to become obvious after constantly running the same heavy workload for over a week (the same for both 3.4 and 3.6). I would have expected that after this fix (SERVER-38779), even if the workload drops & re-creates a collection, the expired FDs would be purged from the cache every 5 mins (wiredTigerSessionCloseIdleTimeSecs), so in theory table creation & dropping shouldn't cause a leak, unless something else is wrong.
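To illustrate the pattern only (this is not our real workload; the names below are made up), one phase is roughly equivalent to something like this in the mongo shell:

// Illustrative sketch, not our actual workload: the drop & re-create
// pattern that we believe churns the WT session / file descriptor cache.
// "job_db" and "phase_tmp" are made-up names.
var coll = db.getSiblingDB("job_db").getCollection("phase_tmp");
for (var i = 0; i < 1000; i++) {
    coll.drop();                                 // drop the collection from the previous iteration
    coll.createIndex({ k: 1 });                  // implicitly re-creates the collection
    coll.insert({ k: i, v: "payload-" + i });    // plus a mix of insert/update/query/delete ops
    coll.update({ k: i }, { $set: { v: "updated-" + i } });
    coll.find({ k: i }).toArray();
    coll.remove({ k: i });
}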

Problem#2: only v3.6 goes above 80% cache usage

With the leak present in both v3.4 & v3.6, after 1 week of the heavy workload only the DBs on v3.6 go over 80% cache usage (reaching 95%), which significantly slows down the DB. We kept the workload running for 2 weeks, and the v3.6 DBs degraded further until they were no longer functional (our simple query latency benchmark shows them 300X slower).

 

I wish I could share a way for you to exercise the same workload, but unfortunately the workload is not a simple script; it is our entire real business application & logic (only the data is synthetic).

 

Comment by Dmitry Agranat [ 02/May/19 ]

chujacky1128,

I ran a short test creating 10 collections with 10 indexes on both 3.4 and 3.6 and did not find any difference in file descriptors. However, I understand that your workload is more complex.

Looking at your tests, it looks like in 3.4 you were creating far fewer tables (a table is either a collection or an index), which might explain the difference in the number of file descriptors between the two versions. For example, let's zoom into ~40 minutes of each test, at the same time of day, and just look at index creation:

(See the attached 3.4.png and 3.6.png for the 3.4 vs 3.6 index-creation comparison.)

Also note that there appear to be long gaps (as long as 30 minutes) between each bulk of table creations in 3.4, which we do not see in 3.6. Once these commands stop around 2019-04-15T04:00:22.622Z during the 3.6 test, the "files currently open" metric no longer grows and remains flat.

In order to better understand where the symptoms you've mentioned are coming from, would it be possible to share your workload generator script with us (the simpler it is to execute, the better)?

Comment by Jackie Chu [ 02/May/19 ]

Hi Eric,

All of our dbs (on v3.4 and v3.6) have the same collections/indexes/synthetic data shape, and none of them have any TTL index, so this should not be the cause.

Comment by Eric Sedor [ 01/May/19 ]

Thanks for all the information so far, chujacky1128.

One thing we've noticed while comparing metrics is a difference in the number of documents deleted by TTL indexes. Can you confirm the same TTL indexes exist on both clusters, and that the workload with respect to expiring documents is expected to be similar?

Comment by Jackie Chu [ 23/Apr/19 ]

Update

We kept the same workload running for over a week (on two mongodb clusters on v3.4.17 & two mongodb clusters on v3.6.11), and we see a significantly higher rate of performance degradation on v3.6.11.

 

Both v3.4 and v3.6 showed a gradual slowdown of mongodb performance over time. However, the rate of performance degradation is significantly higher on v3.6. After running the workload in the perf env for 7 days:

  1. v3.6.11 - open FDs reached 60k per host (compared to 2.5k per host for v3.4.17)
  2. v3.6.11 - cache usage reached 95% (compared to below 80% for v3.4.17)
  3. v3.6.11 - the latency test showed queries as much as 65X slower than on the 1st day after the server restart (compared to 2X slower for v3.4.17)

 

Open FD Comparison (see attached OpenFD.png)

 

Cache Usage Comparison (see attached CacheUsage.png)

It's known that when cache usage is above 80%, mongodb performance degrades significantly.

 

Latency Test Comparison (see attached LatencyTest.png)

 

Comment by Jackie Chu [ 20/Apr/19 ]

I have shared the diagnostic.data files with Ben Perlmutter (a mongodb sr. solution architect who works with our company).

After running the identical workload on the 4 mongodb clusters for ~3 days, the number of open FDs stopped growing and remained at a stable level. The 2 mongodb clusters on v3.4 have ~600 open FDs each, while the 2 mongodb clusters on v3.6 have ~5000 open FDs each. (See below.)

Two mongodb primaries on v3.4 (~0.6k open FDs each)

Two mongodb primaries on v3.6 (~5k open FDs each)

 

I used lsof to check the open FDs and found that there are significantly more open FDs for the index files in v3.6. Maybe it isn't an FD leak; maybe in v3.6 mongodb holds on to the open FDs for longer for some reason?

 

Due to the way we use mongodb (a huge # of DBs & collections in each cluster), we have been configuring the WiredTiger settings below on all mongodb hosts (so that FDs can be cleared more frequently); see the verification sketch after this list:
● close_handle_minimum=100
● close_idle_time=60
● close_scan_interval=60
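
As a rough sketch of how I double-check these settings and the related counters from the mongo shell (it assumes the settings are passed in via storage.wiredTiger.engineConfig.configString, and the exact serverStatus/WiredTiger stat names may differ slightly between versions):

// Rough sketch: confirm the custom file_manager settings reached the
// running mongod, then watch the FD-related counters over time.
printjson(db.adminCommand({ getCmdLineOpts: 1 }).parsed.storage);

var wt = db.serverStatus().wiredTiger;
print("files currently open:          " + wt.connection["files currently open"]);
print("data handles currently active: " + wt["data-handle"]["connection data handles currently active"]);
print("open session count:            " + wt.session["open session count"]);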

 

I am wondering, is there any chance that this WT config isn't being enforced properly in v3.6? Or could something be preventing the open FDs from being swept (e.g. some process accessing the FDs regularly)?

Comment by Eric Sedor [ 16/Apr/19 ]

chujacky1128, can you please archive (tar or zip) the $dbpath/diagnostic.data directory (described here) for both 3.4.17 and 3.6.11, and attach them to this ticket?

Comment by Eric Sedor [ 15/Apr/19 ]

Thanks for your report chujacky1128. We are looking into this.
