[SERVER-77064] mongod server crashes with "Too many open files" Created: 12/May/23  Updated: 30/Jun/23  Resolved: 30/Jun/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.19, 4.4.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: JDatta N/A Assignee: Chris Kelly
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Issue summary

If we try to run collStats command on more than 30k collections in a loop, it causes the mongod server process to crash with directory-list: opendir: Too many open files: aborting after fassert() failure

Steps to reproduce

Create a mongodb database, with 30,000 collections in it. We want to find out the collection stats for each of the 30k collections. Following is a simplified script that does it:

let collections = db.getCollectionNames();
// The following creates >64K FDs on MongoDB server
// (mongod) if the number of collections is 30k
// This causes the mongod server process to crash.
//
collections.forEach((collectionName) => {
  print('Collection Name: ' + collectionName);
  let collection = db.getCollection(collectionName);
  let stats = db.runCommand({ collStats: collectionName });
  print('Document Count: ' + stats.count);
});

 
Then I try to run the script using this command:

mongo <mongod_server_host>:<mongod_server_host>/<db_with_30k_collections> -u user -p password -authenticationDatabase admin ./collstats.js

After printing the stats for ~29k collections, this script causes the server mongod process to crash with the following stack trace.

2023-05-12T12:08:07.000+0530 E -        [ftdc] Assertion: Location13538: couldn't open [/proc/9132/stat] Too many open files src/mongo/util/processinfo_linux.cpp 81
2023-05-12T12:08:07.498+0530 E STORAGE  [conn9] WiredTiger error (24) [1683873487:498900][9132:0x7f71de8b2700], file:db30k/collection-87589--8265343348318534937.wt, WT_SESSION.open_cursor: __posix_open_file, 672: /data1/mongodb/var/lib/mongo/db30k/collection-87589--8265343348318534937.wt: handle-open: open: Too many open files Raw: [1683873487:498900][9132:0x7f71de8b2700], file:db30k/collection-87589--8265343348318534937.wt, WT_SESSION.open_cursor: __posix_open_file, 672: /data1/mongodb/var/lib/mongo/db30k/collection-87589--8265343348318534937.wt: handle-open: open: Too many open files
2023-05-12T12:08:08.000+0530 E -        [ftdc] Assertion: Location13538: couldn't open [/proc/9132/stat] Too many open files src/mongo/util/processinfo_linux.cpp 81
2023-05-12T12:08:08.002+0530 E STORAGE  [conn9] WiredTiger error (24) [1683873488:2150][9132:0x7f71de8b2700], file:db30k/index-87590--8265343348318534937.wt, WT_SESSION.open_cursor: __posix_open_file, 672: /data1/mongodb/var/lib/mongo/db30k/index-87590--8265343348318534937.wt: handle-open: open: Too many open files Raw: [1683873488:2150][9132:0x7f71de8b2700], file:db30k/index-87590--8265343348318534937.wt, WT_SESSION.open_cursor: __posix_open_file, 672: /data1/mongodb/var/lib/mongo/db30k/index-87590--8265343348318534937.wt: handle-open: open: Too many open files
2023-05-12T12:08:08.418+0530 E STORAGE  [thread34] WiredTiger error (24) [1683873488:418763][9132:0x7f71ecad0700], log-server: __directory_list_worker, 46: /data1/mongodb/var/lib/mongo/journal: directory-list: opendir: Too many open files Raw: [1683873488:418763][9132:0x7f71ecad0700], log-server: __directory_list_worker, 46: /data1/mongodb/var/lib/mongo/journal: directory-list: opendir: Too many open files
2023-05-12T12:08:08.418+0530 E STORAGE  [thread34] WiredTiger error (24) [1683873488:418949][9132:0x7f71ecad0700], log-server: __log_prealloc_once, 464: log pre-alloc server error: Too many open files Raw: [1683873488:418949][9132:0x7f71ecad0700], log-server: __log_prealloc_once, 464: log pre-alloc server error: Too many open files
2023-05-12T12:08:08.418+0530 E STORAGE  [thread34] WiredTiger error (24) [1683873488:418974][9132:0x7f71ecad0700], log-server: __log_server, 927: log server error: Too many open files Raw: [1683873488:418974][9132:0x7f71ecad0700], log-server: __log_server, 927: log server error: Too many open files
2023-05-12T12:08:08.418+0530 E STORAGE  [thread34] WiredTiger error (-31804) [1683873488:418992][9132:0x7f71ecad0700], log-server: __wt_panic, 494: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1683873488:418992][9132:0x7f71ecad0700], log-server: __wt_panic, 494: the process must exit and restart: WT_PANIC: WiredTiger library panic
2023-05-12T12:08:08.419+0530 F -        [thread34] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 420
2023-05-12T12:08:08.419+0530 F -        [thread34]
 
***aborting after fassert() failure
 
 
2023-05-12T12:08:08.429+0530 F -        [thread34] Got signal: 6 (Aborted).
 0x55cd23605601 0x55cd23604819 0x55cd23604cfd 0x7f71f459b5d0 0x7f71f41f5207 0x7f71f41f68f8 0x55cd21af1831 0x55cd21e6cb46 0x55cd21ed7611 0x55cd21a7c565 0x55cd21a7c97f 0x55cd21f441be 0x7f71f4593dd5 0x7f71f42bcead
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55CD210A0000","o":"2565601","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55CD210A0000","o":"2564819"},{"b":"55CD210A0000","o":"2564CFD"},{"b":"7F71F458C000","o":"F5D0"},{"b":"7F71F41BF000","o":"36207","s":"gsignal"},{"b":"7F71F41BF000","o":"378F8","s":"abort"},{"b":"55CD210A0000","o":"A51831","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"55CD210A0000","o":"DCCB46"},{"b":"55CD210A0000","o":"E37611"},{"b":"55CD210A0000","o":"9DC565","s":"__wt_err_func"},{"b":"55CD210A0000","o":"9DC97F","s":"__wt_panic"},{"b":"55CD210A0000","o":"EA41BE"},{"b":"7F71F458C000","o":"7DD5"},{"b":"7F71F41BF000","o":"FDEAD","s":"clone"}],"processInfo":{ "mongodbVersion" : "4.0.19", "gitVersion" : "7e28f4296a04d858a2e3dd84a1e79c9ba59a9568", "compiledModules" : [ "enterprise" ], "uname" : { "sysname" : "Linux", "release" : "3.10.0-957.el7.x86_64", "version" : "#1 SMP Thu Nov 8 23:39:32 UTC 2018", "machine" : "x86_64" }, "somap" : [ { "b" : "55CD210A0000", "elfType" : 3, "buildId" : "063EA082F67507C179457BCF93E4D8E2FAC4D554" }, { "b" : "7FFD13ECA000", "elfType" : 3, "buildId" : "163C2DC43405427478788BAD0AFD537A7ACF7A13" }, { "b" : "7F71F7574000", "path" : "/lib64/libldap-2.4.so.2", "elfType" : 3, "buildId" : "F1FADDDE0D21D5F4E2DCADEDD3B85B6E7AAC9883" }, { "b" : "7F71F7365000", "path" : "/lib64/liblber-2.4.so.2", "elfType" : 3, "buildId" : "3192C56CD451E18EB9F29CB045432BA9C738DD29" }, { "b" : "7F71F6EAC000", "path" : "/lib64/libnetsnmpmibs.so.31", "elfType" : 3, "buildId" : "AB0136CD79E37A5F98EF9F47DF751C19E58BA428" }, { "b" : "7F71F6C9D000", "path" : "/lib64/libsensors.so.4", "elfType" : 3, "buildId" : "A2ACE3E193F25778AA87D2E221945FDCCFCF220F" }, { "b" : "7F71F6A99000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "67AD3498AC7DE3EAB952A243094DF5C12A21CD7D" }, { "b" : "7F71F6831000", "path" : "/lib64/librpm.so.3", "elfType" : 3, "buildId" : "BA9B5DF3BBE4764C0BA2AD6B9BBD5E10512BC082" }, { "b" : "7F71F6604000", "path" : "/lib64/librpmio.so.3", "elfType" : 3, "buildId" : "069D6EB6C16272B4A4428CAC4087799F18318AA5" }, { "b" : "7F71F6395000", "path" : "/lib64/libnetsnmpagent.so.31", "elfType" : 3, "buildId" : "17AAF1F3742309CEA30A295F7C8CE39CA930B503" }, { "b" : "7F71F618A000", "path" : "/lib64/libwrap.so.0", "elfType" : 3, "buildId" : "8C4AA46577D3AA7EBF8338BDFAECC6586EF29906" }, { "b" : "7F71F5E87000", "path" : "/lib64/libnetsnmp.so.31", "elfType" : 3, "buildId" : "25F113859E76E8CEAA47298D6DFD611FA8E53FF5" }, { "b" : "7F71F5C15000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "AEF5E6F2240B55F90E9DF76CFBB8B9D9F5286583" }, { "b" : "7F71F57B4000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "8BD89856B64DD5189BF075EF574EDF203F93D44A" }, { "b" : "7F71F5597000", "path" : "/lib64/libsasl2.so.3", "elfType" : 3, "buildId" : "E2F2017F821DD1B9D307DA1A9B8014F2941AEB7B" }, { "b" : "7F71F534A000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "C3914975092B29D330453950350E254AA562D642" }, { "b" : "7F71F50E1000", "path" : "/lib64/libcurl.so.4", "elfType" : 3, "buildId" : "9114859D3C4BEC47A03CA321EE367DCA799638CD" }, { "b" : "7F71F4DDF000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "918D3696BF321AA8D32950AB2AB8D0F1B21AC907" }, { "b" : "7F71F4BC6000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "4C488F6E7044BB966162C1F7081ABBA6EBB2B485" }, { "b" : "7F71F49BE000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "EFDE2029C9A4A20BE5B8D8AE7E6551FF9B5755D2" }, { "b" : "7F71F47A8000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "6B4F3D896CD0F06FCB3DEF0245F204ECE3220D7E" }, { "b" : "7F71F458C000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "3D9441083D079DC2977F1BD50C8068D11767232D" }, { "b" : "7F71F41BF000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "3C61131D1DAC9DA79B73188E7702BEF786C2AD54" }, { "b" : "7F71F77C9000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "5DA2D47925497B2F5875A7D8D1799A1227E2FDE4" }, { "b" : "7F71F3F6D000", "path" : "/lib64/libssl3.so", "elfType" : 3, "buildId" : "DE2FC96025F4F43AA6E48523D354581D2C5E0669" }, { "b" : "7F71F3D46000", "path" : "/lib64/libsmime3.so", "elfType" : 3, "buildId" : "9757EB6D93E31024C436919A89BEF33CA4F1B984" }, { "b" : "7F71F3A19000", "path" : "/lib64/libnss3.so", "elfType" : 3, "buildId" : "289D245DC413161931C170A1DFB9676582C84DBF" }, { "b" : "7F71F37EA000", "path" : "/lib64/libnssutil3.so", "elfType" : 3, "buildId" : "B7F45CBBFB31A1390DA3CD65494FB4824046B22C" }, { "b" : "7F71F35E6000", "path" : "/lib64/libplds4.so", "elfType" : 3, "buildId" : "084D2194302908913F68B9DCD27DE46FA5B50522" }, { "b" : "7F71F33E1000", "path" : "/lib64/libplc4.so", "elfType" : 3, "buildId" : "799B28AD9A5460D78376E2C11260F2E858B95DE3" }, { "b" : "7F71F31A3000", "path" : "/lib64/libnspr4.so", "elfType" : 3, "buildId" : "DE762A281741
10911B273E175D54F222B313CFE0" }, { "b" : "7F71F2E15000", "path" : "/usr/lib64/perl5/CORE/libperl.so", "elfType" : 3, "buildId" : "E2C3C10A756404CC8888CD6CA8DFAD26064EF3CB" }, { "b" : "7F71F2BFB000", "path" : "/lib64/libnsl.so.1", "elfType" : 3, "buildId" : "B1DC715EF11A8A
CF2176AE6D34914328F9DBE9CC" }, { "b" : "7F71F29C4000", "path" : "/lib64/libcrypt.so.1", "elfType" : 3, "buildId" : "740CAD898E29E1F3B73A323CCEC4A7C88911647F" }, { "b" : "7F71F27C1000", "path" : "/lib64/libutil.so.1", "elfType" : 3, "buildId" : "BFB22A176189006BBF996A7664C
CC6BDB2EB9BD9" }, { "b" : "7F71F25B1000", "path" : "/lib64/libbz2.so.1", "elfType" : 3, "buildId" : "0C85C0386F0CF41EA39969CF7F58A558D1AD3235" }, { "b" : "7F71F239B000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "B9D5F73428BD6AD68C96986B57BEA3B7CEDB9745" },
{ "b" : "7F71F2183000", "path" : "/lib64/libelf.so.1", "elfType" : 3, "buildId" : "A22999760AD7CF97A32E5A78C34238D56034F9DD" }, { "b" : "7F71F1F5D000", "path" : "/lib64/liblzma.so.5", "elfType" : 3, "buildId" : "3B2C97C1937B73A69C412A96D0810C43DF0C6F54" }, { "b" : "7F71F1
D53000", "path" : "/lib64/libpopt.so.0", "elfType" : 3, "buildId" : "7AE00165FBAF6920DD5AED6905820DDBAB589E84" }, { "b" : "7F71F1B2C000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "D2DD4DA3FDE1477D25BFFF80F3A25FDB541A8179" }, { "b" : "7F71F1927000", "p
ath" : "/lib64/libcap.so.2", "elfType" : 3, "buildId" : "D594598CEBC1D6DC4ECE2ABF4C29398FDB00587E" }, { "b" : "7F71F171E000", "path" : "/lib64/libacl.so.1", "elfType" : 3, "buildId" : "7F39882FC0B80BE53790C2EAC307D39F7DE1AD6E" }, { "b" : "7F71F14F0000", "path" : "/lib64/l
iblua-5.1.so", "elfType" : 3, "buildId" : "BDD4B9CFC1D3F31D3A5A430D2F9080E020C5B0BA" }, { "b" : "7F71F1131000", "path" : "/lib64/libdb-5.3.so", "elfType" : 3, "buildId" : "746F78257330663AD584998981E6C4D0BF49C940" }, { "b" : "7F71F0F08000", "path" : "/lib64/libaudit.so.1"
, "elfType" : 3, "buildId" : "2E36E1B9A2D92C969E38CDDCC729F55D8BACBB2B" }, { "b" : "7F71F0C1F000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "872EAC9F0CE30D5C4C37ECCCE3C586296D4FA1F0" }, { "b" : "7F71F0A1B000", "path" : "/lib64/libcom_err.so.2", "elfType"
 : 3, "buildId" : "B4BE1023D9606A88169DF411BF94AF417D7BA1A0" }, { "b" : "7F71F07E8000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "7856E751772E8538A33113BA62145A8B23314093" }, { "b" : "7F71F05D8000", "path" : "/lib64/libkrb5support.so.0", "elfType" :
3, "buildId" : "AED31F16223CE52AE079AB1ED4C09AC4C98F86B8" }, { "b" : "7F71F03D4000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F71F01A1000", "path" : "/lib64/libidn.so.11", "elfType" : 3, "buildI
d" : "2B77BBEFFF65E94F3E0B71A4E89BEB68C4B476C5" }, { "b" : "7F71EFF77000", "path" : "/lib64/libssh2.so.1", "elfType" : 3, "buildId" : "39E1832C767A9167A23569784270DD4B5FD6DC91" }, { "b" : "7F71EFD74000", "path" : "/lib64/libfreebl3.so", "elfType" : 3, "buildId" : "B758881
F4B6AF6C28C07A1A57713CBD2144628D4" }, { "b" : "7F71EFB12000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "9CA3D11F018BEEB719CDB34BE800BF1641350D0A" }, { "b" : "7F71EF90D000", "path" : "/lib64/libattr.so.1", "elfType" : 3, "buildId" : "2617ECC6738047E207AE3
ADD990BD6A34D11B265" }, { "b" : "7F71EF707000", "path" : "/lib64/libcap-ng.so.0", "elfType" : 3, "buildId" : "43578677DF613E9D58128ED4AE0C344FBC1E44C0" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55cd23605601]
 mongod(+0x2564819) [0x55cd23604819]
 mongod(+0x2564CFD) [0x55cd23604cfd]
 libpthread.so.0(+0xF5D0) [0x7f71f459b5d0]
 libc.so.6(gsignal+0x37) [0x7f71f41f5207]
 libc.so.6(abort+0x148) [0x7f71f41f68f8]
 mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x55cd21af1831]
 mongod(+0xDCCB46) [0x55cd21e6cb46]
 mongod(+0xE37611) [0x55cd21ed7611]
 mongod(__wt_err_func+0x90) [0x55cd21a7c565]
 mongod(__wt_panic+0x39) [0x55cd21a7c97f]
 mongod(+0xEA41BE) [0x55cd21f441be]
 libpthread.so.0(+0x7DD5) [0x7f71f4593dd5]
 libc.so.6(clone+0x6D) [0x7f71f42bcead]
-----  END BACKTRACE  -----

MongoDB version is: 4.0.19. But I believe this would be reproducible in other versions as well. Let me know if it is fixed in any specific version onwards.

MongoDB Enterprise node1:SECONDARY> version()
4.0.19



 Comments   
Comment by Chris Kelly [ 26/May/23 ]

Hi mail.joydip+github@gmail.com,

Thanks for the extra context on this. While this still isn't a recommended number of collections to have on the server (for a number of reasons), you may possibly be running into a byproduct of the gWiredTigerFileHandleCloseIdleTime setting, which defines the minimum time the underlying WiredTiger storage engine will wait before closing an inactive file.

The reason this may be occurring for you on 4.4 and earlier is because the setting is normally:

This isn't a documented parameter, but you can try overriding it given the above guidance, or upgrade to 5.0+ to see if the issue persists. That said, if you open enough files to hit the dhandle limit within that time period, you may still have issues, and may want to adjust the setting further if this is truly needed (with the understanding this is not a recommended setup to begin with)

If the issue is still persisting after considering the above, additional data would be helpful. If you get it to reproduce on newer versions, could you please provide some more data from your run? (It'd be appreciated)

For each node in the replica set spanning a time period that includes the incident, would you please archive (tar or zip) and upload to the ticket:

  • the mongod logs
  • the $dbpath/diagnostic.data directory (the contents are described here)

Christopher

Comment by JDatta N/A [ 25/May/23 ]

Hi chris.kelly@mongodb.com ,

I have reproduced the problem in MongoDB version 4.4.6 with the exact same steps. So this is not due to an EOL version.

Also, running two mongod nodes in the same system should not relate to this problem at all. The limit on the number of open files is per process; we are not hitting any global limits. In fact, if I change the ulimit for the process to 500k, the test case does not cause any crashes. We are not running the test against both.

The problem is not so much about having this many collections or opening this many FDs. The problem here is, the open files are NEVER being closed by the mongod server. Even days after closing the client.

In one experiment, I ran a few thousand collStats and that opened ~4000 files. I closed the client process and came back after a DAY!!!

To me, this looks like a valid issue. Would it be possible to reopen the JIRA and prioritize this?

Comment by Chris Kelly [ 23/May/23 ]

Hi mail.joydip+github@gmail.com,

MongoDB 4.0 reached end of life in April 2022 and is no longer supported.

Additionally, as a side comment, it looks like you're running two mongod nodes on the same system you're doing this test on (which each open a file descriptor on 4.0), so it would sound reasonable to be hitting this limit if you are running a test against both, or have ran it on both at one point in time recently (as it would open a file descriptor, which may not be closed immediately), or both have a huge number of collections that are actively being used. Once upgrading to 4.2 or newer, this can be expected to use 2 file descriptors per collection.

It's not recommended to have this many collections to begin with - you can check out our blog post on this for more info. If you come across issues with our official recommendations on a supported version, we'd be happy to revisit this issue again.

For more discussion on this in the meantime, I'd recommend checking with our community for help by posting on the MongoDB Developer Community Forums.

Thanks for your report!

Christopher

Comment by JDatta N/A [ 12/May/23 ]

Please also note that, before performing the test, we set the ulimit of the mongod server to 64k as recommended by MongoDB docs.

# ulimit -n -H
64000
# ulimit -n -S
64000
# ps -aef|grep mongod
root     11199     1 16 16:05 ?        00:01:42 mongod --config /data1/mongodb/mongod.conf
root     11291     1 16 16:09 ?        00:01:04 mongod --config /data2/mongodb/mongod.conf
root     11450  6022  0 16:15 pts/1    00:00:00 grep --color=auto mongod
# cat /proc/11199/limits |grep 'Max open files'
Max open files            64000                64000                files
#

Generated at Thu Feb 08 06:34:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.