[SERVER-31081] Too many open file descriptors not handled gracefully Created: 13/Sep/17  Updated: 08/Jan/24

Status: Backlog
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 29
Labels: sa-remove-fv-backlog-22
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-37521 Mongo should stop gracefully on "Too ... Closed
Related
Assigned Teams:
Service Arch
Operating System: ALL
Sprint: Service Arch 2019-11-04, Service Arch 2019-11-18, Service Arch 2019-12-02, Service Arch 2019-12-16, Service Arch 2019-12-30, Service Arch 2020-01-13, Service Arch 2020-01-27, Service Arch 2020-02-10, Service Arch 2020-02-24, Service Arch 2020-03-09, Service Arch 2020-03-23, Service Arch 2020-04-06, Service arch 2020-04-20, Service arch 2020-05-04, Service arch 2020-05-18, Service arch 2020-06-01, Service arch 2020-06-15, Service arch 2020-06-29, Service arch 2020-07-13, Service Arch 2020-07-27, Service Arch 2020-08-10, Service Arch 2020-08-24
Participants:
Case:

 Description   

File descriptors are used for multiple purposes, largely socket connections and WiredTiger tables, but also journal files, FTDC files, log files, and so on. When mongod runs out of file descriptors by hitting RLIMIT_NOFILE, multiple internal failures can occur:

2017-09-13T14:05:25.125-0400 I NETWORK  [thread1] Listener: accept() returns -1 Too many open files
2017-09-13T14:05:25.125-0400 E NETWORK  [thread1] Out of file descriptors. Waiting one second before trying to accept more connections.
...
2017-09-13T14:05:26.000-0400 I -        [ftdc] Assertion: 13538:couldn't open [/proc/12839/stat] Too many open files src/mongo/util/processinfo_linux.cpp 74
2017-09-13T14:05:26.000-0400 W FTDC     [ftdc] Uncaught exception in 'Location13538: couldn't open [/proc/12839/stat] Too many open files' in full-time diagnostic data capture subsystem. Shutting down the full-time diagnostic data capture subsystem.
...
2017-09-13T14:05:30.427-0400 E STORAGE  [conn5] WiredTiger error (24) [1505325930:427737][12839:0x7fc987931700], WT_SESSION.commit_transaction: /ssd/db/./r0/journal: directory-list: opendir: Too many open files

After this occurs, the node is either not operational or operating in a degraded state, but no failover has occurred. We should either

  • bound use of file descriptors for new connections and new tables by refusing new connections and failing new table creation, or
  • detect this condition reliably and abort to trigger failover
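
The two options above can be sketched as a single policy check. This is only an illustration in Python, not mongod code: the function names, the "ok"/"refuse"/"abort" outcomes, and the reserve of 64 descriptors are all assumptions made for the example, and the descriptor count relies on Linux's /proc/self/fd.

```python
import errno
import os


def open_fd_count():
    """Count this process's open descriptors via /proc/self/fd (Linux only).

    The listing briefly opens one descriptor itself, so the count is
    approximate. Returns None when the listing fails because no
    descriptors are left, which is itself a sign of exhaustion.
    """
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError as exc:
        if exc.errno in (errno.EMFILE, errno.ENFILE):
            return None
        raise


def classify(open_fds, soft_limit, reserve=64):
    """Map descriptor usage to a hypothetical action.

    "abort"  -> limit reached (or usage unreadable): exit to trigger failover
    "refuse" -> inside the reserve: reject new connections and new tables
    "ok"     -> enough headroom remains
    """
    if open_fds is None or open_fds >= soft_limit:
        return "abort"
    if open_fds >= soft_limit - reserve:
        return "refuse"
    return "ok"
```

A caller would feed classify() the current count and the soft RLIMIT_NOFILE value before accepting a connection or creating a table. Keeping a reserve means internal users such as the journal and FTDC still have descriptors available when new work is refused.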

Note: new connections are currently limited to 80% of RLIMIT_NOFILE, which only partially mitigates this issue because the use of the remaining file descriptors for WT tables is not bounded.
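
To make the 80% figure concrete, here is a rough Python sketch of deriving a connection cap from the soft limit. The helper names and the integer rounding are assumptions for illustration; the ticket does not say how mongod computes the value.

```python
import resource


def connection_cap(soft_limit, percent=80):
    """Return the number of descriptors allowed for connections.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return soft_limit * percent // 100


def current_connection_cap(percent=80):
    """Apply connection_cap to this process's soft RLIMIT_NOFILE.

    Returns None when the soft limit is unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return connection_cap(soft, percent)
```

With a soft limit of 1024 this leaves 819 descriptors for connections and 205 for everything else. Nothing stops WT tables from consuming more than those 205, which is the gap the note describes.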



 Comments   
Comment by Githook User [ 31/Oct/17 ]

Author: ADAM David Alan Martin (adamlsd) <adam.martin@10gen.com>

Message: SERVER-31081 Fix Lint again.
Branch: master
https://github.com/mongodb/mongo/commit/1d560ef0ddf4cb410f8cfe98884ddcaed5f40dfa

Should be attached to SERVER-31061. Typo.

Generated at Thu Feb 08 04:25:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.