  Core Server / SERVER-31081

Too many open file descriptors not handled gracefully

    • Service Arch
    • ALL
    • Service Arch 2019-11-04, Service Arch 2019-11-18, Service Arch 2019-12-02, Service Arch 2019-12-16, Service Arch 2019-12-30, Service Arch 2020-01-13, Service Arch 2020-01-27, Service Arch 2020-02-10, Service Arch 2020-02-24, Service Arch 2020-03-09, Service Arch 2020-03-23, Service Arch 2020-04-06, Service arch 2020-04-20, Service arch 2020-05-04, Service arch 2020-05-18, Service arch 2020-06-01, Service arch 2020-06-15, Service arch 2020-06-29, Service arch 2020-07-13, Service Arch 2020-07-27, Service Arch 2020-08-10, Service Arch 2020-08-24

      File descriptors are used for multiple purposes: mainly socket connections and WiredTiger tables, but also journal files, FTDC files, log files, and so on. When mongod runs out of file descriptors by hitting RLIMIT_NOFILE, multiple internal failures can occur:

      2017-09-13T14:05:25.125-0400 I NETWORK  [thread1] Listener: accept() returns -1 Too many open files
      2017-09-13T14:05:25.125-0400 E NETWORK  [thread1] Out of file descriptors. Waiting one second before trying to accept more connections.
      ...
      2017-09-13T14:05:26.000-0400 I -        [ftdc] Assertion: 13538:couldn't open [/proc/12839/stat] Too many open files src/mongo/util/processinfo_linux.cpp 74
      2017-09-13T14:05:26.000-0400 W FTDC     [ftdc] Uncaught exception in 'Location13538: couldn't open [/proc/12839/stat] Too many open files' in full-time diagnostic data capture subsystem. Shutting down the full-time diagnostic data capture subsystem.
      ...
      2017-09-13T14:05:30.427-0400 E STORAGE  [conn5] WiredTiger error (24) [1505325930:427737][12839:0x7fc987931700], WT_SESSION.commit_transaction: /ssd/db/./r0/journal: directory-list: opendir: Too many open files
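
The underlying condition can be reproduced in isolation. The following is a minimal Python sketch, not mongod code: the function name `exhaust_descriptors` and the limit of 64 are hypothetical choices for illustration. It lowers the soft RLIMIT_NOFILE for the current process, opens descriptors until the kernel refuses, and returns the errno it saw, which should be EMFILE ("Too many open files", the error 24 that WiredTiger reports above).

```python
import errno
import os
import resource


def exhaust_descriptors(soft_limit=64):
    """Lower the soft RLIMIT_NOFILE, open files until open() fails,
    and return (number_opened, errno_of_failure)."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may not exceed the hard limit.
    if hard == resource.RLIM_INFINITY:
        new_soft = soft_limit
    else:
        new_soft = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    fds = []
    err = None
    try:
        while True:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        err = exc.errno
    finally:
        # Release everything and restore the original soft limit.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    return len(fds), err


if __name__ == "__main__":
    opened, err = exhaust_descriptors()
    print(opened, errno.errorcode.get(err), os.strerror(err))
```

Every subsystem that needs a new descriptor after this point (accept(), opening /proc files, opendir on the journal directory) fails in the same way, which is why the log shows unrelated components failing within seconds of each other.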
      

      After this occurs, the node is either not operational or operating in a degraded state, but no failover has occurred. We should either:

      • bound use of file descriptors for new connections and new tables by refusing new connections and failing new table creation, or
      • detect this condition reliably and abort to trigger failover
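
The two options above could be combined in a single check, sketched here in Python under stated assumptions. This is not how mongod is implemented; `count_open_fds`, `choose_action`, and both threshold fractions are hypothetical. The sketch assumes Linux, where /proc/self/fd lists the process's open descriptors. Note that the check itself needs a descriptor, so a failure to count is treated as exhaustion.

```python
import errno
import os
import resource


def count_open_fds():
    """Return the number of open descriptors for this process, or None if
    counting itself failed because no descriptor was available.
    The count includes the descriptor used for the listing."""
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError as exc:
        if exc.errno in (errno.EMFILE, errno.ENFILE):
            return None
        raise


def choose_action(used, soft_limit, refuse_fraction=0.8, abort_fraction=0.95):
    """Map descriptor usage to 'ok', 'refuse' (reject new connections and
    new tables), or 'abort' (exit so that failover can occur)."""
    if used is None or used >= soft_limit * abort_fraction:
        return "abort"
    if used >= soft_limit * refuse_fraction:
        return "refuse"
    return "ok"


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(choose_action(count_open_fds(), soft))
```

Keeping the decision in a pure function separates the policy (the thresholds) from the measurement, which can fail in exactly the situation being detected.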

      Note: new connections are currently limited to 80% of RLIMIT_NOFILE, which only partially mitigates this issue because the use of the remaining file descriptors for WT tables is not bounded.
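
As a worked example of the 80% rule in the note, the Python sketch below computes the connection cap and the descriptors left over for everything else. The function name `connection_budget`, the rounding behavior, and the sample limit of 64000 are assumptions for illustration, not values taken from this ticket.

```python
def connection_budget(nofile_limit, percent=80):
    """Return (max_connections, remaining_descriptors) for a given
    RLIMIT_NOFILE, rounding the connection cap down (an assumption)."""
    max_connections = nofile_limit * percent // 100
    return max_connections, nofile_limit - max_connections


if __name__ == "__main__":
    print(connection_budget(64000))
```

With a limit of 64000 this gives 51200 connections and 12800 remaining descriptors. Nothing caps table, journal, FTDC, or log usage at that remainder, so exhaustion is still possible even when the connection cap is respected.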

            Assignee:
            backlog-server-servicearch [DO NOT USE] Backlog - Service Architecture
            Reporter:
            bruce.lucas@mongodb.com Bruce Lucas (Inactive)
            Votes:
            29
            Watchers:
            35
