Core Server / SERVER-103432

FTDC Thread Enters Uninterruptible Sleep (D State) Due to Kernel-Level autofs Issue on RHEL

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      On RHEL-based systems running MongoDB, the ftdc thread, which collects diagnostic metrics, can enter uninterruptible sleep (D state). The cause is a known kernel-level bug in autofs: a closed file descriptor is mishandled during the automount routine, and the thread hangs indefinitely in autofs_mount_wait. FTDC then stops collecting data, which removes a key source of diagnostic information.

      Technical Analysis:

      Thread Details:

      • Affected Thread: ftdc (TID 923934, in process 923911)
      • State: D (uninterruptible sleep)
      • Wait Channel: autofs_mount_wait
      • Kernel Stack Trace:
      $ cat /proc/923911/task/923934/stack
      [<0>] autofs_wait+0x319/0x79d
      [<0>] autofs_mount_wait+0x49/0xf0
      [<0>] autofs_d_automount+0xe6/0x1f0
      [<0>] follow_managed+0x17f/0x2e0
      [<0>] lookup_fast+0x135/0x2a0
      [<0>] walk_component+0x48/0x300
      [<0>] path_lookupat.isra.43+0x79/0x220
      [<0>] filename_lookup.part.58+0xa0/0x170
      [<0>] user_statfs+0x43/0xa0
      [<0>] __do_sys_statfs+0x20/0x60
      [<0>] do_syscall_64+0x5b/0x1b0
      [<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 
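      To spot this condition across every thread of a process rather than one TID at a time, the same information can be read from the standard Linux files /proc/<pid>/task/<tid>/stat (thread state) and /proc/<pid>/task/<tid>/wchan (wait channel). The helper below is a hypothetical sketch; find_blocked_threads and parse_task_stat are illustrative names, not part of MongoDB or any existing tool.

```python
import os
from typing import List, NamedTuple, Optional, Tuple


class TaskState(NamedTuple):
    tid: int
    comm: str
    state: str
    wchan: Optional[str]


def parse_task_stat(text: str) -> Tuple[int, str, str]:
    """Parse the leading "tid (comm) state" fields of a /proc stat file.

    comm may itself contain spaces or parentheses, so the name is taken
    up to the *last* ')' instead of splitting on whitespace.
    """
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    tid = int(text[:open_paren].strip())
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return tid, comm, state


def find_blocked_threads(pid: int, proc_root: str = "/proc") -> List[TaskState]:
    """Return every thread of `pid` that is in uninterruptible sleep (D)."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    blocked = []
    for entry in sorted(os.listdir(task_dir), key=int):
        base = os.path.join(task_dir, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                tid, comm, state = parse_task_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited between listdir() and open()
        if state != "D":
            continue
        try:
            # wchan holds the kernel wait function, e.g. autofs_mount_wait;
            # some kernels report "0" when the value is unavailable.
            with open(os.path.join(base, "wchan")) as f:
                wchan: Optional[str] = f.read().strip() or None
        except OSError:
            wchan = None
        blocked.append(TaskState(tid, comm, state, wchan))
    return blocked
```

      Calling find_blocked_threads(923911) while the hang is in progress should list the ftdc thread with wait channel autofs_mount_wait. The proc_root parameter exists only so the function can be exercised against a fake directory tree.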

      Root Cause:

      • A known bug in autofs (RHEL Bugzilla #2023740; a Red Hat case ID is available).
      • Improper error handling in the delayed automount logic.
      • The automount attempt fails with EBADF or EINVAL, leaving the calling thread blocked in autofs_mount_wait.

      Impact:

      • FTDC stops collecting metrics, so monitoring tools that rely on it have no data.
      • If other threads block the same way, the server may suffer degraded performance or hangs.
      • Without FTDC data, troubleshooting becomes much harder.

      Proposed Solutions:

      Solution A: Add FTDC Watchdog and Self-Restart Mechanism

      Description:

      Introduce a lightweight watchdog thread in the FTDC subsystem to monitor the health of FTDC's collection thread. If that thread stays in D state beyond a threshold (e.g., 5 minutes), gracefully stop and restart FTDC collection.
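      The control flow of Solution A can be sketched as follows. This is a minimal Python sketch of the logic only (MongoDB's FTDC is C++), and every name in it (Heartbeat, CollectorWatchdog, collect_once, stall_threshold) is hypothetical. It assumes a stale heartbeat is an acceptable proxy for "stuck in D state" instead of reading /proc. Because a thread blocked inside a system call such as statfs() cannot be interrupted or stopped on its own, "restart" here means abandoning the stuck thread and starting collection on a fresh one.

```python
import threading
import time
from typing import Callable


class Heartbeat:
    """Monotonic timestamp that the collector refreshes after each sample."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last = time.monotonic()

    def beat(self) -> None:
        with self._lock:
            self._last = time.monotonic()

    def age(self) -> float:
        with self._lock:
            return time.monotonic() - self._last


class CollectorWatchdog:
    """Hypothetical watchdog that replaces a collector whose heartbeat goes stale."""

    def __init__(self, collect_once: Callable[[], None],
                 stall_threshold: float, period: float) -> None:
        self._collect_once = collect_once
        self._threshold = stall_threshold
        self._period = period
        self._stop = threading.Event()
        self._generation = 0
        self._heartbeat = Heartbeat()
        self.restarts = 0
        self._start_collector()

    def _start_collector(self) -> None:
        # A new generation number tells any older collector to exit if it
        # ever wakes up; the stuck thread itself is simply abandoned
        # (daemon=True) because it cannot be interrupted or joined.
        self._generation += 1
        self._heartbeat = Heartbeat()
        threading.Thread(target=self._run,
                         args=(self._generation, self._heartbeat),
                         daemon=True).start()

    def _run(self, generation: int, heartbeat: Heartbeat) -> None:
        while not self._stop.is_set() and generation == self._generation:
            self._collect_once()  # may block indefinitely, e.g. in statfs()
            heartbeat.beat()
            self._stop.wait(self._period)

    def check(self) -> bool:
        """Restart collection if the heartbeat is stale; True if restarted."""
        if self._heartbeat.age() > self._threshold:
            self.restarts += 1
            self._start_collector()
            return True
        return False

    def watch(self, interval: float) -> None:
        """Blocking loop intended to run on a dedicated watchdog thread."""
        while not self._stop.wait(interval):
            self.check()

    def stop(self) -> None:
        self._stop.set()
```

      One question the sketch leaves open: if the replacement collector calls statfs() on the same automounted path, it will likely hang the same way, so a real implementation would probably also need to skip the offending path, and each stall leaves one abandoned thread behind.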

      Solution B: Implement SERVER-103431

       

            Assignee: Unassigned
            Reporter: Vinicius Grippa (vgrippa@gmail.com)