Core Server / SERVER-13681

MongoDB stalls during background flush on Windows


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.4, 2.7.5
    • Component/s: Storage
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      Windows
    • Backport Completed:
    • Epic Link:
    • Sprint:
      Server 2.7.2, Server 2.7.5

      Description

      Issue Status as of Aug 4, 2014

      ISSUE SUMMARY
      For MongoDB instances running on Windows, MongoDB takes a global mutex that blocks all requests during the background flush of database files to disk.

      USER IMPACT
      Database reads and writes block during background flushes. Users see long request times while requests wait for the background flush to finish.

      WORKAROUNDS
      N/A

      AFFECTED VERSIONS
      All versions of MongoDB prior to 2.6.4 are affected by this issue when running on Windows.

      FIX VERSION
      The fix is included in the 2.6.4 production release.

      RESOLUTION DETAILS
      An unnecessary lock was removed from the product.

      Original description

      MongoDB has low CPU usage and does not process requests while a background flush is proceeding.

      The cause of this has been identified as the following blocking chain:

      T1: Generic Query Thread: Holds: nothing. Acquires: DBLock(R). Waits on: T2
      T2: WRITEDATAFILES Thread: Holds: DBLock(W). Acquires: GlobalFlushMutex. Waits on: T3
      T3: Flush: Holds: GlobalFlushMutex. Waits on: I/O
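
The chain above can be reproduced in miniature. The following is only an illustrative sketch, not MongoDB code: the function name `simulate_reader_wait`, the plain `threading.Lock` objects standing in for DBLock and the global flush mutex, and the `time.sleep` standing in for disk I/O are all assumptions made for the example. It measures how long a query thread waits for the database lock with and without the global flush mutex.

```python
import threading
import time


def simulate_reader_wait(use_global_flush_mutex: bool, io_seconds: float = 0.5) -> float:
    """Return how long the simulated query thread (T1) waits for the database lock."""
    db_lock = threading.Lock()             # stand-in for DBLock (the real lock is reader/writer)
    global_flush_mutex = threading.Lock()  # stand-in for the mutex added in SERVER-7378
    flush_started = threading.Event()
    writer_has_db_lock = threading.Event()
    result = {}

    def flush_thread():  # T3: holds the flush mutex while waiting on I/O
        if use_global_flush_mutex:
            with global_flush_mutex:
                flush_started.set()
                time.sleep(io_seconds)  # simulated slow disk flush
        else:
            flush_started.set()
            time.sleep(io_seconds)

    def write_datafiles_thread():  # T2: holds the database lock, then wants the flush mutex
        flush_started.wait()
        with db_lock:
            writer_has_db_lock.set()
            if use_global_flush_mutex:
                with global_flush_mutex:  # blocks until T3 finishes its I/O
                    pass

    def query_thread():  # T1: holds nothing, only needs the database lock
        writer_has_db_lock.wait()
        start = time.monotonic()
        with db_lock:  # blocks for as long as T2 holds the lock
            result["wait"] = time.monotonic() - start

    threads = [
        threading.Thread(target=flush_thread),
        threading.Thread(target=write_datafiles_thread),
        threading.Thread(target=query_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["wait"]


if __name__ == "__main__":
    print("reader wait with flush mutex:    %.3f s" % simulate_reader_wait(True))
    print("reader wait without flush mutex: %.3f s" % simulate_reader_wait(False))
```

With the mutex enabled, the reader's wait should track the simulated I/O time even though the reader never touches the mutex itself; without it, the reader should get the lock almost immediately.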

      The lock was originally added in SERVER-7378 to work around a bug in the Windows Azure Storage driver. Microsoft has confirmed that there is a bug in the driver that affects only (a) memory-mapped files, such as MongoDB databases, that are (b) concurrently updated while flushing to (c) a drive hosted on an Azure disk that does not set the host cache preference to read/write.
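
For readers unfamiliar with the operation involved, here is a minimal sketch of writing through a memory-mapped view and flushing it to disk. It uses Python's standard `mmap` module rather than MongoDB's own storage code, and the helper name `write_and_flush` and the 4096-byte file size are assumptions made for the example. It does not reproduce the Azure driver bug; it only shows the write-then-flush pattern on a memory-mapped file that the paragraph above refers to.

```python
import mmap
import os
import tempfile


def write_and_flush(payload: bytes, size: int = 4096) -> bytes:
    """Write payload through a memory-mapped view, flush it, and read it back from the file."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)            # the file must be non-empty before it can be mapped
        with mmap.mmap(fd, size) as view:
            view[:len(payload)] = payload  # update the mapped pages in memory
            view.flush()                   # ask the OS to write the dirty pages to the file
        os.fsync(fd)                       # force the file's buffered data to the disk
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))   # read back through the ordinary file API
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(write_and_flush(b"hello"))
```

In the scenario described in this issue, other threads keep updating the mapped pages while a flush like this one is in progress, which is the condition the SERVER-7378 mutex was added to prevent.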

      We have removed the SERVER-7378 workaround because it penalizes all Windows deployment scenarios, including those where the bug does not occur (such as bare metal and other cloud providers).

        Attachments

        1. profile2.etl
          6.38 MB
        2. log.2014-07-18T17-42-40
          7 kB
        3. log
          9 kB

              People

              • Votes: 0
              • Watchers: 12
