Core Server / SERVER-22906

MongoD uses excessive memory over and above the WiredTiger cache size



    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Duplicate
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: WiredTiger
    • Sprint: Integration 11 (03/14/16)


      Issue Status as of Sep 30, 2016

      MongoDB with WiredTiger may experience excessive memory fragmentation. This was mainly caused by the difference in the way dirty and clean data are represented in WiredTiger. Dirty data involves smaller allocations (at the size of individual documents and index entries), which are rewritten in the background into page images (typically 16-32KB). In 3.2.10 and above (and 3.3.11 and above), the WiredTiger storage engine only allows 20% of the cache to become dirty. Eviction works in the background to write dirty data and keep the cache from being filled with small allocations.

      The changes in WT-2665 and WT-2764 limit the overhead from tcmalloc caching and fragmentation to 20% of the cache size (from fragmentation) plus 1GB of cached free memory with default settings.
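      Under those limits, the worst-case resident size can be estimated with simple arithmetic. A sketch, where the 50 GB cache size is a hypothetical example value, not a figure from this ticket:

```shell
# Worst-case mongod memory estimate under the post-fix behavior (3.2.10+):
# configured cache size, plus up to 20% of the cache lost to fragmentation,
# plus roughly 1GB of free memory cached by tcmalloc at default settings.
CACHE_GB=50                                # hypothetical configured cache size
FRAG_GB=$(( CACHE_GB * 20 / 100 ))         # up to 20% fragmentation overhead
TCMALLOC_FREE_GB=1                         # tcmalloc cached free memory (default)
TOTAL_GB=$(( CACHE_GB + FRAG_GB + TCMALLOC_FREE_GB ))
echo "worst-case total: ${TOTAL_GB} GB"    # prints "worst-case total: 61 GB"
```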

      User impact: Memory fragmentation caused MongoDB to use more memory than expected, leading to swapping and/or out-of-memory errors.

      Workaround: Configure a smaller WiredTiger cache than the default.
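      For example, the cache can be capped in the mongod configuration file; a sketch, where the 4GB value is an illustrative choice, not a recommendation from this ticket:

```yaml
# mongod.conf fragment: cap the WiredTiger cache below the default
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4
```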

      Affected versions: MongoDB 3.0.0 to 3.2.9 with WiredTiger.

      Fix version: The fix is included in the 3.2.10 production release.

      Numerous reports of mongod using excessive memory. This has been traced back to a combination of factors:

      1. TCMalloc does not free from page heap
      2. Fragmentation of spans due to varying allocation sizes
      3. The current cache size limit is enforced on net memory use, excluding allocator overhead, so the configured limit is often significantly less than total memory used. This is surprising and difficult for users to tune for appropriately

      Issue #1 has a workaround: setting an environment variable (AGGRESSIVE_DECOMMIT), though it may have a performance impact. Further investigation is ongoing.
      Issue #2 has fixes in place in v3.3.5.
      Issue #3 will likely be addressed by making the WiredTiger engine aware of memory allocation overhead, and tuning cache usage accordingly. (Need reference to WT ticket)
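      The environment-variable workaround for issue #1 can be sketched as a launch script. TCMALLOC_AGGRESSIVE_DECOMMIT is the gperftools spelling of the variable referred to above, and the config file path is an example, not from this ticket; note the possible performance impact:

```shell
# Sketch of the issue #1 workaround: enable tcmalloc aggressive decommit
# before starting mongod, so freed spans are returned to the OS eagerly.
# Enabling this may reduce throughput.
export TCMALLOC_AGGRESSIVE_DECOMMIT=t
echo "aggressive decommit: ${TCMALLOC_AGGRESSIVE_DECOMMIT}"
# Then start mongod as usual, e.g. (path is an example):
# mongod --config /etc/mongod.conf
```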

      Regression tests for memory usage are being tracked here: SERVER-23333

      Original Description
      While loading data into mongo, each of the 3 primaries crashed with memory allocation issues. As data keeps loading, new primaries are elected. Eventually it looks like they come down as well. Some nodes have recovered and come back up, but new ones keep coming down. Logs and diagnostic data are attached.


        1. diagnostic.data.tgz (19.80 MB, Matthew Clark)
        2. diagnostic.data-326.png (158 kB, Bruce Lucas)
        3. diagnostic.data-335.png (166 kB, Bruce Lucas)
        4. fragmentation.png (286 kB, Bruce Lucas)
        5. fragmentation-repro.png (79 kB, Bruce Lucas)
        6. fragmentation-repro-aggressive-decommit.png (143 kB, Bruce Lucas)
        7. metrics.2016-02-29T09-36-53Z-00000.gz (9.93 MB, James Mangold)
        8. metrics.2016-02-29T21-22-32Z-00000.gz (8.61 MB, James Mangold)
        9. metrics.2016-03-01T06-54-27Z-00000.gz (4.17 MB, James Mangold)
        10. mongodb.log.2016-03-01T06-51-04.gz (95.01 MB, James Mangold)
        11. Screen Shot 2016-05-05 at 10.44.34 AM.png (123 kB, Matthew Clark)
        12. tcmalloc_aggressive_decommit.png (152 kB, Christian Bayer)
