Core Server / SERVER-20306

75% excess memory usage under WiredTiger during stress test

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      Issue Status as of Sep 30, 2016

      ISSUE SUMMARY
      MongoDB with WiredTiger may experience excessive memory fragmentation. This is mainly caused by the difference between the way dirty and clean data are represented in WiredTiger: dirty data involves smaller allocations (at the size of individual documents and index entries), which are rewritten in the background into page images (typically 16-32KB). In 3.2.10 and above (and 3.3.11 and above), the WiredTiger storage engine only allows 20% of the cache to become dirty. Eviction works in the background to write dirty data and keep the cache from being filled with small allocations.

      The changes in WT-2665 and WT-2764 limit the overhead from tcmalloc caching and fragmentation to 20% of the cache size (from fragmentation) plus 1GB of cached free memory with default settings.
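
      Whether eviction is keeping the dirty fraction bounded can be checked from the WiredTiger cache statistics in db.serverStatus(); a minimal shell sketch (the statistic names below are as reported by 3.2-era builds and should be treated as assumptions):

        mongo --quiet --eval '
            var c = db.serverStatus().wiredTiger.cache;
            var max   = c["maximum bytes configured"];
            var used  = c["bytes currently in the cache"];
            var dirty = c["tracked dirty bytes in the cache"];
            print("cache used:  " + (100 * used  / max).toFixed(1) + "% of configured maximum");
            print("cache dirty: " + (100 * dirty / max).toFixed(1) + "% (3.2.10+ targets <= 20%)");'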

      USER IMPACT
      Memory fragmentation caused MongoDB to use more memory than expected, leading to swapping and/or out-of-memory errors.

      WORKAROUNDS
      Configure a smaller WiredTiger cache than the default.
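
      For example, capping the cache at 4GB instead of the default (a sketch only; choose a size appropriate to the workload):

        # Command-line form:
        mongod --dbpath /data/db --storageEngine wiredTiger --wiredTigerCacheSizeGB 4

        # Equivalent YAML configuration:
        # storage:
        #   wiredTiger:
        #     engineConfig:
        #       cacheSizeGB: 4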

      AFFECTED VERSIONS
      MongoDB 3.0.0 to 3.2.9 with WiredTiger.

      FIX VERSION
      The fix is included in the 3.2.10 production release.

      This ticket is a spin-off from SERVER-17456, relating to the last issue discussed there.

      Under certain workloads the process uses a large amount of memory in excess of what is allocated. This appears to be due to fragmentation, or some related memory allocation inefficiency. The repro consists of the following (a rough sketch of the workload follows the list):

      • mongod running with 10 GB cache (no journal to simplify the situation)
      • create a 10 GB collection of small documents called "ping", filling the cache
      • create a second 10 GB collection, "pong", replacing the first in the cache
      • issue a query to read the first collection "ping" back into the cache, replacing "pong"
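
      The attached repro-32.sh and repro-32-insert.sh are the authoritative scripts; the following is only a rough, hypothetical sketch of the ping/pong workload (paths, document size, and document counts are made up):

        #!/bin/bash
        # Hypothetical sketch of the ping/pong repro -- not the attached repro-32.sh.
        mongod --dbpath /data/repro --storageEngine wiredTiger \
               --wiredTigerCacheSizeGB 10 --nojournal \
               --fork --logpath /data/repro/mongod.log

        # Fill the cache with ~10 GB of small documents in "ping", then again in "pong".
        for coll in ping pong; do
            mongo --quiet --eval "
                var pad = new Array(1001).join('x');             // ~1 KB of padding per document
                var docs = [];
                for (var i = 0; i < 10 * 1000 * 1000; i++) {     // adjust count to reach ~10 GB
                    docs.push({_id: i, pad: pad});
                    if (docs.length == 1000) { db.$coll.insert(docs); docs = []; }
                }"
        done

        # Read "ping" back into the cache with a full collection scan, evicting "pong".
        mongo --quiet --eval 'db.ping.find().itcount()'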

      Memory stats over the course of the run:

      • from A-B "ping" is being created, and from C-D "pong" is being created, replacing "ping" in the cache
      • starting at D "ping" is being read back into the cache, evicting "pong". As "pong" is evicted from the cache, in principle the freed memory should be usable for reading "ping" back in.
      • however from D-E we see heap size and central cache free bytes increasing. It appears that for some reason the memory freed by evicting "pong" cannot be used to hold "ping", so it is being returned to the central free list, and instead new memory is being obtained from the OS to hold "ping".
      • at E, while "ping" is still being read into memory, we see a change in behavior: free memory appears to have been moved from the central free list to the page heap. WT reports number of pages is no longer increasing. I suspect that at this point "ping" has filled the cache and we are successfully recycling memory freed by evicting older "ping" pages to hold newer "ping" pages.
      • but the net is still about 7 GB of memory in use by the process beyond the 9.5 GB allocated and 9.2 GB in the WT cache, or about a 75% excess (see the measurement sketch after this list).
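
      A sketch of how the figures in the last bullet can be pulled from a running mongod (the tcmalloc and WiredTiger statistic names are assumptions based on 3.2-era serverStatus output):

        mongo --quiet --eval '
            var s = db.serverStatus();
            var heap  = s.tcmalloc.generic.heap_size;                // memory obtained from the OS (approx.)
            var alloc = s.tcmalloc.generic.current_allocated_bytes;  // memory actually allocated
            var wtc   = s.wiredTiger.cache["bytes currently in the cache"];
            var cfree = s.tcmalloc.tcmalloc.central_cache_free_bytes;
            var pfree = s.tcmalloc.tcmalloc.pageheap_free_bytes;
            print("heap:                  " + (heap  / 1e9).toFixed(1) + " GB");
            print("allocated:             " + (alloc / 1e9).toFixed(1) + " GB");
            print("WT cache:              " + (wtc   / 1e9).toFixed(1) + " GB");
            print("central cache free:    " + (cfree / 1e9).toFixed(1) + " GB");
            print("pageheap free:         " + (pfree / 1e9).toFixed(1) + " GB");
            print("excess over allocated: " + (100 * (heap - alloc) / alloc).toFixed(0) + "%");'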

      Theories:

      • smaller buffers freed by evicting "pong" are discontiguous and cannot hold larger buffers required for reading in "ping"
      • the buffers freed by evicting "pong" are contiguous, but adjacent buffers are not coalesced by the allocator
      • buffers are eventually coalesced by the allocator, but not in time to be used for reading in "ping"
      Attachments

      1. es
        25 kB
        Mark Callaghan
      2. metrics.2016-06-07T21-19-37Z-00000.gz
        3.81 MB
        Mark Callaghan
      3. repro-32.sh
        1 kB
        Bruce Lucas
      4. repro-32-insert.sh
        1 kB
        Bruce Lucas
      1. AggressiveReclaim.png
        60 kB
      2. frag-ex1.png
        170 kB
      3. max-heap.png
        58 kB
      4. memory-use.png
        108 kB
      5. MongoDBDataCollectionDec10-mongo42-memory.png
        188 kB
      6. NoAggressiveReclaim.png
        91 kB
      7. pingpong.png
        225 kB
      8. pingpong-decommit.png
        157 kB
      9. repro-32-diagnostic.data-325-detail.png
        143 kB
      10. repro-32-diagnostic.data-325-overview.png
        123 kB
      11. repro-32-diagnostic.data-335-detail.png
        140 kB
      12. repro-32-insert-diagnostic.data-326.png
        183 kB
      13. repro-32-insert-diagnostic.data-335.png
        181 kB


          Activity

          mdcallag Mark Callaghan added a comment -

          Will repeat a test this week. Previous tests used jemalloc.


          mdcallag Mark Callaghan added a comment -

          Repeated a test using tcmalloc instead of jemalloc. The WT block cache is 128G. With the old jemalloc install, VSZ/RSS was 236G/208G; with the bundled tcmalloc, VSZ/RSS was 192G/191G. I will upload the output from db.serverStatus() and the metrics file from diagnostic.data.
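
          (For reference, VSZ/RSS figures like these can be read on Linux with something along the following lines; ps column options vary by platform.)

            # VSZ and RSS of the running mongod, in KB
            ps -o vsz=,rss= -p "$(pidof mongod)"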

          mdcallag Mark Callaghan added a comment -

          es is the output of db.serverStatus(); metrics.* is from diagnostic.data.

          Both were captured at the end of the test run using tcmalloc.
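
          (The es file was presumably produced with something along these lines; the exact invocation is a guess.)

            # Dump db.serverStatus() to a file named "es"
            mongo --quiet --eval 'printjson(db.serverStatus())' > es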

          michael.cahill Michael Cahill added a comment -

          Mark Callaghan, I've attached a graph generated from your diagnostic data:

          What this graph shows is that the WiredTiger cache use is varying between 100-125GB but the pattern of allocation and freeing is causing large amounts of freed memory to accumulate in tcmalloc. In particular, the central freelist grows to 73GB, which accounts for heap size being much larger than the WiredTiger cache.

          In our testing, stock jemalloc has similar overall behavior in the face of this pattern of allocation using the standard malloc/free interface. We think we could improve the situation with jemalloc by using multiple arenas: the first step towards that is in SERVER-24268.

          We do not have a solution today – the best workaround we have is to use a smaller cache size. I am working in WT-2665 on some changes to the patterns of allocation in WiredTiger that should bound the excess memory use.

          michael.cahill Michael Cahill added a comment -

          Given the changes in WT-2665 and WT-2764, both included in 3.2.10 (and 3.3.11), this issue has now been resolved.

          Here are the results of running the attached repro-32-insert.sh script against 3.2.6 and 3.2.10-rc2. Each run varied the $gb variable that determines both the WiredTiger cache size and the volume of data inserted. For each run, I report the maximum value seen during the run for db.serverStatus().mem.resident.

          Here is a graph of peak memory use for 3.2.6 and 3.2.10-rc2:

          As you can see, where there used to be 60+% RAM use over the configured cache size, with 3.2.10 the maximum RAM use tracks the cache size to within a few percent for larger cache sizes.
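
          A rough sketch of that measurement loop (the cache sizes are illustrative, and it assumes repro-32-insert.sh reads $gb from the environment, which may not match the attached script exactly):

            # Hypothetical driver around the attached repro-32-insert.sh
            for gb in 1 2 4 8 16 32; do
                gb=$gb ./repro-32-insert.sh &
                repro_pid=$!
                peak=0
                # Sample resident memory (MB) while the run is in progress and keep the maximum.
                while kill -0 "$repro_pid" 2>/dev/null; do
                    cur=$(mongo --quiet --eval 'print(db.serverStatus().mem.resident)' 2>/dev/null)
                    [ -n "$cur" ] && [ "$cur" -gt "$peak" ] 2>/dev/null && peak=$cur
                    sleep 10
                done
                echo "cache ${gb}GB -> peak db.serverStatus().mem.resident = ${peak}MB"
            done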


            People

            • Votes: 21
            • Watchers: 77
