[SERVER-22906] MongoD uses excessive memory over and above the WiredTiger cache size Created: 01/Mar/16  Updated: 30/Sep/16  Resolved: 29/Sep/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.1
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: James Mangold Assignee: Alexander Gorrod
Resolution: Duplicate Votes: 9
Labels: WTmem, WTplaybook
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2016-05-05 at 10.44.34 AM.png     PNG File diagnostic.data-326.png     PNG File diagnostic.data-335.png     File diagnostic.data.tgz     PNG File fragmentation-repro-aggressive-decommit.png     PNG File fragmentation-repro.png     PNG File fragmentation.png     File metrics.2016-02-29T09-36-53Z-00000.gz     File metrics.2016-02-29T21-22-32Z-00000.gz     File metrics.2016-03-01T06-54-27Z-00000.gz     File mongodb.log.2016-03-01T06-51-04.gz     PNG File tcmalloc_aggressive_decommit.png    
Issue Links:
Depends
depends on SERVER-23333 Regression test for allocator fragmen... Closed
Duplicate
duplicates WT-2764 Optimize checkpoints to reduce throug... Closed
is duplicated by SERVER-26312 Multiple high memory usage alerts, Mo... Closed
Related
related to SERVER-23069 Improve tcmalloc freelist statistics Closed
related to SERVER-24303 Enable tcmalloc aggressive decommit b... Closed
is related to SERVER-20306 75% excess memory usage under WiredTi... Closed
Operating System: ALL
Sprint: Integration 11 (03/14/16)
Participants:
Linked BF Score: 0

 Description   
Issue Status as of Sep 30, 2016

ISSUE SUMMARY
MongoDB with WiredTiger may experience excessive memory fragmentation. This was mainly caused by the difference between the way dirty and clean data is represented in WiredTiger. Dirty data involves smaller allocations (at the size of individual documents and index entries), and in the background that is rewritten into page images (typically 16-32KB). In 3.2.10 and above (and 3.3.11 and above), the WiredTiger storage engine only allows 20% of the cache to become dirty. Eviction works in the background to write dirty data and keep the cache from being filled with small allocations.

The changes in WT-2665 and WT-2764 limit the overhead from tcmalloc caching and fragmentation to 20% of the cache size (from fragmentation) plus 1GB of cached free memory with default settings.
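
For reference, the dirty fraction of the cache can be watched from the mongo shell. A minimal sketch, assuming the serverStatus().wiredTiger.cache statistic names exposed by 3.2-era builds:

var c = db.serverStatus().wiredTiger.cache;
var configured = c["maximum bytes configured"];          // configured WiredTiger cache size
var dirty      = c["tracked dirty bytes in the cache"];  // dirty data awaiting eviction/checkpoint
print("dirty: " + (100 * dirty / configured).toFixed(1) + "% of cache");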

USER IMPACT
Memory fragmentation caused MongoDB to use more memory than expected, leading to swapping and/or out-of-memory errors.

WORKAROUNDS
Configure a smaller WiredTiger cache than the default.
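
For example, the cache can be capped in mongod.conf (the 8GB value is only illustrative; size it for your working set and available RAM):

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8

or equivalently on the command line: mongod --wiredTigerCacheSizeGB 8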

AFFECTED VERSIONS
MongoDB 3.0.0 to 3.2.9 with WiredTiger.

FIX VERSION
The fix is included in the 3.2.10 production release.

Numerous reports of mongod using excessive memory. This has been traced back to a combination of factors:

  1. TCMalloc does not free from page heap
  2. Fragmentation of spans due to varying allocation sizes
  3. Current cache size limits are enforced on net memory use, not including allocator overhead, so the configured limit is often significantly less than the total memory actually used. This is surprising and difficult for users to tune for appropriately.

Issue #1 has a workaround, setting the TCMALLOC_AGGRESSIVE_DECOMMIT environment variable (see the example below), but it may have a performance impact. Further investigation is ongoing.
Issue #2 has fixes in place in v3.3.5.
Issue #3 will likely be addressed by making the WiredTiger engine aware of memory allocation overhead, and tuning cache usage accordingly. (Need reference to WT ticket)
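
The Issue #1 workaround amounts to exporting the variable in the environment that launches mongod, for example (the invocation is illustrative; adapt it to your init script or service manager):

TCMALLOC_AGGRESSIVE_DECOMMIT=y mongod --config /etc/mongod.conf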

Regression tests for memory usage are being tracked here: SERVER-23333

Original Description
While loading data into mongo, each of the 3 primaries crashed with memory allocation issues. As data keeps loading, new primaries are elected. Eventually it looks like they come down as well. Some nodes have recovered and have come back up, but new ones keep coming down. Logs and diagnostic data are attached.



 Comments   
Comment by Michael Cahill (Inactive) [ 28/Sep/16 ]

The primary fix for this issue was in WT-2665 and WT-2764, but there were subsequent tickets that dealt with some perf regressions.

The excessive memory fragmentation seen here was mainly caused by the difference between the way dirty and clean data is represented in WiredTiger. Dirty data involves smaller allocations (at the size of individual documents and index entries), and in the background that is rewritten into page images (typically 16-32KB). In 3.2.10 and above (and 3.3.11 and above), the WiredTiger storage engine only allows 20% of the cache to become dirty. Eviction works in the background to write dirty data and keep the cache from being filled with small allocations.

The changes in WT-2665 and WT-2764 limit the overhead from tcmalloc caching and fragmentation to 20% of the cache size (from fragmentation) plus 1GB of cached free memory with default settings.

Comment by Michael Cahill (Inactive) [ 27/Sep/16 ]

stephen.jannin@sgcib.com we have merged the relevant changes in the MongoDB 3.2 branch. They are currently being stabilized but we expect this issue to be fixed in the 3.2.10 release.

Comment by Stephen JANNIN [ 07/Sep/16 ]

It's very good news that you managed to fix this issue.
Is it possible to have this fix in 3.2?

Comment by Daniel Pasette (Inactive) [ 18/Aug/16 ]

Thanks for breaking this down. The perf team will take this into account.

Comment by Alexander Gorrod [ 18/Aug/16 ]

pasette The most significant change we have made is to limit the amount of dirty content the cache can hold at any time. In MongoDB 3.2 WiredTiger allowed up to 100% of the cache to become full of dirty (aka modified) content; this led to checkpoints needing to do a lot more work, and could lead to more memory fragmentation in the allocator, since modified content generally consists of many more small memory allocations than clean content.

The workloads that could potentially suffer from the change are:

  • Workloads that configure a small cache size and have many threads modifying data.
  • Workloads where recently modified data is queried regularly. Since WiredTiger now more actively cleans dirty content from cache, sometimes that data will need to be read back in from disk to satisfy a query. We have attempted to mitigate this.
  • Update-heavy workloads with a large volume of index data, since index updates tend to be random across the data and therefore hold more dirty content in cache.

In our testing of these changes we see more performance improvements than degradations. The reasons the performance is better are:

  • Checkpoints finish faster and are therefore less disruptive. This is because checkpoints only need to write dirty content, and we have reduced the amount of dirty content.
  • The changes to control dirty content in cache have led to a lot of improvements in the way our eviction server works. It is now more efficient and better tuned towards MongoDB workloads than it used to be.
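
For builds that expose the new eviction settings, the dirty limits described above can be inspected or adjusted at runtime through the wiredTigerEngineRuntimeConfig server parameter. This is a sketch only; the option names and percentages below are assumptions based on WiredTiger's eviction_dirty_* settings of that era, not values prescribed in this ticket:

db.adminCommand({
    setParameter: 1,
    // assumed WiredTiger eviction options, expressed as a percentage of the cache
    wiredTigerEngineRuntimeConfig: "eviction_dirty_target=5,eviction_dirty_trigger=20"
})
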
Comment by Daniel Pasette (Inactive) [ 17/Aug/16 ]

Alex, do you have an idea where the changes slated for 3.4 will most likely have a performance impact? I'd like to be able to add some performance regression tests that poke at these use cases.

Comment by Alexander Gorrod [ 17/Aug/16 ]

I've re-run the workload in this ticket comparing the MongoDB 3.2.9 release to MongoDB master updated to include the latest changes in WiredTiger develop branch.

A refresher on the workload:
It starts 10 threads, each upserting increasingly large records into the same table across a range of 100,000 records.
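
A single-threaded sketch of that workload shape (the collection name matches the stats later in this ticket, but the padding size and iteration count are illustrative, and the real harness runs 10 such loops in parallel):

var pad = new Array(1025).join("x");                  // ~1KB of filler per update
for (var i = 0; i < 5000000; i++) {
    var id = Math.floor(Math.random() * 100000);      // key range of 100,000 records
    db.c.update({_id: id}, {$push: {data: pad}}, {upsert: true});  // documents grow over time
}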

In both cases I started MongoDB with default cache size of 17GB on 3.2.9 and 14.5GB on master. Logging is disabled. I'm running the tests on a c3.4xlarge AWS machine, with SSD attached storage. I ran the test case for about 2 hours on each version, memory usage patterns reached a steady state after around 30 minutes. The following table summarizes relevant values for each configuration after an hour of runtime, all values in MB:

Version            Current Allocated (MB)   Page Heap Free (MB)   Page Heap Free as % of cache size
3.2.9              14374                    4059                  23%
Master & Develop   7748                     1484                  10%

Note: The difference in configured cache sizes is not relevant to the final results. In the version on master the cache isn't approaching full utilization.

The results demonstrate that the recent changes in WiredTiger to limit allocator fragmentation are delivering a behavior improvement in the workload on this ticket. A more detailed analysis of the statistics over time leads me to have confidence that limiting the proportion of the cache that can be dirty (which is one of the primary differences between the two versions) is effectively limiting the amount of page heap fragmentation that can be triggered by MongoDB workloads. The limit is in line with the proportion of the cache that can be dirty - the default limit on MongoDB master is 20% - which caps the amount of fragmentation that can be generated to 20% of the cache size. The limit on dirty content in 3.2.9 was effectively the cache size, limiting the fragmentation to 100% of the cache size.

I'm inclined to call this ticket done, resolved by the changes committed in WT-2665, which will be available in MongoDB 3.4.

We could further restrict the allocator overhead by enabling aggressive decommit. There are negative performance consequences to enabling aggressive decommit as discussed in SERVER-24303, making it unappealing to change the default behavior of MongoDB to enable the option by default.

Comment by Matthew Clark [ 05/May/16 ]

The printout was taken at the time I posted it - the instance is no longer under any load, the secondary is offline, but the primary is still consuming 15+GB of memory.

I will restart the instance, and run the same load profile, this time with

TCMALLOC_AGGRESSIVE_DECOMMIT=y

Will add more graphs in ~24h (the last time we did this, it consumed all available CPU cycles).

Thanks
-Matt

Comment by Martin Bligh [ 05/May/16 ]

mclark@demandware.com looks like it's nearly all in pageheap_free_bytes.
TCMALLOC_AGGRESSIVE_DECOMMIT=y should disable that entirely. (set as environment variable before starting mongod).

It may have a small performance impact in some microbenchmarks, but if it works well for you, I'll look at something slightly less aggressive.
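
A quick way to check those two counters from the shell (the field paths follow the serverStatus() output pasted further down in this ticket):

var t = db.serverStatus().tcmalloc.tcmalloc;
print("pageheap_free_bytes:      " + t.pageheap_free_bytes);
print("central_cache_free_bytes: " + t.central_cache_free_bytes);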

Comment by Matthew Clark [ 05/May/16 ]

Attached is a tgz of my diag data, and here is the output of db.serverStatus().tcmalloc

rs03:PRIMARY> db.serverStatus().tcmalloc
{
	"generic" : {
		"current_allocated_bytes" : NumberLong("9149282008"),
		"heap_size" : NumberLong("19128152064")
	},
	"tcmalloc" : {
		"pageheap_free_bytes" : NumberLong("8442773504"),
		"pageheap_unmapped_bytes" : NumberLong(1244045312),
		"max_total_thread_cache_bytes" : NumberLong(1073741824),
		"current_total_thread_cache_bytes" : 3915440,
		"total_free_bytes" : 292051240,
		"central_cache_free_bytes" : 287376728,
		"transfer_cache_free_bytes" : 759072,
		"thread_cache_free_bytes" : 3915440,
		"aggressive_memory_decommit" : 0,
		"size_classes" : [
			{
				"bytes_per_object" : 0,
				"pages_per_span" : 0,
				"num_spans" : 0,
				"num_thread_objs" : 0,
				"num_central_objs" : 0,
				"num_transfer_objs" : 0,
				"free_bytes" : 0,
				"allocated_bytes" : 0
			},
			{
				"bytes_per_object" : 8,
				"pages_per_span" : 2,
				"num_spans" : 689,
				"num_thread_objs" : 1078,
				"num_central_objs" : 632191,
				"num_transfer_objs" : 0,
				"free_bytes" : 5066152,
				"allocated_bytes" : 5644288
			},
			{
				"bytes_per_object" : 16,
				"pages_per_span" : 2,
				"num_spans" : 631,
				"num_thread_objs" : 15463,
				"num_central_objs" : 223077,
				"num_transfer_objs" : 1536,
				"free_bytes" : 3841216,
				"allocated_bytes" : 5169152
			},
			{
				"bytes_per_object" : 32,
				"pages_per_span" : 2,
				"num_spans" : 1896,
				"num_thread_objs" : 1860,
				"num_central_objs" : 456377,
				"num_transfer_objs" : 0,
				"free_bytes" : 14663584,
				"allocated_bytes" : 15532032
			},
			{
				"bytes_per_object" : 48,
				"pages_per_span" : 2,
				"num_spans" : 4250,
				"num_thread_objs" : 1621,
				"num_central_objs" : 546600,
				"num_transfer_objs" : 0,
				"free_bytes" : 26314608,
				"allocated_bytes" : 34816000
			},
			{
				"bytes_per_object" : 64,
				"pages_per_span" : 2,
				"num_spans" : 5672,
				"num_thread_objs" : 1347,
				"num_central_objs" : 612673,
				"num_transfer_objs" : 128,
				"free_bytes" : 39305472,
				"allocated_bytes" : 46465024
			},
			{
				"bytes_per_object" : 80,
				"pages_per_span" : 2,
				"num_spans" : 723,
				"num_thread_objs" : 910,
				"num_central_objs" : 64271,
				"num_transfer_objs" : 102,
				"free_bytes" : 5222640,
				"allocated_bytes" : 5922816
			},
			{
				"bytes_per_object" : 96,
				"pages_per_span" : 2,
				"num_spans" : 1515,
				"num_thread_objs" : 762,
				"num_central_objs" : 116172,
				"num_transfer_objs" : 0,
				"free_bytes" : 11225664,
				"allocated_bytes" : 12410880
			},
			{
				"bytes_per_object" : 128,
				"pages_per_span" : 2,
				"num_spans" : 6299,
				"num_thread_objs" : 1540,
				"num_central_objs" : 382630,
				"num_transfer_objs" : 0,
				"free_bytes" : 49173760,
				"allocated_bytes" : 51601408
			},
			{
				"bytes_per_object" : 160,
				"pages_per_span" : 2,
				"num_spans" : 1501,
				"num_thread_objs" : 400,
				"num_central_objs" : 72270,
				"num_transfer_objs" : 0,
				"free_bytes" : 11627200,
				"allocated_bytes" : 12296192
			},
			{
				"bytes_per_object" : 192,
				"pages_per_span" : 2,
				"num_spans" : 4849,
				"num_thread_objs" : 697,
				"num_central_objs" : 130998,
				"num_transfer_objs" : 42,
				"free_bytes" : 25293504,
				"allocated_bytes" : 39723008
			},
			{
				"bytes_per_object" : 224,
				"pages_per_span" : 2,
				"num_spans" : 3046,
				"num_thread_objs" : 158,
				"num_central_objs" : 100515,
				"num_transfer_objs" : 36,
				"free_bytes" : 22558816,
				"allocated_bytes" : 24952832
			},
			{
				"bytes_per_object" : 256,
				"pages_per_span" : 2,
				"num_spans" : 1498,
				"num_thread_objs" : 130,
				"num_central_objs" : 38711,
				"num_transfer_objs" : 32,
				"free_bytes" : 9951488,
				"allocated_bytes" : 12271616
			},
			{
				"bytes_per_object" : 320,
				"pages_per_span" : 2,
				"num_spans" : 99,
				"num_thread_objs" : 105,
				"num_central_objs" : 2175,
				"num_transfer_objs" : 25,
				"free_bytes" : 737600,
				"allocated_bytes" : 811008
			},
			{
				"bytes_per_object" : 384,
				"pages_per_span" : 2,
				"num_spans" : 1369,
				"num_thread_objs" : 64,
				"num_central_objs" : 26568,
				"num_transfer_objs" : 0,
				"free_bytes" : 10226688,
				"allocated_bytes" : 11214848
			},
			{
				"bytes_per_object" : 448,
				"pages_per_span" : 2,
				"num_spans" : 31,
				"num_thread_objs" : 89,
				"num_central_objs" : 278,
				"num_transfer_objs" : 0,
				"free_bytes" : 164416,
				"allocated_bytes" : 253952
			},
			{
				"bytes_per_object" : 512,
				"pages_per_span" : 2,
				"num_spans" : 95,
				"num_thread_objs" : 123,
				"num_central_objs" : 1316,
				"num_transfer_objs" : 0,
				"free_bytes" : 736768,
				"allocated_bytes" : 778240
			},
			{
				"bytes_per_object" : 576,
				"pages_per_span" : 2,
				"num_spans" : 349,
				"num_thread_objs" : 187,
				"num_central_objs" : 4353,
				"num_transfer_objs" : 252,
				"free_bytes" : 2760192,
				"allocated_bytes" : 2859008
			},
			{
				"bytes_per_object" : 640,
				"pages_per_span" : 2,
				"num_spans" : 8,
				"num_thread_objs" : 27,
				"num_central_objs" : 17,
				"num_transfer_objs" : 0,
				"free_bytes" : 28160,
				"allocated_bytes" : 65536
			},
			{
				"bytes_per_object" : 704,
				"pages_per_span" : 2,
				"num_spans" : 7877,
				"num_thread_objs" : 91,
				"num_central_objs" : 56164,
				"num_transfer_objs" : 0,
				"free_bytes" : 39603520,
				"allocated_bytes" : 64528384
			},
			{
				"bytes_per_object" : 768,
				"pages_per_span" : 2,
				"num_spans" : 7,
				"num_thread_objs" : 29,
				"num_central_objs" : 25,
				"num_transfer_objs" : 0,
				"free_bytes" : 41472,
				"allocated_bytes" : 57344
			},
			{
				"bytes_per_object" : 896,
				"pages_per_span" : 2,
				"num_spans" : 44,
				"num_thread_objs" : 53,
				"num_central_objs" : 71,
				"num_transfer_objs" : 0,
				"free_bytes" : 111104,
				"allocated_bytes" : 360448
			},
			{
				"bytes_per_object" : 1024,
				"pages_per_span" : 2,
				"num_spans" : 394,
				"num_thread_objs" : 86,
				"num_central_objs" : 2612,
				"num_transfer_objs" : 0,
				"free_bytes" : 2762752,
				"allocated_bytes" : 3227648
			},
			{
				"bytes_per_object" : 1152,
				"pages_per_span" : 2,
				"num_spans" : 301,
				"num_thread_objs" : 35,
				"num_central_objs" : 4,
				"num_transfer_objs" : 0,
				"free_bytes" : 44928,
				"allocated_bytes" : 2465792
			},
			{
				"bytes_per_object" : 1280,
				"pages_per_span" : 2,
				"num_spans" : 12,
				"num_thread_objs" : 9,
				"num_central_objs" : 50,
				"num_transfer_objs" : 0,
				"free_bytes" : 75520,
				"allocated_bytes" : 98304
			},
			{
				"bytes_per_object" : 1408,
				"pages_per_span" : 3,
				"num_spans" : 8,
				"num_thread_objs" : 48,
				"num_central_objs" : 16,
				"num_transfer_objs" : 0,
				"free_bytes" : 90112,
				"allocated_bytes" : 98304
			},
			{
				"bytes_per_object" : 1664,
				"pages_per_span" : 2,
				"num_spans" : 74,
				"num_thread_objs" : 15,
				"num_central_objs" : 193,
				"num_transfer_objs" : 0,
				"free_bytes" : 346112,
				"allocated_bytes" : 606208
			},
			{
				"bytes_per_object" : 2048,
				"pages_per_span" : 2,
				"num_spans" : 91,
				"num_thread_objs" : 46,
				"num_central_objs" : 0,
				"num_transfer_objs" : 256,
				"free_bytes" : 618496,
				"allocated_bytes" : 745472
			},
			{
				"bytes_per_object" : 2304,
				"pages_per_span" : 3,
				"num_spans" : 20,
				"num_thread_objs" : 14,
				"num_central_objs" : 80,
				"num_transfer_objs" : 0,
				"free_bytes" : 216576,
				"allocated_bytes" : 245760
			},
			{
				"bytes_per_object" : 2560,
				"pages_per_span" : 2,
				"num_spans" : 11,
				"num_thread_objs" : 13,
				"num_central_objs" : 13,
				"num_transfer_objs" : 0,
				"free_bytes" : 66560,
				"allocated_bytes" : 90112
			},
			{
				"bytes_per_object" : 3072,
				"pages_per_span" : 3,
				"num_spans" : 22,
				"num_thread_objs" : 12,
				"num_central_objs" : 54,
				"num_transfer_objs" : 0,
				"free_bytes" : 202752,
				"allocated_bytes" : 270336
			},
			{
				"bytes_per_object" : 3328,
				"pages_per_span" : 5,
				"num_spans" : 6,
				"num_thread_objs" : 8,
				"num_central_objs" : 24,
				"num_transfer_objs" : 0,
				"free_bytes" : 106496,
				"allocated_bytes" : 122880
			},
			{
				"bytes_per_object" : 4096,
				"pages_per_span" : 2,
				"num_spans" : 424,
				"num_thread_objs" : 19,
				"num_central_objs" : 124,
				"num_transfer_objs" : 0,
				"free_bytes" : 585728,
				"allocated_bytes" : 3473408
			},
			{
				"bytes_per_object" : 4608,
				"pages_per_span" : 5,
				"num_spans" : 27,
				"num_thread_objs" : 19,
				"num_central_objs" : 77,
				"num_transfer_objs" : 0,
				"free_bytes" : 442368,
				"allocated_bytes" : 552960
			},
			{
				"bytes_per_object" : 5120,
				"pages_per_span" : 4,
				"num_spans" : 5,
				"num_thread_objs" : 7,
				"num_central_objs" : 4,
				"num_transfer_objs" : 0,
				"free_bytes" : 56320,
				"allocated_bytes" : 81920
			},
			{
				"bytes_per_object" : 6144,
				"pages_per_span" : 3,
				"num_spans" : 30,
				"num_thread_objs" : 27,
				"num_central_objs" : 22,
				"num_transfer_objs" : 0,
				"free_bytes" : 301056,
				"allocated_bytes" : 368640
			},
			{
				"bytes_per_object" : 6656,
				"pages_per_span" : 5,
				"num_spans" : 3,
				"num_thread_objs" : 6,
				"num_central_objs" : 2,
				"num_transfer_objs" : 0,
				"free_bytes" : 53248,
				"allocated_bytes" : 61440
			},
			{
				"bytes_per_object" : 8192,
				"pages_per_span" : 2,
				"num_spans" : 21976,
				"num_thread_objs" : 72,
				"num_central_objs" : 0,
				"num_transfer_objs" : 2,
				"free_bytes" : 606208,
				"allocated_bytes" : 180027392
			},
			{
				"bytes_per_object" : 10240,
				"pages_per_span" : 5,
				"num_spans" : 50,
				"num_thread_objs" : 15,
				"num_central_objs" : 46,
				"num_transfer_objs" : 0,
				"free_bytes" : 624640,
				"allocated_bytes" : 1024000
			},
			{
				"bytes_per_object" : 12288,
				"pages_per_span" : 3,
				"num_spans" : 58,
				"num_thread_objs" : 19,
				"num_central_objs" : 0,
				"num_transfer_objs" : 0,
				"free_bytes" : 233472,
				"allocated_bytes" : 712704
			},
			{
				"bytes_per_object" : 13312,
				"pages_per_span" : 7,
				"num_spans" : 9,
				"num_thread_objs" : 5,
				"num_central_objs" : 5,
				"num_transfer_objs" : 0,
				"free_bytes" : 133120,
				"allocated_bytes" : 258048
			},
			{
				"bytes_per_object" : 16384,
				"pages_per_span" : 4,
				"num_spans" : 306,
				"num_thread_objs" : 31,
				"num_central_objs" : 0,
				"num_transfer_objs" : 0,
				"free_bytes" : 507904,
				"allocated_bytes" : 5013504
			}
		]
	}
}

Cheers
-Matt

Comment by Bruce Lucas (Inactive) [ 05/May/16 ]

matthew.clark@despatch.net, would you also be able to archive the $dbpath/diagnostic.data directory and attach it to this ticket? That will contain a recent timeseries of metrics that will help us understand what you are encountering.
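
One way to create that archive, assuming a dbpath of /var/lib/mongodb (adjust to your deployment):

tar -czf diagnostic.data.tgz -C /var/lib/mongodb diagnostic.data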

Comment by Martin Bligh [ 05/May/16 ]

mclark@demandware.com, could you grab db.serverStatus().tcmalloc ?
If pageheap_free_bytes is high, I'd recommend setting "TCMALLOC_AGGRESSIVE_DECOMMIT=y"
If central_cache_free_bytes is high, would love to see the full output.
If neither is high, it's likely to be a different problem.
Thanks!

Comment by Matthew Clark [ 05/May/16 ]

We tried running the latest version, 3.3.5, and still experience the same issue. My load test ran from 1730 -> 1+0640; at about 2125 the secondary member was killed off. I am still seeing the same issue with resident memory growing to substantial levels, and losing all read tickets. The application side does not confirm this connection spike.

Comment by Bruce Lucas (Inactive) [ 03/May/16 ]

Comparison of runs on 3.2.6 vs 3.3.5 of the repro script above:

version 3.2.6 (see attached graph diagnostic.data-326.png)

version 3.3.5 (see attached graph diagnostic.data-335.png)

  • heap_size is essentially unchanged, so fragmentation seems to be about the same in 3.3.5
  • however resident set size is about 2 GB smaller in 3.3.5, accounted for by about 2 GB more of the page heap that has been unmapped. So the effect of fragmentation is mitigated in 3.3.5 by freeing the physical memory associated with some of the large fragmented regions.

Comment by Martin Bligh [ 29/Apr/16 ]

cbayer@demandware.com Yup, that's great. What you want is

db.serverStatus( { tcmalloc: 1} ).tcmalloc

You'll get some useful info from older releases too that might be interesting to see. (esp. central_cache_free_bytes)

Comment by Christian Bayer [ 29/Apr/16 ]

Thanks Martin, we can deploy a dev build on our test instance. I'll get back to you with the results. Should we grab the vanilla 3.3.5 from here https://www.mongodb.org/downloads#development?

Comment by Martin Bligh [ 29/Apr/16 ]

cbayer@demandware.com There's two places memory can get "stuck" - in the pageheap or the central freelist. Aggressive decommit will help with pageheap but not the central freelist.

If you can reproduce this outside of production, 3.3.5 has much more instrumentation in server status and I believe much improved tuning for non-pessimal workloads. Obviously it's a dev branch though, so not sure if that's possible for you to try.

Also might be helpful to run "perf top" or similar and see what it's actually doing with the CPU time and confirm it's actually tcmalloc. I use something like this:

perf record -q --call-graph fp -p `pidof mongod`

You'll have to decode it to text on your own system though; the output files seem to be very non-portable.
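
(If it helps, one way to turn the recording into text on the capture host is something like the following; flag support may vary with the perf version.)

perf report --stdio -i perf.data > perf-report.txt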

Comment by Christian Bayer [ 29/Apr/16 ]

I've been following this issue; we have a similar problem. Our workload is very fragmented, and documents vary greatly in size (4KB-3MB). We tried 3.2.4 with TCMALLOC_AGGRESSIVE_DECOMMIT but it doesn't seem to help. Once memory gets fragmented beyond a certain point, the server CPU goes up to 1000% and doesn't recover.

Comment by James Mangold [ 15/Apr/16 ]

We have not, but we will.

James Mangold – Solutions Architect - OMEGA | Interactive Data
Intercontinental Exchange | Interactive Data
100 Church Street, New York, NY 10007
Tel: +1-212-497-3269
james.mangold@interactivedata.com
interactivedata.com

Comment by Ramon Fernandez Marina [ 15/Apr/16 ]

james.mangold@interactivedata.com, have you had a chance to try Bruce's suggestion above?

We've made some changes in the development branch, but would be good to see the results of your experiments on 3.2 to see the impact of TCMALLOC_AGGRESSIVE_DECOMMIT with your workload.

Thanks,
Ramón.

Comment by Githook User [ 04/Apr/16 ]

Author:

{u'username': u'martinbligh', u'name': u'Martin Bligh', u'email': u'mbligh@mongodb.com'}

Message: SERVER-22906: Cut down number of tcmalloc size classes

Cut down number of size classes from 52 to 42.
The fewer size classes we have, the less memory we waste in overhead with unused allocations,
but the more we waste in roundoff error. We use 12.5% as a threshold here, extending the
conceptual limit that already existed in the code to how we merge size classes.
Branch: master
https://github.com/mongodb/mongo/commit/7be0eae950a33d49aaae967db250c265ccdfe463

Comment by Githook User [ 01/Apr/16 ]

Author:

{u'username': u'martinbligh', u'name': u'Martin Bligh', u'email': u'mbligh@mongodb.com'}

Message: SERVER-22906: Lower fragmentation in tcmalloc

Halve tcmalloc page size and the size of the smaller slabs - this (worst case)
should halve fragmentation for those sizes. Moving the page size down helps with
fitting for slab sizes like 20k.

No measurable perf impact on our microbenchmark suite, though it's definitely
going to hit the system allocator more.

Drop size classes for things above 16K. They're not of much use as they're all 1
object per slab (though they could be held in the thread cache).
Branch: master
https://github.com/mongodb/mongo/commit/fe6be11f1be4d82fa8bd93c50c71078938c70ed1

Comment by Martin Bligh [ 16/Mar/16 ]

Cut down tcmalloc to smaller page sizes/span in most size classes, which I'm convinced helps. The particular class below is still unchanged at 5 pages/span, but only 2 objects/span.

I can still generate 1G of fragmentation in some classes in the central freelist though.

MongoDB shell version: 3.3.0-73-g1d611a8
connecting to: test
{
        "current_allocated_bytes" : NumberLong("4672840736"),
        "heap_size" : NumberLong("6374375424")
}
{
        "pageheap_free_bytes" : 0,
        "pageheap_unmapped_bytes" : 541745152,
        "max_total_thread_cache_bytes" : NumberLong(1073741824),
        "current_total_thread_cache_bytes" : 233448504,
        "central_cache_free_bytes" : 912378784,
        "transfer_cache_free_bytes" : 14964416,
        "thread_cache_free_bytes" : 233477104,
        "aggressive_memory_decommit" : 1
}
------------------------------------------------
MALLOC:     4672840872 ( 4456.4 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +    912373880 (  870.1 MiB) Bytes in central cache freelist
MALLOC: +     14964416 (   14.3 MiB) Bytes in transfer cache freelist
MALLOC: +    233925664 (  223.1 MiB) Bytes in thread cache freelists
MALLOC: +     20230304 (   19.3 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   5854335136 ( 5583.1 MiB) Actual memory used (physical + swap)
MALLOC: +    541319168 (  516.2 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   6395654304 ( 6099.4 MiB) Virtual address space used
MALLOC:
MALLOC:         232534              Spans in use
MALLOC:             29              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
------------------------------------------------
Columns: Size Class, Bytes/object, pages per slab
Size of allocated + free: spans, MiB, utilization percent
Free object count: Total (central+transfer+threads), central, transfer
Free MiB (central+transfer+threads), Cumulative Free MiB
------------------------------------------------
  1 [       8]  1 p/s: Size: [     2 spans     0.0MiB  19.3% util] Free: [    1653 total,     1231 central,        0 transfer] Free MiB:   0.0 (  0.0 cum)
  2 [      16]  1 p/s: Size: [   399 spans     3.1MiB  99.6% util] Free: [     810 total,      470 central,        0 transfer] Free MiB:   0.0 (  0.0 cum)
  3 [      32]  1 p/s: Size: [   270 spans     2.1MiB  55.9% util] Free: [   30460 total,    29660 central,        0 transfer] Free MiB:   0.9 (  1.0 cum)
  4 [      48]  1 p/s: Size: [   126 spans     1.0MiB  40.7% util] Free: [   12762 total,    11458 central,        0 transfer] Free MiB:   0.6 (  1.5 cum)
  5 [      64]  1 p/s: Size: [    24 spans     0.2MiB  39.3% util] Free: [    1864 total,     1696 central,        0 transfer] Free MiB:   0.1 (  1.7 cum)
  6 [      80]  1 p/s: Size: [    26 spans     0.2MiB  13.6% util] Free: [    2301 total,     2236 central,        0 transfer] Free MiB:   0.2 (  1.8 cum)
  7 [      96]  1 p/s: Size: [    14 spans     0.1MiB  68.5% util] Free: [     376 total,      144 central,        0 transfer] Free MiB:   0.0 (  1.9 cum)
  8 [     112]  1 p/s: Size: [    50 spans     0.4MiB  41.0% util] Free: [    2156 total,     2072 central,        0 transfer] Free MiB:   0.2 (  2.1 cum)
  9 [     128]  1 p/s: Size: [    12 spans     0.1MiB  13.5% util] Free: [     664 total,      607 central,        0 transfer] Free MiB:   0.1 (  2.2 cum)
 10 [     144]  1 p/s: Size: [    14 spans     0.1MiB  22.4% util] Free: [     618 total,      575 central,        0 transfer] Free MiB:   0.1 (  2.3 cum)
 11 [     160]  1 p/s: Size: [   306 spans     2.4MiB  91.8% util] Free: [    1280 total,     1257 central,        0 transfer] Free MiB:   0.2 (  2.5 cum)
 12 [     176]  1 p/s: Size: [    25 spans     0.2MiB   6.5% util] Free: [    1088 total,     1035 central,        0 transfer] Free MiB:   0.2 (  2.6 cum)
 13 [     192]  1 p/s: Size: [    18 spans     0.1MiB  14.8% util] Free: [     654 total,      630 central,        0 transfer] Free MiB:   0.1 (  2.8 cum)
 14 [     208]  1 p/s: Size: [    15 spans     0.1MiB   1.7% util] Free: [     581 total,      562 central,        0 transfer] Free MiB:   0.1 (  2.9 cum)
 15 [     224]  1 p/s: Size: [    12 spans     0.1MiB   2.9% util] Free: [     426 total,      406 central,        0 transfer] Free MiB:   0.1 (  3.0 cum)
 16 [     240]  1 p/s: Size: [    27 spans     0.2MiB   1.7% util] Free: [     906 total,      875 central,        0 transfer] Free MiB:   0.2 (  3.2 cum)
 17 [     256]  1 p/s: Size: [    16 spans     0.1MiB   3.7% util] Free: [     493 total,      421 central,        0 transfer] Free MiB:   0.1 (  3.3 cum)
 18 [     288]  1 p/s: Size: [     5 spans     0.0MiB  14.9% util] Free: [     121 total,      101 central,        0 transfer] Free MiB:   0.0 (  3.3 cum)
 19 [     320]  1 p/s: Size: [    30 spans     0.2MiB  26.3% util] Free: [     566 total,      548 central,        0 transfer] Free MiB:   0.2 (  3.5 cum)
 20 [     352]  1 p/s: Size: [     5 spans     0.0MiB  20.9% util] Free: [      92 total,       19 central,        0 transfer] Free MiB:   0.0 (  3.5 cum)
 21 [     384]  1 p/s: Size: [     2 spans     0.0MiB  69.5% util] Free: [      13 total,        9 central,        0 transfer] Free MiB:   0.0 (  3.5 cum)
 22 [     416]  1 p/s: Size: [     5 spans     0.0MiB  56.3% util] Free: [      43 total,       24 central,        0 transfer] Free MiB:   0.0 (  3.5 cum)
 23 [     448]  1 p/s: Size: [     9 spans     0.1MiB  75.7% util] Free: [      40 total,       27 central,        0 transfer] Free MiB:   0.0 (  3.6 cum)
 24 [     480]  1 p/s: Size: [    26 spans     0.2MiB   6.2% util] Free: [     416 total,      415 central,        0 transfer] Free MiB:   0.2 (  3.8 cum)
 25 [     512]  1 p/s: Size: [    83 spans     0.6MiB   9.2% util] Free: [    1206 total,     1095 central,        0 transfer] Free MiB:   0.6 (  4.3 cum)
 26 [     576]  1 p/s: Size: [   766 spans     6.0MiB   2.9% util] Free: [   10581 total,     9490 central,     1017 transfer] Free MiB:   5.8 ( 10.2 cum)
 27 [     640]  1 p/s: Size: [   831 spans     6.5MiB   7.2% util] Free: [    9873 total,     8749 central,     1122 transfer] Free MiB:   6.0 ( 16.2 cum)
 28 [     704]  1 p/s: Size: [    67 spans     0.5MiB  24.5% util] Free: [     589 total,      368 central,        0 transfer] Free MiB:   0.4 ( 16.6 cum)
 29 [     768]  1 p/s: Size: [   465 spans     3.6MiB   7.0% util] Free: [    4611 total,     4097 central,      510 transfer] Free MiB:   3.4 ( 20.0 cum)
 30 [     896]  1 p/s: Size: [   750 spans     5.9MiB   4.4% util] Free: [    6555 total,     5811 central,      730 transfer] Free MiB:   5.6 ( 25.6 cum)
 31 [    1024]  1 p/s: Size: [   115 spans     0.9MiB  31.5% util] Free: [     630 total,      578 central,        0 transfer] Free MiB:   0.6 ( 26.2 cum)
 32 [    1152]  1 p/s: Size: [   657 spans     5.1MiB  11.3% util] Free: [    4144 total,     3522 central,      616 transfer] Free MiB:   4.6 ( 30.7 cum)
 33 [    1280]  1 p/s: Size: [   470 spans     3.7MiB  18.0% util] Free: [    2466 total,     2026 central,      408 transfer] Free MiB:   3.0 ( 33.7 cum)
 34 [    1408]  2 p/s: Size: [   387 spans     6.0MiB   8.1% util] Free: [    4139 total,     3685 central,      414 transfer] Free MiB:   5.6 ( 39.3 cum)
 35 [    1536]  1 p/s: Size: [   286 spans     2.2MiB   6.8% util] Free: [    1422 total,     1128 central,      294 transfer] Free MiB:   2.1 ( 41.4 cum)
 36 [    1664]  2 p/s: Size: [   491 spans     7.7MiB   8.8% util] Free: [    4409 total,     3883 central,      507 transfer] Free MiB:   7.0 ( 48.4 cum)
 37 [    2048]  1 p/s: Size: [    57 spans     0.4MiB   5.3% util] Free: [     216 total,      103 central,       64 transfer] Free MiB:   0.4 ( 48.8 cum)
 38 [    2304]  2 p/s: Size: [   285 spans     4.5MiB   1.7% util] Free: [    1993 total,     1680 central,      308 transfer] Free MiB:   4.4 ( 53.2 cum)
 39 [    2560]  1 p/s: Size: [   214 spans     1.7MiB   6.2% util] Free: [     642 total,      413 central,      225 transfer] Free MiB:   1.6 ( 54.7 cum)
 40 [    2816]  3 p/s: Size: [   275 spans     6.4MiB   9.2% util] Free: [    2179 total,     1836 central,      322 transfer] Free MiB:   5.9 ( 60.6 cum)
 41 [    3072]  2 p/s: Size: [   110 spans     1.7MiB   6.6% util] Free: [     548 total,      419 central,      126 transfer] Free MiB:   1.6 ( 62.2 cum)
 42 [    3328]  3 p/s: Size: [   188 spans     4.4MiB   7.8% util] Free: [    1280 total,     1066 central,      190 transfer] Free MiB:   4.1 ( 66.3 cum)
 43 [    4096]  1 p/s: Size: [   104 spans     0.8MiB  48.1% util] Free: [     108 total,       56 central,        0 transfer] Free MiB:   0.4 ( 66.7 cum)
 44 [    4608]  3 p/s: Size: [   193 spans     4.5MiB  10.0% util] Free: [     926 total,      702 central,      182 transfer] Free MiB:   4.1 ( 70.8 cum)
 45 [    5120]  2 p/s: Size: [   121 spans     1.9MiB  11.2% util] Free: [     344 total,      199 central,      132 transfer] Free MiB:   1.7 ( 72.4 cum)
 46 [    6144]  3 p/s: Size: [   110 spans     2.6MiB  23.2% util] Free: [     338 total,      142 central,      120 transfer] Free MiB:   2.0 ( 74.4 cum)
 47 [    6656]  5 p/s: Size: [    51 spans     2.0MiB  22.9% util] Free: [     242 total,      209 central,        9 transfer] Free MiB:   1.5 ( 75.9 cum)
 48 [    8192]  1 p/s: Size: [   477 spans     3.7MiB  92.0% util] Free: [      38 total,        0 central,        0 transfer] Free MiB:   0.3 ( 76.2 cum)
 49 [    9216]  5 p/s: Size: [    65 spans     2.5MiB  94.5% util] Free: [      16 total,       12 central,        0 transfer] Free MiB:   0.1 ( 76.4 cum)
 50 [   10240]  4 p/s: Size: [    52 spans     1.6MiB  39.9% util] Free: [     100 total,       78 central,       18 transfer] Free MiB:   1.0 ( 77.4 cum)
 51 [   12288]  3 p/s: Size: [    89 spans     2.1MiB  41.6% util] Free: [     104 total,       54 central,       50 transfer] Free MiB:   1.2 ( 78.6 cum)
 52 [   13312]  5 p/s: Size: [    73 spans     2.9MiB   3.4% util] Free: [     217 total,      145 central,       72 transfer] Free MiB:   2.8 ( 81.3 cum)
 53 [   16384]  2 p/s: Size: [  1957 spans    30.6MiB  63.2% util] Free: [     720 total,        0 central,       64 transfer] Free MiB:  11.2 ( 92.6 cum)
 54 [   20480]  5 p/s: Size: [ 73151 spans  2857.5MiB  65.0% util] Free: [   51257 total,    40647 central,       51 transfer] Free MiB: 1001.1 (1093.7 cum)
 55 [   24576]  3 p/s: Size: [102516 spans  2402.7MiB 100.0% util] Free: [      39 total,        0 central,        0 transfer] Free MiB:   0.9 (1094.6 cum)
 56 [   26624]  7 p/s: Size: [  1352 spans    73.9MiB  98.7% util] Free: [      39 total,        0 central,        0 transfer] Free MiB:   1.0 (1095.6 cum)
 57 [   32768]  4 p/s: Size: [  1096 spans    34.2MiB  96.3% util] Free: [      41 total,        0 central,        0 transfer] Free MiB:   1.3 (1096.9 cum)
 58 [   40960]  5 p/s: Size: [    37 spans     1.4MiB  81.1% util] Free: [       7 total,        0 central,        0 transfer] Free MiB:   0.3 (1097.2 cum)
 60 [   57344]  7 p/s: Size: [     4 spans     0.2MiB  75.0% util] Free: [       1 total,        0 central,        0 transfer] Free MiB:   0.1 (1097.2 cum)
 61 [   65536]  8 p/s: Size: [     6 spans     0.4MiB  33.3% util] Free: [       4 total,        0 central,        0 transfer] Free MiB:   0.2 (1097.5 cum)
 63 [   81920] 10 p/s: Size: [     1 spans     0.1MiB   0.0% util] Free: [       1 total,        0 central,        0 transfer] Free MiB:   0.1 (1097.5 cum)
 73 [  163840] 20 p/s: Size: [    32 spans     5.0MiB  96.9% util] Free: [       1 total,        0 central,        0 transfer] Free MiB:   0.2 (1097.7 cum)
 84 [  253952] 31 p/s: Size: [     1 spans     0.2MiB   0.0% util] Free: [       1 total,        0 central,        0 transfer] Free MiB:   0.2 (1097.9 cum)

Comment by James Mangold [ 11/Mar/16 ]

Hi Bruce, Yes we can try that. I am going to involve my systems team, as well as another developer. Thank you for your help on this.

Comment by Bruce Lucas (Inactive) [ 11/Mar/16 ]

james.mangold@interactivedata.com, based on the data that you uploaded, it looks like a substantial portion of the memory bloat you are seeing is due to free memory in the tcmalloc page heap, which rises to about 67 GB. Based on our experiments we believe that setting the "TCMALLOC_AGGRESSIVE_DECOMMIT" environment variable to "y" in the environment where the mongod process is started will cause this memory to be returned to the o/s so that it doesn't occupy actual physical memory, and that should substantially alleviate the issue you are seeing. (Another 60 GB of free memory in your case is retained in the tcmalloc central free list; this problem will remain.) Would you be in a position to try this experiment, and then upload the $dbpath/diagnostic.data directory again once you have done so? As Martin says we don't expect this will have a substantial performance impact, but you should monitor to make sure that is the case.

Comment by Martin Bligh [ 11/Mar/16 ]

We will do further testing on the perf impact, but based on upstream comments in the 2.4-rc commits for gperftools, we believe it to be minimal.
This option is on by default in 2.4 and later - we plan to pick up a new version soon, but are on 2.2 for now.

Comment by Bruce Lucas (Inactive) [ 11/Mar/16 ]

Here's a run of the same repro as posted above with the same parameters (cache size etc.) but with the environment variable TCMALLOC_AGGRESSIVE_DECOMMIT=y set.

  • virtual memory grows as before
  • but crucially resident memory, i.e. actual physical memory used, does not grow excessively
  • the reason is seen in "pageheap_unmapped_bytes": the aggressive decommit setting has caused tcmalloc to unmap large regions of memory returned to its page heap, so that they don't occupy physical memory
  • the maximum effect of fragmentation is only about 1 GB, as represented by the "central_cache_free_bytes" metric - this is small regions of memory in tcmalloc's central free list that cannot yet be returned to the page heap and so can't be unmapped

So for this particular (artificial) repro workload aggressive decommit helps quite a lot. Performance impact has not been evaluated.

Comment by Martin Bligh [ 11/Mar/16 ]

Left it running overnight w/ 16GB WT cache. Peaked at 17841 MB allocated, which is pretty reasonable given other overhead outside of the WT cache. The final distribution of spans was clustered heavily around 48K, presumably from WT reconciliation.

connecting to: test
{
        "current_allocated_bytes" : NumberLong("17715703400"),
        "heap_size" : NumberLong("22023323648")
}
{
        "pageheap_free_bytes" : 0,
        "pageheap_unmapped_bytes" : NumberLong("3939401728"),
        "max_total_thread_cache_bytes" : NumberLong(1073741824),
        "current_total_thread_cache_bytes" : 273588672,
        "central_cache_free_bytes" : 59550104,
        "transfer_cache_free_bytes" : 26192336,
        "thread_cache_free_bytes" : 273580488,
        "aggressive_memory_decommit" : 1
}
------------------------------------------------
MALLOC:    17737074016 (16915.4 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +     59551104 (   56.8 MiB) Bytes in central cache freelist
MALLOC: +     25999840 (   24.8 MiB) Bytes in transfer cache freelist
MALLOC: +    272831296 (  260.2 MiB) Bytes in thread cache freelists
MALLOC: +     50622624 (   48.3 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  18146078880 (17305.4 MiB) Actual memory used (physical + swap)
MALLOC: +   3927867392 ( 3745.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  22073946272 (21051.4 MiB) Virtual address space used
MALLOC:
MALLOC:         112687              Spans in use
MALLOC:             33              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
------------------------------------------------
Total size of freelists for per-thread caches,
transfer cache, and central cache, by size class
------------------------------------------------
class   1 [        8 bytes ] :   113584 objs;   0.9 MiB;   0.9 cum MiB
class   2 [       16 bytes ] :    86709 objs;   1.3 MiB;   2.2 cum MiB
class   3 [       32 bytes ] :    22192 objs;   0.7 MiB;   2.9 cum MiB
class   4 [       48 bytes ] :    60306 objs;   2.8 MiB;   5.6 cum MiB
class   5 [       64 bytes ] :    20752 objs;   1.3 MiB;   6.9 cum MiB
class   6 [       80 bytes ] :    39500 objs;   3.0 MiB;   9.9 cum MiB
class   7 [       96 bytes ] :    18155 objs;   1.7 MiB;  11.6 cum MiB
class   8 [      112 bytes ] :    29269 objs;   3.1 MiB;  14.7 cum MiB
class   9 [      128 bytes ] :     3833 objs;   0.5 MiB;  15.2 cum MiB
class  10 [      144 bytes ] :     1426 objs;   0.2 MiB;  15.4 cum MiB
class  11 [      160 bytes ] :     1419 objs;   0.2 MiB;  15.6 cum MiB
class  12 [      176 bytes ] :      971 objs;   0.2 MiB;  15.7 cum MiB
class  13 [      192 bytes ] :     1078 objs;   0.2 MiB;  15.9 cum MiB
class  14 [      208 bytes ] :     1166 objs;   0.2 MiB;  16.2 cum MiB
class  15 [      224 bytes ] :     1085 objs;   0.2 MiB;  16.4 cum MiB
class  16 [      240 bytes ] :     1673 objs;   0.4 MiB;  16.8 cum MiB
class  17 [      256 bytes ] :      598 objs;   0.1 MiB;  16.9 cum MiB
class  18 [      288 bytes ] :     9885 objs;   2.7 MiB;  19.6 cum MiB
class  19 [      320 bytes ] :    12254 objs;   3.7 MiB;  23.4 cum MiB
class  20 [      352 bytes ] :     8793 objs;   3.0 MiB;  26.3 cum MiB
class  21 [      384 bytes ] :    15048 objs;   5.5 MiB;  31.8 cum MiB
class  22 [      416 bytes ] :    15283 objs;   6.1 MiB;  37.9 cum MiB
class  23 [      448 bytes ] :    11517 objs;   4.9 MiB;  42.8 cum MiB
class  24 [      480 bytes ] :    13359 objs;   6.1 MiB;  48.9 cum MiB
class  25 [      512 bytes ] :      331 objs;   0.2 MiB;  49.1 cum MiB
class  26 [      576 bytes ] :      477 objs;   0.3 MiB;  49.4 cum MiB
class  27 [      640 bytes ] :     1190 objs;   0.7 MiB;  50.1 cum MiB
class  28 [      704 bytes ] :     1689 objs;   1.1 MiB;  51.2 cum MiB
class  29 [      768 bytes ] :       17 objs;   0.0 MiB;  51.2 cum MiB
class  30 [      832 bytes ] :     3883 objs;   3.1 MiB;  54.3 cum MiB
class  31 [      896 bytes ] :     2448 objs;   2.1 MiB;  56.4 cum MiB
class  32 [      960 bytes ] :       71 objs;   0.1 MiB;  56.5 cum MiB
class  33 [     1024 bytes ] :      168 objs;   0.2 MiB;  56.6 cum MiB
class  34 [     1152 bytes ] :     3085 objs;   3.4 MiB;  60.0 cum MiB
class  35 [     1280 bytes ] :      205 objs;   0.3 MiB;  60.3 cum MiB
class  36 [     1408 bytes ] :     3837 objs;   5.2 MiB;  65.4 cum MiB
class  37 [     1536 bytes ] :      101 objs;   0.1 MiB;  65.6 cum MiB
class  38 [     1792 bytes ] :       61 objs;   0.1 MiB;  65.7 cum MiB
class  39 [     2048 bytes ] :       54 objs;   0.1 MiB;  65.8 cum MiB
class  40 [     2304 bytes ] :       22 objs;   0.0 MiB;  65.8 cum MiB
class  41 [     2560 bytes ] :       33 objs;   0.1 MiB;  65.9 cum MiB
class  42 [     2816 bytes ] :     1104 objs;   3.0 MiB;  68.9 cum MiB
class  43 [     3072 bytes ] :       20 objs;   0.1 MiB;  68.9 cum MiB
class  44 [     3328 bytes ] :     1060 objs;   3.4 MiB;  72.3 cum MiB
class  45 [     4096 bytes ] :      216 objs;   0.8 MiB;  73.2 cum MiB
class  46 [     4608 bytes ] :      127 objs;   0.6 MiB;  73.7 cum MiB
class  47 [     5120 bytes ] :       14 objs;   0.1 MiB;  73.8 cum MiB
class  48 [     6144 bytes ] :       69 objs;   0.4 MiB;  74.2 cum MiB
class  49 [     6656 bytes ] :        4 objs;   0.0 MiB;  74.2 cum MiB
class  50 [     8192 bytes ] :       44 objs;   0.3 MiB;  74.6 cum MiB
class  51 [     9216 bytes ] :       86 objs;   0.8 MiB;  75.3 cum MiB
class  52 [    10240 bytes ] :       11 objs;   0.1 MiB;  75.4 cum MiB
class  53 [    12288 bytes ] :      106 objs;   1.2 MiB;  76.7 cum MiB
class  54 [    13312 bytes ] :       53 objs;   0.7 MiB;  77.3 cum MiB
class  55 [    16384 bytes ] :       97 objs;   1.5 MiB;  78.8 cum MiB
class  56 [    20480 bytes ] :        9 objs;   0.2 MiB;  79.0 cum MiB
class  57 [    24576 bytes ] :       42 objs;   1.0 MiB;  80.0 cum MiB
class  58 [    26624 bytes ] :       77 objs;   2.0 MiB;  82.0 cum MiB
class  59 [    32768 bytes ] :       70 objs;   2.2 MiB;  84.1 cum MiB
class  60 [    40960 bytes ] :       54 objs;   2.1 MiB;  86.3 cum MiB
class  61 [    49152 bytes ] :     2954 objs; 138.5 MiB; 224.7 cum MiB
class  62 [    57344 bytes ] :       81 objs;   4.4 MiB; 229.2 cum MiB
class  63 [    65536 bytes ] :       80 objs;   5.0 MiB; 234.2 cum MiB
class  64 [    73728 bytes ] :       40 objs;   2.8 MiB; 237.0 cum MiB
class  65 [    81920 bytes ] :       40 objs;   3.1 MiB; 240.1 cum MiB
class  66 [    90112 bytes ] :       33 objs;   2.8 MiB; 242.9 cum MiB
class  67 [    98304 bytes ] :       39 objs;   3.7 MiB; 246.6 cum MiB
class  68 [   106496 bytes ] :       35 objs;   3.6 MiB; 250.1 cum MiB
class  69 [   114688 bytes ] :       34 objs;   3.7 MiB; 253.9 cum MiB
class  70 [   122880 bytes ] :       43 objs;   5.0 MiB; 258.9 cum MiB
class  71 [   131072 bytes ] :       70 objs;   8.8 MiB; 267.6 cum MiB
class  72 [   139264 bytes ] :       32 objs;   4.2 MiB; 271.9 cum MiB
class  73 [   147456 bytes ] :       28 objs;   3.9 MiB; 275.8 cum MiB
class  74 [   155648 bytes ] :       34 objs;   5.0 MiB; 280.9 cum MiB
class  75 [   163840 bytes ] :       26 objs;   4.1 MiB; 284.9 cum MiB
class  76 [   172032 bytes ] :       26 objs;   4.3 MiB; 289.2 cum MiB
class  77 [   180224 bytes ] :       29 objs;   5.0 MiB; 294.2 cum MiB
class  78 [   188416 bytes ] :       31 objs;   5.6 MiB; 299.8 cum MiB
class  79 [   196608 bytes ] :       30 objs;   5.6 MiB; 305.4 cum MiB
class  80 [   204800 bytes ] :       40 objs;   7.8 MiB; 313.2 cum MiB
class  81 [   212992 bytes ] :       23 objs;   4.7 MiB; 317.9 cum MiB
class  82 [   221184 bytes ] :       19 objs;   4.0 MiB; 321.9 cum MiB
class  83 [   229376 bytes ] :       20 objs;   4.4 MiB; 326.3 cum MiB
class  84 [   237568 bytes ] :       27 objs;   6.1 MiB; 332.4 cum MiB
class  85 [   245760 bytes ] :        1 objs;   0.2 MiB; 332.6 cum MiB
class  86 [   253952 bytes ] :        2 objs;   0.5 MiB; 333.1 cum MiB
class  87 [   262144 bytes ] :       27 objs;   6.8 MiB; 339.8 cum MiB
------------------------------------------------
PageHeap: 84 sizes;    0.0 MiB free; 3745.9 MiB unmapped
------------------------------------------------
     1 pages *    959 spans ~    7.5 MiB;    7.5 MiB cum; unmapped:    7.5 MiB;    7.5 MiB cum
     2 pages *    937 spans ~   14.6 MiB;   22.1 MiB cum; unmapped:   14.6 MiB;   22.1 MiB cum
     3 pages *    401 spans ~    9.4 MiB;   31.5 MiB cum; unmapped:    9.4 MiB;   31.5 MiB cum
     4 pages *    765 spans ~   23.9 MiB;   55.4 MiB cum; unmapped:   23.9 MiB;   55.4 MiB cum
     5 pages *    376 spans ~   14.7 MiB;   70.1 MiB cum; unmapped:   14.7 MiB;   70.1 MiB cum
     6 pages *    709 spans ~   33.2 MiB;  103.4 MiB cum; unmapped:   33.2 MiB;  103.4 MiB cum
     7 pages *    362 spans ~   19.8 MiB;  123.2 MiB cum; unmapped:   19.8 MiB;  123.2 MiB cum
     8 pages *    546 spans ~   34.1 MiB;  157.3 MiB cum; unmapped:   34.1 MiB;  157.3 MiB cum
     9 pages *    390 spans ~   27.4 MiB;  184.7 MiB cum; unmapped:   27.4 MiB;  184.7 MiB cum
    10 pages *    491 spans ~   38.4 MiB;  223.1 MiB cum; unmapped:   38.4 MiB;  223.1 MiB cum
    11 pages *    385 spans ~   33.1 MiB;  256.1 MiB cum; unmapped:   33.1 MiB;  256.1 MiB cum
    12 pages *    470 spans ~   44.1 MiB;  300.2 MiB cum; unmapped:   44.1 MiB;  300.2 MiB cum
    13 pages *    394 spans ~   40.0 MiB;  340.2 MiB cum; unmapped:   40.0 MiB;  340.2 MiB cum
    14 pages *    478 spans ~   52.3 MiB;  392.5 MiB cum; unmapped:   52.3 MiB;  392.5 MiB cum
    15 pages *    432 spans ~   50.6 MiB;  443.1 MiB cum; unmapped:   50.6 MiB;  443.1 MiB cum
    16 pages *    494 spans ~   61.8 MiB;  504.9 MiB cum; unmapped:   61.8 MiB;  504.9 MiB cum
    17 pages *    404 spans ~   53.7 MiB;  558.5 MiB cum; unmapped:   53.7 MiB;  558.5 MiB cum
    18 pages *    478 spans ~   67.2 MiB;  625.8 MiB cum; unmapped:   67.2 MiB;  625.8 MiB cum
    19 pages *    460 spans ~   68.3 MiB;  694.0 MiB cum; unmapped:   68.3 MiB;  694.0 MiB cum
    20 pages *    467 spans ~   73.0 MiB;  767.0 MiB cum; unmapped:   73.0 MiB;  767.0 MiB cum
    21 pages *    442 spans ~   72.5 MiB;  839.5 MiB cum; unmapped:   72.5 MiB;  839.5 MiB cum
    22 pages *    565 spans ~   97.1 MiB;  936.6 MiB cum; unmapped:   97.1 MiB;  936.6 MiB cum
    23 pages *    477 spans ~   85.7 MiB; 1022.3 MiB cum; unmapped:   85.7 MiB; 1022.3 MiB cum
    24 pages *    567 spans ~  106.3 MiB; 1128.7 MiB cum; unmapped:  106.3 MiB; 1128.7 MiB cum
    25 pages *    341 spans ~   66.6 MiB; 1195.3 MiB cum; unmapped:   66.6 MiB; 1195.3 MiB cum
    26 pages *   

Collection stats:

{
        "ns" : "test.c",
        "count" : 100000,
        "size" : 21982527081,
        "avgObjSize" : 219825,
        "storageSize" : 6492299264,
        "capped" : false,
        "wiredTiger" : {
                "metadata" : {
                        "formatVersion" : 1
                },
                "creationString" : "allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=0,extractor=,format=btree,huffman_key=,huffman_value=,immutable=0,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=),lsm=(auto_throttle=,bloom=,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=0,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_max=15,merge_min=0),memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u",
                "type" : "file",
                "uri" : "statistics:table:collection-2--2120731092415723395",
                "LSM" : {
                        "bloom filters in the LSM tree" : 0,
                        "bloom filter false positives" : 0,
                        "bloom filter hits" : 0,
                        "bloom filter misses" : 0,
                        "bloom filter pages evicted from cache" : 0,
                        "bloom filter pages read into cache" : 0,
                        "total size of bloom filters" : 0,
                        "sleep for LSM checkpoint throttle" : 0,
                        "chunks in the LSM tree" : 0,
                        "highest merge generation in the LSM tree" : 0,
                        "queries that could have benefited from a Bloom filter that did not exist" : 0,
                        "sleep for LSM merge throttle" : 0
                },
                "block-manager" : {
                        "file allocation unit size" : 4096,
                        "blocks allocated" : 233536602,
                        "checkpoint size" : 3284668416,
                        "allocations requiring file extension" : 582830,
                        "blocks freed" : 233432229,
                        "file magic number" : 120897,
                        "file major version number" : 1,
                        "minor version number" : 0,
                        "file bytes available for reuse" : 325238784,
                        "file size in bytes" : 6492299264
                },
                "btree" : {
                        "btree checkpoint generation" : 1221,
                        "column-store variable-size deleted values" : 0,
                        "column-store fixed-size leaf pages" : 0,
                        "column-store internal pages" : 0,
                        "column-store variable-size RLE encoded values" : 0,
                        "column-store variable-size leaf pages" : 0,
                        "pages rewritten by compaction" : 0,
                        "number of key/value pairs" : 0,
                        "fixed-record size" : 0,
                        "maximum tree depth" : 4,
                        "maximum internal page key size" : 368,
                        "maximum internal page size" : 4096,
                        "maximum leaf page key size" : 2867,
                        "maximum leaf page size" : 32768,
                        "maximum leaf page value size" : 67108864,
                        "overflow pages" : 0,
                        "row-store internal pages" : 0,
                        "row-store leaf pages" : 0
                },
                "cache" : {
                        "bytes read into cache" : NumberLong("27329731661631"),
                        "bytes written from cache" : NumberLong("31814672198029"),
                        "checkpoint blocked page eviction" : 28116309,
                        "unmodified pages evicted" : 9,
                        "page split during eviction deepened the tree" : 1,
                        "modified pages evicted" : 182190535,
                        "data source pages selected for eviction unable to be evicted" : 345199,
                        "hazard pointer blocked page eviction" : 4660,
                        "internal pages evicted" : 20,
                        "internal pages split during eviction" : 5,
                        "leaf pages split during eviction" : 3023,
                        "in-memory page splits" : 4,
                        "in-memory page passed criteria to be split" : 8,
                        "overflow values cached in memory" : 0,
                        "pages read into cache" : 182223921,
                        "pages read into cache requiring lookaside entries" : 0,
                        "overflow pages read into cache" : 0,
                        "pages written from cache" : 233534160,
                        "page written requiring lookaside records" : 0,
                        "pages written requiring in-memory restoration" : 0
                },
                "compression" : {
                        "raw compression call failed, no additional data available" : 0,
                        "raw compression call failed, additional data available" : 0,
                        "raw compression call succeeded" : 0,
                        "compressed pages read" : 182223901,
                        "compressed pages written" : 232686288,
                        "page written failed to compress" : 0,
                        "page written was too small to compress" : 847870
                },
                "cursor" : {
                        "create calls" : 29088,
                        "insert calls" : 480270514,
                        "bulk-loaded cursor-insert calls" : 0,
                        "cursor-insert key and value bytes inserted" : NumberLong("52603113528988"),
                        "next calls" : 0,
                        "prev calls" : 1,
                        "remove calls" : 0,
                        "cursor-remove key bytes removed" : 0,
                        "reset calls" : 1442451450,
                        "restarted searches" : 6858,
                        "search calls" : 960341029,
                        "search near calls" : 0,
                        "truncate calls" : 0,
                        "update calls" : 0,
                        "cursor-update value bytes updated" : 0
                },
                "reconciliation" : {
                        "dictionary matches" : 0,
                        "internal page multi-block writes" : 63961,
                        "leaf page multi-block writes" : 612686,
                        "maximum blocks required for a page" : 26,
                        "internal-page overflow keys" : 0,
                        "leaf-page overflow keys" : 0,
                        "overflow values written" : 0,
                        "pages deleted" : 0,
                        "fast-path pages deleted" : 0,
                        "page checksum matches" : 623597,
                        "page reconciliation calls" : 229848890,
                        "page reconciliation calls for eviction" : 126427015,
                        "leaf page key bytes discarded using prefix compression" : 0,
                        "internal page key bytes discarded using suffix compression" : 4076846
                },
                "session" : {
                        "object compaction" : 0,
                        "open cursor count" : 43
                },
                "transaction" : {
                        "update conflicts" : 48510
                }
        },
        "nindexes" : 1,
        "totalIndexSize" : 1204224,
        "indexSizes" : {
                "_id_" : 1204224
        },
        "ok" : 1
}

Comment by Martin Bligh [ 10/Mar/16 ]

15 mins into a new run (with aggressive decommit enabled)
Virt ~11GB, Phys ~8GB. pageheap_free = 0, pageheap_unmapped ~3GB

------------------------------------------------
MALLOC:    10230238560 ( 9756.3 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +     41057264 (   39.2 MiB) Bytes in central cache freelist
MALLOC: +     14739040 (   14.1 MiB) Bytes in transfer cache freelist
MALLOC: +    220803152 (  210.6 MiB) Bytes in thread cache freelists
MALLOC: +     30830752 (   29.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  10537668768 (10049.5 MiB) Actual memory used (physical + swap)
MALLOC: +   1740611584 ( 1660.0 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  12278280352 (11709.5 MiB) Virtual address space used
MALLOC:
MALLOC:         183331              Spans in use
MALLOC:             33              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
------------------------------------------------
Total size of freelists for per-thread caches,
transfer cache, and central cache, by size class
------------------------------------------------
class   1 [        8 bytes ] :    11242 objs;   0.1 MiB;   0.1 cum MiB
class   2 [       16 bytes ] :    75181 objs;   1.1 MiB;   1.2 cum MiB
class   3 [       32 bytes ] :    18150 objs;   0.6 MiB;   1.8 cum MiB
class   4 [       48 bytes ] :     3693 objs;   0.2 MiB;   2.0 cum MiB
class   5 [       64 bytes ] :     1538 objs;   0.1 MiB;   2.0 cum MiB
class   6 [       80 bytes ] :    17711 objs;   1.4 MiB;   3.4 cum MiB
class   7 [       96 bytes ] :      821 objs;   0.1 MiB;   3.5 cum MiB
class   8 [      112 bytes ] :     6479 objs;   0.7 MiB;   4.2 cum MiB
class   9 [      128 bytes ] :      679 objs;   0.1 MiB;   4.3 cum MiB
class  10 [      144 bytes ] :      537 objs;   0.1 MiB;   4.3 cum MiB
class  11 [      160 bytes ] :     1825 objs;   0.3 MiB;   4.6 cum MiB
class  12 [      176 bytes ] :      380 objs;   0.1 MiB;   4.7 cum MiB
class  13 [      192 bytes ] :      409 objs;   0.1 MiB;   4.7 cum MiB
class  14 [      208 bytes ] :      464 objs;   0.1 MiB;   4.8 cum MiB
class  15 [      224 bytes ] :      424 objs;   0.1 MiB;   4.9 cum MiB
class  16 [      240 bytes ] :      662 objs;   0.2 MiB;   5.1 cum MiB
class  17 [      256 bytes ] :      604 objs;   0.1 MiB;   5.2 cum MiB
class  18 [      288 bytes ] :      760 objs;   0.2 MiB;   5.4 cum MiB
class  19 [      320 bytes ] :     1082 objs;   0.3 MiB;   5.8 cum MiB
class  20 [      352 bytes ] :     1212 objs;   0.4 MiB;   6.2 cum MiB
class  21 [      384 bytes ] :     1696 objs;   0.6 MiB;   6.8 cum MiB
class  22 [      416 bytes ] :     1661 objs;   0.7 MiB;   7.4 cum MiB
class  23 [      448 bytes ] :     2111 objs;   0.9 MiB;   8.4 cum MiB
class  24 [      480 bytes ] :     1877 objs;   0.9 MiB;   9.2 cum MiB
class  25 [      512 bytes ] :     2501 objs;   1.2 MiB;  10.4 cum MiB
class  26 [      576 bytes ] :     3619 objs;   2.0 MiB;  12.4 cum MiB
class  27 [      640 bytes ] :     3121 objs;   1.9 MiB;  14.3 cum MiB
class  28 [      704 bytes ] :      678 objs;   0.5 MiB;  14.8 cum MiB
class  29 [      768 bytes ] :     1927 objs;   1.4 MiB;  16.2 cum MiB
class  30 [      832 bytes ] :       93 objs;   0.1 MiB;  16.3 cum MiB
class  31 [      896 bytes ] :      625 objs;   0.5 MiB;  16.8 cum MiB
class  32 [      960 bytes ] :     3169 objs;   2.9 MiB;  19.7 cum MiB
class  33 [     1024 bytes ] :      673 objs;   0.7 MiB;  20.4 cum MiB
class  34 [     1152 bytes ] :     6216 objs;   6.8 MiB;  27.2 cum MiB
class  35 [     1280 bytes ] :      666 objs;   0.8 MiB;  28.0 cum MiB
class  36 [     1408 bytes ] :      974 objs;   1.3 MiB;  29.3 cum MiB
class  37 [     1536 bytes ] :       19 objs;   0.0 MiB;  29.3 cum MiB
class  38 [     1792 bytes ] :     4592 objs;   7.8 MiB;  37.2 cum MiB
class  39 [     2048 bytes ] :      385 objs;   0.8 MiB;  37.9 cum MiB
class  40 [     2304 bytes ] :      759 objs;   1.7 MiB;  39.6 cum MiB
class  41 [     2560 bytes ] :      615 objs;   1.5 MiB;  41.1 cum MiB
class  42 [     2816 bytes ] :      192 objs;   0.5 MiB;  41.6 cum MiB
class  43 [     3072 bytes ] :      160 objs;   0.5 MiB;  42.1 cum MiB
class  44 [     3328 bytes ] :      376 objs;   1.2 MiB;  43.3 cum MiB
class  45 [     4096 bytes ] :      249 objs;   1.0 MiB;  44.3 cum MiB
class  46 [     4608 bytes ] :      138 objs;   0.6 MiB;  44.9 cum MiB
class  47 [     5120 bytes ] :       12 objs;   0.1 MiB;  44.9 cum MiB
class  48 [     6144 bytes ] :       65 objs;   0.4 MiB;  45.3 cum MiB
class  49 [     6656 bytes ] :        9 objs;   0.1 MiB;  45.4 cum MiB
class  50 [     8192 bytes ] :      111 objs;   0.9 MiB;  46.2 cum MiB
class  51 [     9216 bytes ] :       85 objs;   0.7 MiB;  47.0 cum MiB
class  52 [    10240 bytes ] :       70 objs;   0.7 MiB;  47.7 cum MiB
class  53 [    12288 bytes ] :       64 objs;   0.8 MiB;  48.4 cum MiB
class  54 [    13312 bytes ] :       51 objs;   0.6 MiB;  49.1 cum MiB
class  55 [    16384 bytes ] :       70 objs;   1.1 MiB;  50.1 cum MiB
class  56 [    20480 bytes ] :       66 objs;   1.3 MiB;  51.4 cum MiB
class  57 [    24576 bytes ] :       86 objs;   2.0 MiB;  53.5 cum MiB
class  58 [    26624 bytes ] :      324 objs;   8.2 MiB;  61.7 cum MiB
class  59 [    32768 bytes ] :     5622 objs; 175.7 MiB; 237.4 cum MiB
class  60 [    40960 bytes ] :       91 objs;   3.6 MiB; 240.9 cum MiB
class  61 [    49152 bytes ] :       45 objs;   2.1 MiB; 243.0 cum MiB
class  62 [    57344 bytes ] :       54 objs;   3.0 MiB; 246.0 cum MiB
class  63 [    65536 bytes ] :       64 objs;   4.0 MiB; 250.0 cum MiB
class  64 [    73728 bytes ] :       44 objs;   3.1 MiB; 253.1 cum MiB
class  65 [    81920 bytes ] :       41 objs;   3.2 MiB; 256.3 cum MiB
class  66 [    90112 bytes ] :       31 objs;   2.7 MiB; 258.9 cum MiB
class  67 [    98304 bytes ] :        4 objs;   0.4 MiB; 259.3 cum MiB
class  71 [   131072 bytes ] :       14 objs;   1.8 MiB; 261.1 cum MiB
class  75 [   163840 bytes ] :        1 objs;   0.2 MiB; 261.2 cum MiB
class  82 [   221184 bytes ] :        1 objs;   0.2 MiB; 261.4 cum MiB
class  86 [   253952 bytes ] :        1 objs;   0.2 MiB; 261.7 cum MiB
------------------------------------------------
PageHeap: 18 sizes;    0.0 MiB free; 1660.0 MiB unmapped
------------------------------------------------
     1 pages *   2576 spans ~   20.1 MiB;   20.1 MiB cum; unmapped:   20.1 MiB;   20.1 MiB cum
     2 pages *   3572 spans ~   55.8 MiB;   75.9 MiB cum; unmapped:   55.8 MiB;   75.9 MiB cum
     3 pages *   3612 spans ~   84.7 MiB;  160.6 MiB cum; unmapped:   84.7 MiB;  160.6 MiB cum
     4 pages *   4149 spans ~  129.7 MiB;  290.2 MiB cum; unmapped:  129.7 MiB;  290.2 MiB cum
     5 pages *   3797 spans ~  148.3 MiB;  438.6 MiB cum; unmapped:  148.3 MiB;  438.6 MiB cum
     6 pages *   3619 spans ~  169.6 MiB;  608.2 MiB cum; unmapped:  169.6 MiB;  608.2 MiB cum
     7 pages *   2338 spans ~  127.9 MiB;  736.1 MiB cum; unmapped:  127.9 MiB;  736.1 MiB cum
     8 pages *     46 spans ~    2.9 MiB;  738.9 MiB cum; unmapped:    2.9 MiB;  738.9 MiB cum
     9 pages *     21 spans ~    1.5 MiB;  740.4 MiB cum; unmapped:    1.5 MiB;  740.4 MiB cum
    11 pages *      1 spans ~    0.1 MiB;  740.5 MiB cum; unmapped:    0.1 MiB;  740.5 MiB cum
    12 pages *      1 spans ~    0.1 MiB;  740.6 MiB cum; unmapped:    0.1 MiB;  740.6 MiB cum
    13 pages *      6 spans ~    0.6 MiB;  741.2 MiB cum; unmapped:    0.6 MiB;  741.2 MiB cum
    14 pages *     12 spans ~    1.3 MiB;  742.5 MiB cum; unmapped:    1.3 MiB;  742.5 MiB cum
    15 pages *      5 spans ~    0.6 MiB;  743.1 MiB cum; unmapped:    0.6 MiB;  743.1 MiB cum
    18 pages *      2 spans ~    0.3 MiB;  743.4 MiB cum; unmapped:    0.3 MiB;  743.4 MiB cum
    19 pages *      1 spans ~    0.1 MiB;  743.5 MiB cum; unmapped:    0.1 MiB;  743.5 MiB cum
    20 pages *      2 spans ~    0.3 MiB;  743.9 MiB cum; unmapped:    0.3 MiB;  743.9 MiB cum
    25 pages *      1 spans ~    0.2 MiB;  744.0 MiB cum; unmapped:    0.2 MiB;  744.0 MiB cum
>255   large *    517 spans ~  915.9 MiB; 1660.0 MiB cum; unmapped:  915.9 MiB; 1660.0 MiB cum

Comment by Bruce Lucas (Inactive) [ 10/Mar/16 ]

Yes, my repro was on 3.2.

Comment by Martin Bligh [ 10/Mar/16 ]

Looks like turning on aggressive decommit in tcmalloc may help with the pageheap_free_bytes part of the growth. Will run an experiment to test.

Seems pretty steady at this level with a 10GB WT cache after a couple of hours (all numbers below in MB).
Virt size ~11GB, phys size ~9.5GB, which correlates well with heap_size and heap_size - pageheap_unmapped.

heap_size:11437 allocated_bytes:7931 pageheap_free:1843 pageheap_unmapped:1468
central_cache:21 transfer_cache:19 thread_cache:155 thread_cache_max:1024
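
(For reference, these counters can be pulled straight from the shell. This is a rough sketch only, assuming this build exposes the tcmalloc section in serverStatus and that the field names match this version:)

    // Rough sketch: print the counters above in MB (field names may vary between versions)
    var t = db.serverStatus().tcmalloc
    function mb(b) { return Math.round(b / 1024 / 1024) }
    print("heap_size: "         + mb(t.generic.heap_size))
    print("allocated_bytes: "   + mb(t.generic.current_allocated_bytes))
    print("pageheap_free: "     + mb(t.tcmalloc.pageheap_free_bytes))
    print("pageheap_unmapped: " + mb(t.tcmalloc.pageheap_unmapped_bytes))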

So there's almost 2GB of free memory in the pageheap, but nothing in the central cache.
bruce.lucas, I presume you were doing this on 3.2 ?

Comment by James Mangold [ 05/Mar/16 ]

Thanks Bruce. So... Any thoughts on some sort of fix?

-James

Comment by Bruce Lucas (Inactive) [ 04/Mar/16 ]

Hi James,

I've been able to reproduce a memory growth issue similar to the one you saw, using a simple workload that randomly grows documents.

  • standalone mongod, 40 GB cache
  • 10 threads of the following loop that randomly grows 100 k documents:

    function update(count) {
        while (true) {
            // build a random-length filler string (up to ~200 chars)
            var l = Math.random()*200
            var x = ''
            while (x.length < l)
                x += 'xxxxxxxxxxxx'
            // pick a random document and grow it by pushing the filler onto an array
            var id = Math.floor(Math.random()*count)
            db.c.update({_id:id}, {$push:{x:x}}, {upsert:true})
        }
    }

    update(100000)
    

I believe this generates conditions for fragmentation to occur because, as documents grow, the freed buffers from the previous smaller documents are too small to hold the new larger documents. Here is the memory growth over a couple of hours of running this workload:

At point A it reached a heap size of 37.3 GB, of which only 29.0 GB was allocated, an excess of almost 30%, which presumably represents fragmented memory. This is smaller than the excess you saw, which was about 70%, so I suspect your workload may have additional sources of fragmentation as well. However, this update workload will be useful for testing future solutions to this problem, so thank you for submitting this ticket.
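
The excess figure above can be tracked with something like the following (a sketch only; it assumes the tcmalloc "generic" counters are present in serverStatus on this build):

    // Rough sketch of the excess calculation: heap held by the allocator but not currently allocated
    var g = db.serverStatus().tcmalloc.generic
    var excessPct = 100 * (g.heap_size - g.current_allocated_bytes) / g.current_allocated_bytes
    print("heap GB: "      + (g.heap_size / Math.pow(1024, 3)).toFixed(1))
    print("allocated GB: " + (g.current_allocated_bytes / Math.pow(1024, 3)).toFixed(1))
    print("excess %: "     + excessPct.toFixed(0))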

Bruce

Comment by James Mangold [ 01/Mar/16 ]

Thanks for responding Bruce.

Your understanding about the workload is correct. We are reading a data feed with the loaders and updating records - adding into the versions array as we get updates.

As far as the update history, we will update random records, in random order, with a random number of items in the versions array of documents. The document size can vary in that nested array. Could be 1 value, could be 100.

Comment by Bruce Lucas (Inactive) [ 01/Mar/16 ]

Hi James,

Thanks for reporting this issue. Here's what we see in the data you uploaded:

  • at B we see the crash due to the allocation failure (out of memory)
  • from A-B we see
    • virtual, resident, and heap size are growing, leading to the allocation failure
    • that increase is accounted for by an increase in free memory
    • but allocated memory remains constant
    • this coincides with a period of heavy update activity
  • after the crash and failover this node is now functioning as secondary, and appears to be seeing the same increase in free memory

This is a sign of likely fragmentation: as old memory blocks are freed they cannot be reused to satisfy new allocation requests, so instead of reusing the freed blocks the allocator has to obtain more memory from the system.

We are tracking memory fragmentation issues on SERVER-20306. However, this appears to be a slightly different scenario: the fragmentation on that ticket was caused by a shift in workload from inserts to reads that required loading data from disk into the cache; the former requires allocating many small memory blocks, whereas the latter requires allocating larger blocks, which cannot reuse the memory left fragmented by freeing the small blocks. In this case, however, we don't see data being read into the cache; rather, we see a heavy update workload.

Based on the operations logged in the mongod log, it appears that a substantial portion of the workload during this time period consists of updating documents in OMEGA_SCHEMA2.OMEGA_SCHEMA2 by pushing elements onto the VERSIONS array. I would theorize that this creates a similar kind of fragmentation - as elements are pushed onto the VERSIONS array the documents grow, and freeing the smaller versions of the documents leaves fragmented memory that can't be used to hold the larger versions.
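
Purely for illustration, the update pattern described above looks roughly like the following (the _id and the pushed sub-document are placeholders; only the $push-onto-VERSIONS shape matters):

    // Illustrative only: placeholder _id and version entry; namespace and field name are from the logs
    db.getSiblingDB("OMEGA_SCHEMA2").OMEGA_SCHEMA2.update(
        { _id: 12345 },                                      // some existing document (placeholder id)
        { $push: { VERSIONS: { seq: 1, payload: "..." } } }  // new version entry (placeholder shape)
    )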

In order to better understand fragmentation issues and to evaluate potential solutions, we would like to try to reproduce this scenario. To help us do that would you be able to answer the following?

  • Is my understanding about the primary update workload above correct? Are there other significant aspects to the workload that might be relevant?
  • What is the update history in terms of document sizes and number of updates of a typical document in that collection?
  • Are the updates made to random documents in the collection, or are the updates made to documents in the same order that they were inserted into the collection?

Thanks,
Bruce
