[SERVER-24303] Enable tcmalloc aggressive decommit by default Created: 27/May/16  Updated: 26/Aug/16  Resolved: 19/Aug/16

Status: Closed
Project: Core Server
Component/s: Build, Storage, WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Michael Cahill (Inactive) Assignee: Michael Cahill (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-24019 Eviction failure because of hazard po... Closed
Related
is related to SERVER-22906 MongoD uses excessive memory over and... Closed
Backwards Compatibility: Fully Compatible
Participants:

 Description   

In SERVER-22906 (among others), we recommend turning on tcmalloc's aggressive decommit feature, which we have disabled by default (overriding the default in current tcmalloc).

Enabling it causes performance test regressions. We need to understand them and figure out whether anything can be done to reduce the impact, then make a cost/benefit decision.



 Comments   
Comment by Michael Cahill (Inactive) [ 19/Aug/16 ]

Work in SERVER-20306, SERVER-22906 and WT-2665 has significantly reduced the amount of excess memory that accumulates in tcmalloc in MongoDB 3.4. In addition, alexander.gorrod's testing demonstrates that there is a non-trivial performance impact of enabling aggressive decommit.

Given that, we have decided not to make this change for MongoDB 3.4. There is still the option of manually enabling aggressive decommit via an environment variable if we find cases in the field where it would help.
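
As a rough illustration of the manual opt-in, the sketch below builds a process environment with aggressive decommit turned on. The ticket does not name the variable; this assumes it is gperftools' TCMALLOC_AGGRESSIVE_DECOMMIT, and the helper name and the mongod command line in the usage comment are hypothetical.

```python
import os


def env_with_aggressive_decommit(base=None):
    """Return a copy of the environment with aggressive decommit requested.

    Assumption: the variable is gperftools' TCMALLOC_AGGRESSIVE_DECOMMIT,
    which treats a value starting with 1, t, T, y or Y as true.
    """
    env = dict(os.environ if base is None else base)
    env["TCMALLOC_AGGRESSIVE_DECOMMIT"] = "1"
    return env


# Hypothetical usage; the path and options are placeholders:
#   import subprocess
#   subprocess.Popen(["mongod", "--dbpath", "/data/db"],
#                    env=env_with_aggressive_decommit())
```

The variable is read when the allocator initializes, so it has to be set before the server process starts rather than changed on a running one.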

Comment by Alexander Gorrod [ 18/Aug/16 ]

I've done performance analysis comparing MongoDB with TCMalloc aggressive decommit enabled vs disabled. I have run tests both locally and via Evergreen.

For comparison, I ran locally the same YCSB workloads that are configured in Evergreen. The table below summarizes the results. All numbers are overall throughput (ops/sec):

Phase                 Enabled (run 1)  Enabled (run 2)  Disabled (run 1)  Disabled (run 2)  % slower with enabled
load                  23009            22445            24975             24375             8%
100% read             47905            47452            54373             54559             12%
95% read, 5% update   37482            37941            42855             42138             11%
50% read, 50% update  16004            14825            15167             15922             0%
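
The last column can be approximately reproduced from the raw throughput numbers. The sketch below assumes "% slower" means the relative drop in mean throughput across the two runs of each configuration; the comment does not say how the column was actually derived, and the function and variable names are illustrative.

```python
from statistics import mean

# Throughput in ops/sec, copied from the table: (enabled runs, disabled runs).
RESULTS = {
    "load": ((23009, 22445), (24975, 24375)),
    "100% read": ((47905, 47452), (54373, 54559)),
    "95% read, 5% update": ((37482, 37941), (42855, 42138)),
    "50% read, 50% update": ((16004, 14825), (15167, 15922)),
}


def percent_slower(enabled_runs, disabled_runs):
    """Relative throughput drop of enabled vs. disabled, in percent.

    Assumption: each configuration's runs are averaged before comparing.
    """
    enabled = mean(enabled_runs)
    disabled = mean(disabled_runs)
    return 100.0 * (disabled - enabled) / disabled


for phase, (enabled_runs, disabled_runs) in RESULTS.items():
    print(f"{phase:22s} {percent_slower(enabled_runs, disabled_runs):5.1f}%")
```

Under that assumption the first three phases come out close to the reported 8%, 12% and 11%, and the mixed 50/50 workload comes out just under 1%, which is well within the run-to-run spread visible in the table.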

I have also run the Evergreen performance tests with aggressive decommit enabled. There has been quite a lot of performance fluctuation recently, so it is difficult to get an accurate measure for the difference. There are at least some tests that show significant performance regressions with aggressive decommit enabled.

The micro-benchmarks (MongoDB Perf suite) show results that are mostly within standard variation, except:

  • Insert with large documents has a 25-30% degradation (this test is flagged as passing with up to a 20% variation).
  • Misc DistinctWithIndex has a 20% degradation (this test is flagged as passing with up to a 15% variation).
  • singleThreaded tests that update large documents have up to a 45% degradation (these tests are flagged as passing with up to 10% variation).

The system benchmark (sys-perf suite) results that fall outside standard variation are:

  • Insert TTL and Vector workloads show up to a 70% degradation (these tests are flagged as passing with up to 12% variation).
  • YCSB all workloads show up to a 70% degradation (note that this is much more significant than my local testing).

The above results are generally reproducible via Evergreen patch builds. Seeing a performance degradation for workloads that allocate large documents was expected, but I had thought it would be in the 10-15% range. The performance degradation in the YCSB tests is larger than is acceptable.

Given that the performance degradation is so significant and that the amount of fragmentation that can be generated has already been limited by architectural changes in WiredTiger (see SERVER-22906), I am inclined to leave aggressive decommit disabled by default for now. If users report seeing large amounts of memory reported in page heap free via serverStatus, we can recommend that they try enabling aggressive decommit via an environment variable.
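
To make the "page heap free" check concrete, here is a minimal sketch that takes a serverStatus document (as a Python dict) and reports what fraction of the tcmalloc heap is sitting in the page heap free list. The field names follow the tcmalloc section that serverStatus reports on tcmalloc-based builds. The 25% threshold and the function names are hypothetical, not a recommendation from this ticket.

```python
def page_heap_free_fraction(server_status):
    """Fraction of the tcmalloc heap held in the page heap free list."""
    tc = server_status["tcmalloc"]
    heap_size = tc["generic"]["heap_size"]
    free_bytes = tc["tcmalloc"]["pageheap_free_bytes"]
    return free_bytes / heap_size if heap_size else 0.0


def should_try_aggressive_decommit(server_status, threshold=0.25):
    """Flag a server whose free page heap exceeds a (hypothetical) threshold."""
    return page_heap_free_fraction(server_status) > threshold


# Hypothetical usage with pymongo against a running server:
#   from pymongo import MongoClient
#   status = MongoClient().admin.command("serverStatus")
#   print(page_heap_free_fraction(status))
```

A persistently high fraction would be the signal described above for trying the environment variable on that deployment.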
