[SERVER-24076] Excess memory use on PPC64 Created: 06/May/16  Updated: 06/Jun/16  Resolved: 03/Jun/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.3.5
Fix Version/s: 3.3.8

Type: Bug Priority: Critical - P2
Reporter: Lilian Romero Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2016-05-19 at 6.54.57 pm.png     File diagnostic.data.tar.gz     File diagnostic.data_3.2.6.tar.gz     File mongod.out.gz     File mongostat.out    
Issue Links:
Related
related to SERVER-24207 Substantial excess allocated memory a... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

version 3.3.5 for ppc64le downloaded from the website
OS: RHEL 7.2, Memory: 128GB
Workload: YCSB
Used the following:
fieldcount=10
fieldlength=10
recordcount=120000000
operationcount=100000000

Participants:

 Description   

Mongod crashes while running YCSB. The log and mongostat output are attached.
The system keeps on consuming memory until it starts paging.
At the end of the log it shows:

2016-05-03T19:52:39.604-0400 E STORAGE  [conn35] WiredTiger (0) [1462319559:601568][62565:0x3fff98aeeb50], file:index-3-2753723832870039369.wt, WT_CURSOR.search: read checksum error for 12288B block at offset 3836055552: block header checksum of 892942393 doesn't match expected checksum of 2861721176
2016-05-03T19:52:39.604-0400 E STORAGE  [conn35] WiredTiger (0) [1462319559:604949][62565:0x3fff98aeeb50], file:index-3-2753723832870039369.wt, WT_CURSOR.search: index-3-2753723832870039369.wt: encountered an illegal file format or internal value
2016-05-03T19:52:39.604-0400 E STORAGE  [conn35] WiredTiger (-31804) [1462319559:604971][62565:0x3fff98aeeb50], file:index-3-2753723832870039369.wt, WT_CURSOR.search: the process must exit and restart: WT_PANIC: WiredTiger library panic



 Comments   
Comment by Michael Cahill (Inactive) [ 03/Jun/16 ]

lilianr@us.ibm.com, in my testing, this change reduces the tcmalloc overhead on PPC by over 50%. Please let me know if this does not resolve the issue for you.

Comment by Githook User [ 03/Jun/16 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-24076 Reduce tcmalloc overhead with >4KB pages.
Branch: master
https://github.com/mongodb/mongo/commit/87010b47738fe11e1395f913bfffa9e787014c96

Comment by Michael Cahill (Inactive) [ 27/May/16 ]

lilianr@us.ibm.com, I have created a patch build; the binaries are here:

https://s3.amazonaws.com/mciuploads/mongodb-mongo-master/enterprise-rhel-62-64-bit/cc6ee9cf116853289ee41784220b3ce8ed14c29c/binaries/mongo-mongodb_mongo_master_enterprise_rhel_62_64_bit_cc6ee9cf116853289ee41784220b3ce8ed14c29c_16_05_26_06_54_11.tgz

Please re-run your tests with these binaries: if they avoid the excess memory use, these changes will be included in a future release of MongoDB.

Comment by Lilian Romero [ 26/May/16 ]

Can you provide a fix for a non-release build? We are currently using 3.3.6.

Comment by Alexander Gorrod [ 20/May/16 ]

Thanks lilianr@us.ibm.com. We will keep you updated on the status. The patch we tested will need to go through code review and testing before it makes it into a release. Let me know if the problem is blocking you from making progress in the meantime, and whether you are interested in getting a non-release build to test with.

Comment by Lilian Romero [ 20/May/16 ]

The error that I was able to reproduce was the memory issue. I have not been able to re-create the data corruption again. Let me know in which build the memory issue is resolved.
For now, let's defer the data corruption issue until we are able to reproduce it again. If it does reproduce, I will leave the system as is so we can debug it.

Comment by Alexander Gorrod [ 19/May/16 ]

At the beginning of the workload phase the diagnostic data shows that the allocated memory grows larger than the configured cache size.

We have figured out that this problem is specific to PPC, and is due to some changes we have made to optimize tcmalloc in the development version of MongoDB that we have not yet fully tested.

The technical problem is that we divide memory allocations into two categories: allocations less than 16k and allocations greater than 16k. Allocations greater than 16k are handled by a different allocation scheme. On PPC platforms, the minimum allocation size in that second scheme is 64k, to match the OS virtual memory page size. During the workload phase WiredTiger does a lot of allocations in the 28k-32k range, which results in an overhead of roughly 50% in the system allocator.
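As a rough illustration (a standalone Python sketch, not MongoDB or tcmalloc source; the 64k page size and the 28k-32k request sizes are simply the numbers described above), rounding each such request up to a whole 64k page wastes about half of the memory actually obtained from the system:

KB = 1024
PAGE = 64 * KB                                # assumed minimum large-allocation granularity on PPC

for request in (28 * KB, 30 * KB, 32 * KB):
    granted = -(-request // PAGE) * PAGE      # round the request up to whole 64k pages
    wasted = granted - request
    print(f"request {request // KB} KB -> granted {granted // KB} KB, "
          f"{wasted / granted:.0%} wasted")

For a 32k request this prints 50% wasted, matching the overhead figure above; 28k and 30k requests waste slightly more.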

We have tested a change to our allocator that reduces the overhead for larger allocations on PPC. The following graph demonstrates the behavior before (first half) and after (second half) our change. You can see that "tcmalloc allocated minus wt cache" remains stable in the second run, whereas it jumps up during the workload phase of the first run.

Comment by Alexander Gorrod [ 19/May/16 ]

lilianr@us.ibm.com Sorry for the radio silence on this one - we have been digging deeper. We have identified three different interesting characteristics from this ticket so far:

1) At the beginning of the workload phase the diagnostic data shows that the allocated memory grows larger than the configured cache size.
2) We see that there is memory overhead being held in the memory allocator.
3) The WT_PANIC you initially reported. We have not been able to make progress on that one, since it does not appear to be reproducible.

We understand the first two problems, and I'll address them in separate comments below. For the third problem - do you have any ideas how we can proceed? I'm not sure how to make progress.

Comment by Alexander Gorrod [ 19/May/16 ]

tcmalloc 64k page allocation overhead

Comment by Alexander Gorrod [ 13/May/16 ]

Thanks for attaching the additional data lilianr@us.ibm.com. I've taken a quick look and can see some differences - I'll dig deeper into analysing the data over the next few days and update the ticket when I have useful information.

I have attempted to reproduce the behavior you reported on a non-PPC machine, and have not had any luck. Could you share some more information about the YCSB setup you are using? Especially:

  • Which version of the YCSB benchmark you are running
  • How many threads you are configuring for the load and run
  • Whether you are using an explicit requestdistribution setting for YCSB, or the default uniform

Thanks for your help in tracking this problem down.

Alex

Comment by Lilian Romero [ 13/May/16 ]

The attachment contains the diagnostic data for 3.2.6

Comment by Lilian Romero [ 11/May/16 ]

To clarify, the problem that is easy to reproduce is the excess use of memory compared to 3.2.6. I have not seen the checksum mismatch in 3.2.6. I will post diagnostic data for 3.2.6 by tomorrow, and will re-run with 3.3.5 to find out whether the checksum mismatch can be reproduced.

Comment by Michael Cahill (Inactive) [ 11/May/16 ]

lilianr@us.ibm.com, I will try to reproduce on a test system today and let you know what I find. Please note that I am out of the office at meetings this week so I may not be able to respond immediately to updates.

I can see from the data you have uploaded that the excess memory use is a combination of memory allocator fragmentation (44GB) and memory allocated outside the WiredTiger cache (39GB). The former is unfortunate but we understand it and have some workarounds, the latter is unexpected and I will try to understand what is causing it.
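If it helps to watch these two quantities directly while the workload runs, a minimal sketch along these lines (assuming pymongo and a mongod on localhost:27017; the serverStatus field names are the ones reported by 3.x builds that use tcmalloc and may differ between versions) reads the same counters the diagnostic data is built from:

from pymongo import MongoClient

status = MongoClient("localhost", 27017).admin.command("serverStatus")

allocated = status["tcmalloc"]["generic"]["current_allocated_bytes"]
heap_size = status["tcmalloc"]["generic"]["heap_size"]
wt_cache = status["wiredTiger"]["cache"]["bytes currently in the cache"]

GB = 1024 ** 3
print(f"allocated outside WT cache: {(allocated - wt_cache) / GB:.1f} GB")
print(f"allocator overhead (heap minus allocated): {(heap_size - allocated) / GB:.1f} GB")

Here "heap minus allocated" is only a rough proxy for the fragmentation figure, and "allocated outside WT cache" corresponds to the memory allocated outside the WiredTiger cache mentioned above.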

If you can post diagnostic data from a 3.2.6 run, that would be very helpful. Can you also clarify whether the checksum mismatch is occurring with 3.2.6?

Comment by Lilian Romero [ 06/May/16 ]

Attaching diagnostic data.
I tried the same test using MongoDB 3.2.6 and do not run into the problem where it keeps on consuming memory. So there may be two issues: 1) a memory leak and 2) data corruption. Let me know if you need the mongostat output from 3.2.6 to compare.
The disk subsystem is not reporting any errors.

Comment by Alexander Gorrod [ 06/May/16 ]

lilianr@us.ibm.com I'm sorry to hear that you have encountered a problem using MongoDB.

The most common cause for a checksum error is a disk corruption; are you running this test using a reliable disk subsystem? Does the failure reproduce reliably?

You mention that "The system keeps on consuming memory until it starts paging." That is unfortunate - I see that mongod is using a WiredTiger cache size of 60GB. Do you have either multiple mongod instances running on the same machine, or other processes running on the same machine that are consuming a lot of memory as well?

It would be very useful if you could also upload the contents of the diagnostic.data directory, which should have been created as a subdirectory of the MongoDB database directory.

One more request: Could you try running the validate command on the collections in the database, and report back the status of running that command?
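For example, something along these lines (a sketch assuming pymongo and a mongod on localhost:27017; "ycsb" is a placeholder for whatever database name your YCSB run used) would run validate across every collection and report the result:

from pymongo import MongoClient

db = MongoClient("localhost", 27017)["ycsb"]          # placeholder database name

for name in db.collection_names():                    # pymongo 3.x API
    result = db.command("validate", name, full=True)  # full validation of each collection
    print(name, "valid" if result.get("valid") else "INVALID", result.get("errors"))

Running the equivalent db.collection.validate(true) from the mongo shell works just as well.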
