[SERVER-27944] mongodb 3.4.1 OOM when add new node to replica set Created: 08/Feb/17  Updated: 16/Feb/17  Resolved: 16/Feb/17

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: 3.4.1
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: zhangyu.liu Assignee: Mark Agarunov
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2017-02-14 at 11.23.28 AM.png     PNG File disk-mem.png     File metrics.2017-01-22T07-45-36Z-00000     File metrics.2017-02-14T02-34-24Z-00000     Zip Archive mongod.zip     File mongod_log.gz     PNG File screenshot-1.png    
Issue Links:
Duplicate
duplicates SERVER-27678 CollectionCloner should call _finishC... Closed
Operating System: ALL
Participants:

 Description   

OOM when adding a new node to a replica set

We hit an OOM problem when adding a new node (with 16GB of memory) to a replica set. There is about 200GB of data on the primary node, and the new node runs for about 1 hour before the OOM occurs. We have tried several times, with the same result.

"top" command has been run on our linux system there are almost no buffer,Cache,Free memory before OOM.

We noticed many "building index using bulk method; build may temporarily use up to 500 megabytes of RAM" messages in the log. We found a parameter named "secondaryIndexPrefetch" that we hoped would release resources for index builds, but it does not work with WiredTiger (WT).



 Comments   
Comment by Mark Agarunov [ 16/Feb/17 ]

Hello zhangyu.liu,

Thank you for providing the data. We have managed to reproduce the described behavior and it appears to be a symptom of the issue described in SERVER-27678. As this issue has been fixed in version 3.4.2, the best approach would be to upgrade to the latest version of MongoDB. If the issue is still present after upgrading, please let me know and we will continue investigating.

Thanks,
Mark

Comment by zhangyu.liu [ 14/Feb/17 ]

Hi Mark

Thanks for the reply.

I have uploaded the diagnostic.data (with heapProfilingEnabled=true), but it does not come from a natural OOM: we manually killed the instance when only 200MB of memory (free+buffers+cached) was left. We started the test with 12GB of free memory.

If this data is not enough for your team to diagnose the issue, we will run another test later and let it reach a natural OOM.
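
For reference, the free+buffers+cached figure mentioned above can be computed from the contents of /proc/meminfo on Linux. The following is only a minimal sketch of that check, not the script used in this test: the function names and the sample values are hypothetical, and the 200MB threshold is taken from the comment above.

```python
def available_kb(meminfo_text):
    """Sum MemFree + Buffers + Cached (in kB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields.get("MemFree", 0) + fields.get("Buffers", 0) + fields.get("Cached", 0)


def below_threshold(meminfo_text, threshold_mb=200):
    """Return True when free+buffers+cached has dropped below threshold_mb."""
    return available_kb(meminfo_text) < threshold_mb * 1024


if __name__ == "__main__":
    # Hypothetical sample values; on a real Linux host, pass the text of
    # /proc/meminfo instead, e.g. open("/proc/meminfo").read().
    sample = (
        "MemTotal:       16384000 kB\n"
        "MemFree:          102400 kB\n"
        "Buffers:           51200 kB\n"
        "Cached:            40960 kB\n"
    )
    print(available_kb(sample), below_threshold(sample))
```

Polling this value in a loop would give a repeatable point at which to stop the instance, although a run that ends in a natural OOM remains the more faithful reproduction.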

Comment by Mark Agarunov [ 13/Feb/17 ]

Hello zhangyu.liu,

Thank you for providing the diagnostic data. Looking over the data, there may be an issue with the heap allocations, which is causing excess memory use. To better understand this issue, we will need the data from the heap profiler. To get this data, please do the following:

  • Start mongod with the additional flag --setParameter heapProfilingEnabled=true
  • Run mongod until reaching the OOM condition
  • Upload the diagnostic.data so that we can examine the data.
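
As an illustration of the first step, here is a minimal sketch that assembles the mongod command line with the heap profiler enabled. The --setParameter heapProfilingEnabled=true flag is the one named above; the helper name, the data path /data/db, and the replica set name rs0 are hypothetical placeholders to be replaced with the values from your deployment.

```python
import shlex


def build_mongod_command(dbpath, replset, extra_args=()):
    """Return the argv list for starting mongod with heap profiling enabled."""
    argv = [
        "mongod",
        "--dbpath", dbpath,      # placeholder data directory
        "--replSet", replset,    # placeholder replica set name
        "--setParameter", "heapProfilingEnabled=true",
    ]
    argv.extend(extra_args)
    return argv


if __name__ == "__main__":
    # Print the command so it can be reviewed before it is run by hand.
    cmd = build_mongod_command("/data/db", "rs0")
    print(shlex.join(cmd))
```

The diagnostic.data directory to upload afterwards is located in the data directory of the mongod.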

Thanks,
Mark

Comment by zhangyu.liu [ 09/Feb/17 ]

Already uploaded. Thank you.

Comment by Daniel Pasette (Inactive) [ 08/Feb/17 ]

Hi zhangyu.liu, can you attach the diagnostic.data files found in the data directory of the mongod?
