Type: Question
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.2.12
Component/s: WiredTiger
Labels: None
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Hello,

I am trying to gain insight into an ongoing issue we've been having. Some configuration info:

MongoDB 3.2.12
centos-release-6-8.el6.centos.12.3.x86_64 running in a 64GB VMware VM

Two stand-alone instances of mongod run on this server (no sharding, no replication). There are several other memory consumers on this server, mostly 5 or 6 Java programs, only one of which uses significant memory (~16GB). That Java program is the main application that depends on MongoDB.

What we observe is that everything works OK for 2-5 days, and then the oom-killer kills one of the mongods. This has happened 5-6 times over the last month. Typically only one mongod is killed, but on at least one occasion both were.

Output from 'sar' shows the same pattern again and again: RAM consumption climbs, sometimes fairly rapidly, over 2-5 days; then ~61GB of RAM stays in use for a day or two before the oom-killer does its thing.
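One way to tell whether growth like this is happening inside the WiredTiger cache or elsewhere in the mongod process is to compare the cache counters with resident memory and the tcmalloc heap figures that `serverStatus` reports. This is a minimal sketch, not an official tool: it assumes the pymongo driver is installed, and the helper names `summarize_memory` and `print_memory_summary` and the default URI are hypothetical. The field names (`wiredTiger.cache`, `tcmalloc.generic`, `mem.resident` in MiB) are the ones `serverStatus` reports for WiredTiger builds using tcmalloc.

```python
def summarize_memory(status: dict) -> dict:
    """Condense one serverStatus document into GiB figures.

    mem.resident is reported in MiB; the cache and tcmalloc counters are in bytes.
    """
    gib = 1024 ** 3
    cache = status["wiredTiger"]["cache"]
    heap = status["tcmalloc"]["generic"]
    return {
        "resident_gb": status["mem"]["resident"] / 1024,
        "cache_used_gb": cache["bytes currently in the cache"] / gib,
        "cache_max_gb": cache["maximum bytes configured"] / gib,
        "heap_allocated_gb": heap["current_allocated_bytes"] / gib,
        "heap_size_gb": heap["heap_size"] / gib,
    }


def print_memory_summary(uri: str = "mongodb://localhost:27017") -> None:
    """Fetch serverStatus from one mongod and print the summary.

    Assumes pymongo is installed; the default URI is only illustrative.
    """
    # Imported here so summarize_memory itself has no driver dependency.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        status = client.admin.command("serverStatus")
    finally:
        client.close()
    for name, value in summarize_memory(status).items():
        print(f"{name:>18}: {value:8.2f}")
```

With two instances on one host, this would be run once per mongod, passing each instance's port in the URI. Sampled periodically alongside `sar`, it could show whether resident memory levels off near the configured cache maximum or keeps climbing past it, which would point at memory outside the cache.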

I should mention that I tried to constrain the WT cache size to 12GB for each of the two mongods. This seemed to prevent the oom-killer from firing, but our application became 'unresponsive'.
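For context on the 12GB figure: the MongoDB 3.2 documentation gives the default WiredTiger cache size as the larger of 60% of RAM minus 1 GB, or 1 GB, and each mongod sizes its cache independently from total system RAM. Assuming both instances ran with that default before the cap was applied (the report does not say), the arithmetic below shows their combined cache targets alone would exceed the 64GB on this VM. The cap itself is set with `storage.wiredTiger.engineConfig.cacheSizeGB` in the config file or `--wiredTigerCacheSizeGB` on the command line. This sketch counts only the cache; mongod also uses memory outside it (connections, query execution, allocator overhead), so these totals are lower bounds.

```python
from typing import Optional


def default_wt_cache_gb_32(ram_gb: float) -> float:
    """MongoDB 3.2 default WiredTiger cache: max(60% of RAM - 1 GB, 1 GB)."""
    return max(0.6 * ram_gb - 1.0, 1.0)


def combined_cache_gb(ram_gb: float, instances: int,
                      cache_gb: Optional[float] = None) -> float:
    """Total cache target for several mongods on one host.

    If cache_gb is None, each instance is assumed to use the 3.2 default.
    """
    per_instance = default_wt_cache_gb_32(ram_gb) if cache_gb is None else cache_gb
    return instances * per_instance


RAM_GB = 64.0   # VM size from the report
JAVA_GB = 16.0  # approximate footprint of the main Java application
MONGODS = 2     # two stand-alone instances on the same host

for label, cap in (("default", None), ("capped at 12GB", 12.0)):
    cache = combined_cache_gb(RAM_GB, MONGODS, cap)
    print(f"{label:>15}: {cache:5.1f} GB cache + {JAVA_GB:.0f} GB Java "
          f"= {cache + JAVA_GB:5.1f} GB of {RAM_GB:.0f} GB")
```

Under the default, each instance targets about 37.4 GB, so the two caches together come to about 74.8 GB before the Java process is counted; with the 12GB cap they total 24 GB.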

I should also mention that I've read a ton of MongoDB Jiras on this issue. I know that 3.2.12 is getting long in the tooth, but many of the improvements to WT's memory management were supposedly made in 3.2.10 (though not exclusively).

As to our application's 'access patterns', it's difficult to be precise, but my sense is that it combines periodic bursts of write activity with occasional (human-driven) reads of a very large collection. (In this regard I am familiar with the Jiras that discuss MongoDB threads turning their attention to cache eviction rather than servicing application requests, but I believe that this issue was improved in 3.2.10.)

In any event, I would be very grateful for some light on this ongoing and very frustrating problem. At a minimum, if I could upload the WT diagnostics and someone at MongoDB could run their internal visualizer against them and provide an analysis, that would be a good start.

Thank you for your help.

Assignee: Danny Hatcher (Inactive)
Reporter: PMB
Participants: Danny Hatcher, PMB
Votes: 0
Watchers: 5

Created: Mar 01 2019 06:32:32 PM UTC
Updated: May 06 2019 07:40:15 PM UTC
Resolved: May 06 2019 07:40:15 PM UTC
