[SERVER-23362] Memory leak symptom in wiredtiger node with 1GB of memory Created: 26/Mar/16  Updated: 02/May/16  Resolved: 29/Mar/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Tung Nguyen Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File memory-leak.png    
Issue Links:
Related
related to SERVER-23391 Improve behavior when running under W... Closed
Operating System: ALL
Participants:

 Description   

Hi there,

I noticed a symptom that looks like a memory leak on the config servers of a sharded cluster (see attachment).

If I restart the mongod process on these config servers, memory utilization drops, then climbs again until the process crashes or is restarted manually.

  • MongoDB version: 3.2.4
  • Shards: 2 (each shard is a 3-member replica set)
  • Config server: 3-member replica set

Could you please advise?

Thanks.



 Comments   
Comment by Daniel Pasette (Inactive) [ 02/May/16 ]

Hi Jiri,
The following should work. In v3.4, mongod will support fractional wiredTigerCacheSizeGB values (SERVER-23624).

storage:
   wiredTiger:
      engineConfig:
         configString: 'cache_size=200M'
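
Once SERVER-23624 lands, the fractional form should also work directly in the config file; a rough sketch of what that is expected to look like (the 0.25 value here, roughly a 256MB cache, is just an illustration):

storage:
   wiredTiger:
      engineConfig:
         cacheSizeGB: 0.25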

Comment by Jiri Nemecek [ 02/May/16 ]

Hello Ramon,

can the --wiredTigerEngineConfigString="cache_size=200M" command-line parameter be put into the mongod configuration file somehow? We tried setting storage.wiredTiger.engineConfig.cacheSizeGB to a fractional number, but that does not work because it expects an integer.

Kind regards,
Jiri

Comment by Tung Nguyen [ 29/Mar/16 ]

Thank you all for your responses.

Comment by Ramon Fernandez Marina [ 29/Mar/16 ]

tung@misfit.com, you're using machines with 1GB of memory, which is the minimum cache size WiredTiger will use. From the logs:

2016-01-05T07:34:04.928+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G...

So the memory usage you're seeing is not a leak, but just increased cache usage until the server gets killed by the OOM killer. Please consider using machines with more memory; if that's not an option, you can consider lowering the cache size using the following command-line switch:

--wiredTigerEngineConfigString="cache_size=200M"

The above will configure the cache to be 200 megabytes.
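
To confirm the setting took effect, you can check the WiredTiger cache statistics from the mongo shell; a quick sketch (field names as they appear in serverStatus() output on 3.2.x):

// configured cache ceiling vs. what is currently held in the cache
var cache = db.serverStatus().wiredTiger.cache;
print("maximum bytes configured:     " + cache["maximum bytes configured"]);
print("bytes currently in the cache: " + cache["bytes currently in the cache"]);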

Since this is not a bug in the server, and the SERVER project is for reporting bugs or feature suggestions for the MongoDB server, I'm going to close this ticket. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience; a question like this one, which calls for more back-and-forth discussion, is best suited to the mongodb-user group. See also our Technical Support page for additional support resources.

Regards,
Ramón.

Comment by Tung Nguyen [ 29/Mar/16 ]

Hi Bruce,

My colleague found this on another config server: the mongod process seems to get restarted when the number of connections exceeds 60. I am not sure why it is 60; maybe it is because of the limited hardware (we are running t2.micro EC2 instances), or some other reason.

I also read that keeping DB connections open (rather than closing them after each operation) is actually good practice, so I do not think we have a problem with connections lingering. Could you please confirm whether this is normal behavior?
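
For reference, the live connection counters can also be read directly from the mongo shell (assuming a direct connection to the config server):

// returns the current, available, and totalCreated counters
db.serverStatus().connections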

Anyway, here is the log of accepted connections; at around 60, the connection count dropped back to 1, 2, 3, ...

2016-03-15T11:12:11.251+0000 I NETWORK initandlisten connection accepted from 10.0.1.91:46221 #553 (61 connections now open)
2016-03-15T11:12:11.427+0000 I NETWORK initandlisten connection accepted from 10.0.1.91:46225 #554 (62 connections now open)
2016-03-15T11:12:11.899+0000 I NETWORK initandlisten connection accepted from 10.0.1.91:46231 #555 (63 connections now open)
2016-03-15T11:20:16.097+0000 I NETWORK initandlisten connection accepted from 10.0.2.11:59382 #556 (60 connections now open)
2016-03-15T11:20:16.239+0000 I NETWORK initandlisten connection accepted from 10.0.2.11:59384 #557 (61 connections now open)
2016-03-15T11:20:16.286+0000 I NETWORK initandlisten connection accepted from 10.0.2.11:59385 #558 (62 connections now open)
2016-03-15T11:50:09.405+0000 I NETWORK initandlisten connection accepted from 10.0.1.91:46340 #559 (60 connections now open)
2016-03-15T11:50:09.523+0000 I NETWORK initandlisten connection accepted from 10.0.1.91:46342 #560 (61 connections now open)
2016-03-15T11:50:09.524+0000 I NETWORK initandlisten connection accepted from 10.0.1.91:46343 #561 (62 connections now open)
2016-03-15T13:27:58.835+0000 I NETWORK initandlisten connection accepted from 10.0.2.11:59608 #562 (60 connections now open)
2016-03-15T13:27:58.912+0000 I NETWORK initandlisten connection accepted from 10.0.2.11:59610 #563 (61 connections now open)
2016-03-15T13:27:58.912+0000 I NETWORK initandlisten connection accepted from 10.0.2.11:59611 #564 (62 connections now open)
2016-03-15T13:27:58.987+0000 I NETWORK initandlisten connection accepted from 10.0.2.11:59612 #565 (63 connections now open)
2016-03-15T14:17:07.278+0000 I NETWORK initandlisten connection accepted from 10.0.1.49:41369 #566 (61 connections now open)
2016-03-15T14:17:07.359+0000 I NETWORK initandlisten connection accepted from 10.0.1.49:41370 #567 (62 connections now open)
2016-03-15T14:17:07.582+0000 I NETWORK initandlisten connection accepted from 10.0.1.49:41373 #568 (63 connections now open)
2016-03-15T15:31:37.667+0000 I NETWORK initandlisten connection accepted from 10.0.1.49:41961 #569 (60 connections now open)
2016-03-16T00:04:28.977+0000 I NETWORK initandlisten connection accepted from 10.0.2.23:41755 #1 (1 connection now open)
2016-03-16T00:04:29.378+0000 I NETWORK initandlisten connection accepted from 10.0.2.71:48007 #2 (2 connections now open)
2016-03-16T00:04:29.452+0000 I NETWORK initandlisten connection accepted from 10.0.2.110:45560 #3 (3 connections now open)
2016-03-16T00:04:29.686+0000 I NETWORK initandlisten connection accepted from 10.0.2.109:49577 #4 (4 connections now open)
2016-03-16T00:04:30.158+0000 I NETWORK initandlisten connection accepted from 10.0.2.72:45842 #5 (5 connections now open)
2016-03-16T00:04:32.579+0000 I NETWORK initandlisten connection accepted from 127.0.0.1:59327 #6 (6 connections now open)
2016-03-16T00:04:34.068+0000 I NETWORK initandlisten connection accepted from 10.0.0.110:33449 #7 (7 connections now open)
2016-03-16T00:04:34.094+0000 I NETWORK initandlisten connection accepted from 10.0.2.191:50854 #8 (8 connections now open)
2016-03-16T00:04:35.106+0000 I NETWORK initandlisten connection accepted from 10.0.1.118:54083 #9 (9 connections now open)
2016-03-16T00:04:35.106+0000 I NETWORK initandlisten connection accepted from 10.0.2.108:56057 #10 (10 connections now open)
2016-03-16T00:04:35.146+0000 I NETWORK initandlisten connection accepted from 10.0.1.172:58923 #11 (11 connections now open)

And from MongoDB Cloud Manager we got this, which pretty much matches the timing of the logs above:

Host is down 03/16/16 - 04:12:10 AM config3-fig-stg.misfit:27017 in fig-config

Thanks.

Comment by Tung Nguyen [ 28/Mar/16 ]

Thanks, Bruce, for getting back! Please check the new attachments.

Comment by Bruce Lucas (Inactive) [ 28/Mar/16 ]

Hi Tung,

Can you please archive (tar or zip) the $dbpath/diagnostic.data directory of a config server that experienced this problem, and attach it to this ticket, together with the mongod log? This will contain detailed statistics about past memory usage that will help us understand the problem.
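
For example, something along these lines should work (the /var/lib/mongodb path below is just a placeholder; use your actual dbpath):

# archive the diagnostic.data directory from the config server's dbpath
tar czf config-diagnostic-data.tar.gz -C /var/lib/mongodb diagnostic.data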

Thanks,
Bruce
