[SERVER-18026] Possible memory leak in mongos Created: 14/Apr/15  Updated: 22/May/15  Resolved: 22/May/15

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: 2.4.11, 2.4.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jiangcheng Wu Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

All of our mongos instances consume more than 1.5GB of RSS memory after running for about 2 months. There may be a memory leak.

# 2.4.11
mongodb  26249  5.1 24.6 3945632 3029044 ?     Ssl  Feb06 5018:09 /usr/bin/mongos --config /etc/mongos.conf
# 2.4.11
mongodb   4418  6.5 25.6 4064288 3155864 ?     Ssl  Feb06 6316:31 /usr/bin/mongos --config /etc/mongos.conf
# 2.4.12
mongodb   7360  6.2 13.1 2501396 1621196 ?     Ssl  Feb06 6078:39 /usr/bin/mongos --config /etc/mongos.conf
# 2.4.12
mongodb  14184  6.6 15.8 2807588 1953808 ?     Ssl  Feb06 6464:35 /usr/bin/mongos --config /etc/mongos.conf

and this one consumes about 500MB of RSS memory after running for about 10 days:

# 2.4.12
mongodb  29773  5.0  4.3 1385172 539880 ?      Ssl  Apr05 653:06 /usr/bin/mongos --config /etc/mongos.conf
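
For reference, RSS growth like this can be tracked over time with something like the following (a minimal sketch; the pid and sampling interval are placeholders):

    # sample the mongos resident set size (in KB) once a minute
    while true; do
        echo "$(date +%s) $(ps -o rss= -p 26249)"   # replace 26249 with the mongos pid
        sleep 60
    done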

and some other info:

# 2.4.11
x@y1:~$ mongos --version
MongoS version 2.4.11 starting: pid=23921 port=27017 64-bit host=y1 (--help for usage)
git version: fa13d1ee8da0f112f588570b4070f73d7af2f7fd
build sys info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
x@y1:~$  ldd /usr/bin/mongos
	linux-vdso.so.1 =>  (0x00007ffffb7ff000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0e509f7000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0e507ef000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0e504ee000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0e501f2000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0e4ffdc000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0e4fc1c000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f0e50c1f000)
 
# 2.4.12
x@y2:~$ mongos --version
MongoS version 2.4.12 starting: pid=3914 port=27017 64-bit host=y2 (--help for usage)
git version: 09917767b116f4ff1c0eadda1e8bc5db30828500
build sys info: Linux ip-10-142-184-243 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
x@y2:~$ ldd /usr/bin/mongos
	linux-vdso.so.1 =>  (0x00007fffa97f2000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd587205000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fd586ffd000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd586cfc000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd586a00000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd5867ea000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd58642a000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd58742c000)



 Comments   
Comment by Ramon Fernandez Marina [ 22/May/15 ]

Thanks for the update wujiangcheng. I just wanted to bring up another data point that may be relevant, SERVER-16683: if you're using tags, there's a known issue with mongos having a large memory footprint. This issue is fixed in 2.6.7, so if you're affected you may want to consider an upgrade.

Another possible explanation for the memory consumption you're seeing is a large number of chunks in your collections, as mongos needs to keep chunk information in memory.

That being said, the most common culprit is open cursors, so now that you've lowered the idle cursor timeout you should see a lower memory footprint.
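
If it helps, both factors can be checked quickly from the mongo shell (a rough sketch; the namespace is a placeholder):

    // from a shell connected to the mongos: how many chunks it has to track
    db.getSiblingDB("config").chunks.count()
    db.getSiblingDB("config").chunks.count({ ns : "mydb.mycoll" })   // per collection

    // from a shell connected to a mongod: open and no-timeout cursor counts
    db.serverStatus().cursors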

Since we couldn't find evidence of a bug in the server or mongos I'm going to resolve this ticket, as we keep the SERVER project for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this involving more discussion would be best posted on the mongodb-user group.

Regards,
Ramón.

Comment by Jiangcheng Wu [ 12/May/15 ]

Hi, we have upgraded our mongos to 2.4.14 and will see if that helps.

  • We are already using MMS and it's great. Our MMS group id is 54210a43e4b019a9c3137b52; hope it's useful.
  • We have more than 20k databases in the clusters, so db.serverStatus() outputs too much info. Here are some fragments from mongos (2.4.14) and mongod (the PRIMARY of one shard):
    mongos:

    {
    	"host" : "y1:27020",
    	"version" : "2.4.14",
    	"process" : "mongos",
    	"pid" : 19494,
    	"uptime" : 42754,
    	"uptimeMillis" : NumberLong(42754909),
    	"uptimeEstimate" : 42393,
    	"localTime" : ISODate("2015-05-12T07:43:19.938Z"),
    	"asserts" : {
    		"regular" : 0,
    		"warning" : 0,
    		"msg" : 0,
    		"user" : 0,
    		"rollovers" : 0
    	},
    	"connections" : {
    		"current" : 657,
    		"available" : 19343,
    		"totalCreated" : NumberLong(999)
    	},
    	"extra_info" : {
    		"note" : "fields vary by platform",
    		"heap_usage_bytes" : 65388224,
    		"page_faults" : 0
    	},
    	"network" : {
    		"bytesIn" : 3439580900,
    		"bytesOut" : 9573157331,
    		"numRequests" : 8227650
    	},
    	"opcounters" : {
    		"insert" : 1063924,
    		"query" : 3413411,
    		"update" : 715332,
    		"delete" : 128313,
    		"getmore" : 8,
    		"command" : 2872115
    	},
    	"mem" : {
    		"bits" : 64,
    		"resident" : 239,
    		"virtual" : 1070,
    		"supported" : true
    	},
    	"metrics" : {
    		"getLastError" : {
    			"wtime" : {
    				"num" : 0,
    				"totalMillis" : 0
    			}
    		}
    	},
    	"ok" : 1
    }
    

mongod:

{
	...
	"cursors" : {
		"totalOpen" : 15,
		"clientCursors_size" : 15,
		"timedOut" : 112896,
		"totalNoTimeout" : 9
	},
	"repl" : {
		"setName" : "SHARD02",
		"ismaster" : true,
		"secondary" : false,
		"hosts" : [
			"chat7.avoscloud.com:27018",
			"rtm21.avoscloud.com:27018"
		],
		"arbiters" : [
			"mgcfg1.avoscloud.com:27018"
		],
		"primary" : "chat7.avoscloud.com:27018",
		"me" : "chat7.avoscloud.com:27018"
	},
	"writeBacksQueued" : false,
	"mem" : {
		"bits" : 64,
		"resident" : 15315,
		"virtual" : 1243365,
		"supported" : true,
		"mapped" : 612877,
		"mappedWithJournal" : 1225754
	},
	"metrics" : {
		"document" : {
			"deleted" : NumberLong(74086122),
			"inserted" : NumberLong(370899955),
			"returned" : NumberLong("24218581516"),
			"updated" : NumberLong(1151750695)
		},
		"getLastError" : {
			"wtime" : {
				"num" : 643917,
				"totalMillis" : 824
			},
			"wtimeouts" : NumberLong(0)
		},
		"operation" : {
			"fastmod" : NumberLong(38576883),
			"idhack" : NumberLong("4232024497"),
			"scanAndOrder" : NumberLong(56053961)
		},
		"queryExecutor" : {
			"scanned" : NumberLong("808472445827")
		},
		"record" : {
			"moves" : NumberLong(28233574)
		},
		"repl" : {
			"apply" : {
				"batches" : {
					"num" : 20214265,
					"totalMillis" : 1984136
				},
				"ops" : NumberLong(26850820)
			},
			"buffer" : {
				"count" : NumberLong(0),
				"maxSizeBytes" : 268435456,
				"sizeBytes" : NumberLong(0)
			},
			"network" : {
				"bytes" : NumberLong("23501981601"),
				"getmores" : {
					"num" : 20565761,
					"totalMillis" : 497241424
				},
				"ops" : NumberLong(26850822),
				"readersCreated" : NumberLong(2910)
			},
			"oplog" : {
				"insert" : {
					"num" : 1657805349,
					"totalMillis" : 69507531
				},
				"insertBytes" : NumberLong("1292072725234")
			},
			"preload" : {
				"docs" : {
					"num" : 19941312,
					"totalMillis" : 3361257
				},
				"indexes" : {
					"num" : 189620633,
					"totalMillis" : 2149245
				}
			}
		},
		"ttl" : {
			"deletedDocuments" : NumberLong(0),
			"passes" : NumberLong(328905)
		}
	},
	"ok" : 1
}

  • We do not use map-reduce operations.
  • We have set the idle cursor timeout to 10 seconds.
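
For reference, on server versions that support it the idle cursor timeout can be adjusted with the cursorTimeoutMillis parameter (illustrative only; not necessarily available on all 2.4.x builds):

    // illustrative; requires a server version that supports cursorTimeoutMillis
    db.adminCommand({ setParameter : 1, cursorTimeoutMillis : 10000 })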

Comment by Sam Kleinman (Inactive) [ 16/Apr/15 ]

We've looked over this case a bit more, and I wanted to provide some additional context on our thinking. In addition to memory leaks, the kind of behavior you're seeing can also result from having too many open cursors, or from cursors not timing out correctly.

I think we can do without logs for now, but the following data may help a bit more:

  • If you could add these hosts to MMS we can use this data to review resource use over time. This would be helpful, but is optional.
  • Can you post the output of db.serverStatus() from the mongod and mongos instances?
  • Does your workload include map reduce operations in a sharded environment?
  • Does your application use cursors opened with the notimeout option (docs)?

Hopefully we can quickly isolate cursor use and figure out what's going on here. Thanks and sorry about any confusion.
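
For reference, a no-timeout cursor is opened like this in the legacy mongo shell, and such cursors are counted in the cursors.totalNoTimeout field of serverStatus (a sketch; the collection name is a placeholder):

    // legacy mongo shell: this cursor will not be timed out by the server
    var c = db.mycoll.find().addOption(DBQuery.Option.noTimeout);

    // count of such cursors on a mongod
    db.serverStatus().cursors.totalNoTimeout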

Regards,
sam

Comment by Sam Kleinman (Inactive) [ 14/Apr/15 ]

Thanks for the report. I have a few additional questions that will help us narrow down the cause of this issue:

  • Can you provide some logs from the mongos and mongod instances while the memory use is expanding?
  • Can you describe the application, particularly its concurrency patterns (how many instances, how many connections), as well as the query and update patterns?
  • Can you provide the output of db.currentOp() while connected to the mongos and a mongod?
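
For reference, the requested output can also be captured non-interactively, e.g. (a sketch; host and port are placeholders):

    # capture currentOp and serverStatus from a mongos or mongod
    mongo localhost:27017/admin --eval 'printjson(db.currentOp())' > currentop.json
    mongo localhost:27017/admin --eval 'printjson(db.serverStatus())' > serverstatus.json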

Thanks, and I hope that we can help you resolve this promptly.

Cheers,
sam
