Core Server
SERVER-14970

Memory leak or strange allocation problem

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: 2.6.4
    • Component/s: Stability
    • Labels: None

      We have a pretty sweet setup that has been running nicely for 2+ years:
      3 shards, each a replica set with 2 data-bearing members + 1 arbiter.

      All 6 data-bearing members, but not the 3 arbiters, are running on 240 GB machines in AWS (r3.8xlarge).

      Each replica set member has 10 disks in RAID 0 on standard EBS, 50 GB each. We are at the limit of the 500 GB (10x50 GB), and we are looking to increase this by using the new disks Amazon is providing: EBS SSD volumes with provisioned IOPS. Still RAID 0, with 5x200 GB disks.
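      For reference, the two stripe layouts compared (simple arithmetic; RAID 0 capacity is just the sum of member disks, with no redundancy):

```python
# RAID 0 capacity is the sum of the member disks (no redundancy).
current_gb = 10 * 50   # standard EBS layout described above
planned_gb = 5 * 200   # EBS SSD with provisioned IOPS

print(current_gb, planned_gb)  # 500 1000
```

      So the planned layout doubles capacity while halving the number of EBS volumes in the stripe.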

      On shard 1 I have added another replica set member (exact same server: 240 GB RAM / 32 CPUs), to increase disk size a bit (as explained above).

      Here is db.stats() from the main database on shard 1:

      {
      	"db" : "xxxxxxxxx",
      	"collections" : 152,
      	"objects" : 487562429,
      	"avgObjSize" : 272.9380673915709,
      	"dataSize" : 133074347104,
      	"storageSize" : 149098192336,
      	"numExtents" : 1041,
      	"indexes" : 288,
      	"indexSize" : 63420994896,
      	"fileSize" : 238188232704,
      	"nsSizeMB" : 16,
      	"dataFileVersion" : {
      		"major" : 4,
      		"minor" : 5
      	},
      	"extentFreeList" : {
      		"num" : 90,
      		"totalSize" : 11087396848
      	},
      	"ok" : 1
      }
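      As a cross-check on those numbers (values copied from the db.stats() output above; on MMAPv1, dataSize <= storageSize <= fileSize, and avgObjSize is dataSize / objects):

```python
# Values copied from the db.stats() output above.
stats = {
    "objects": 487562429,
    "avgObjSize": 272.9380673915709,
    "dataSize": 133074347104,
    "storageSize": 149098192336,
    "fileSize": 238188232704,
    "indexSize": 63420994896,
}

GB = 1024 ** 3

# avgObjSize is simply dataSize / objects.
assert abs(stats["dataSize"] / stats["objects"] - stats["avgObjSize"]) < 1e-6

# If all data and indexes were resident, they would need about:
hot_gb = (stats["dataSize"] + stats["indexSize"]) / GB
print(f"data+indexes: {hot_gb:.1f} GB")  # ~183 GB, under the 240 GB of RAM
```

      So the full working set (data plus indexes, ~183 GB) fits in the 240 GB of RAM, which is why the runaway growth below looks like a leak rather than normal caching.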
      

      What we noticed after the sync:

      If we make the new server primary, the website is still semi-responsive, but the heavy pages die. Looking at the database, it starts using 3200% CPU (all 32 CPUs at 100%), and memory usage starts increasing by roughly 1% every 5 seconds or so. I am talking about resident memory assigned to mongod, not the kernel's page cache, which sits steadily around 69 GB.
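      To put that growth rate in perspective (a rough estimate, assuming the reported ~1% per 5 seconds stays linear):

```python
# Rough estimate of the reported leak rate: ~1% of a 240 GB box every ~5 s.
ram_gb = 240
pct_per_tick = 0.01   # 1% of RAM per tick
tick_seconds = 5

gb_per_minute = ram_gb * pct_per_tick * (60 / tick_seconds)   # ~28.8 GB/min
minutes_to_exhaust = (1.0 / pct_per_tick) * tick_seconds / 60  # ~8.3 min from zero

print(f"{gb_per_minute:.1f} GB/min, full RAM in ~{minutes_to_exhaust:.1f} min")
```

      At that rate the box runs out of memory in well under ten minutes, which matches the OOM kill described below.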

      I noticed very slow queries (90 seconds) scanning through millions of records, although we have indexes for those complex queries, and they work fine on the other replica set member (the previous primary). It seems the indexes are not used. I assumed the indexes were not in memory, so I ran touch on those collections (both index and data; see http://docs.mongodb.org/manual/reference/command/touch/), to no avail.
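      One way to confirm this is to compare explain() on both members: in the 2.6-era format, "cursor" is "BasicCursor" for a collection scan and "BtreeCursor <indexname>" when an index is used. A minimal sketch of that check (the explain documents here are illustrative, not real output from this cluster):

```python
def used_index(explain_doc):
    """True if a 2.6-style explain() document shows a BtreeCursor (index) was used."""
    return explain_doc.get("cursor", "").startswith("BtreeCursor")

# Illustrative 2.6-style explain() fragments, not real output from this cluster.
old_primary = {"cursor": "BtreeCursor status_1", "n": 50, "nscanned": 50}
new_member = {"cursor": "BasicCursor", "n": 50, "nscanned": 4875624}

print(used_index(old_primary))  # True  -> index used
print(used_index(new_member))   # False -> full collection scan
```

      A large gap between nscanned and n on the new member would also be consistent with the "scanning millions of records" symptom.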

      I also assumed it might be using so much memory due to complex sorts, but that memory should be freed as soon as possible, so I tried changing the allocator to jemalloc. I am under the impression that things are a bit better, but I am not 100% sure.
      In the end, if I do not switch the primary back, the database ends up using all the memory and the kernel kills it, as seen here: http://pastebin.com/teMT5eqv

      Please let me know if any other information might shed some light on this.

      Also, I posted this on the Google group, and someone pointed out that my version (2.6.4) came out only recently. I have since reverted to 2.6.3 and everything is running nicely. I did not try upgrading all servers to 2.6.4, so this might still be related to different versions running in the same replica set, but 2.6.3 and 2.6.4 should be compatible: it was a minor update, not a major one.

            Assignee: J Rassi (rassi)
            Reporter: Mircea Danila Dumitrescu (venatir)
            Votes: 6
            Watchers: 19
