[SERVER-15049] Database is read-overloaded after upgrading to 2.6.4 Created: 27/Aug/14  Updated: 15/Nov/21  Resolved: 27/Aug/14

Status: Closed
Project: Core Server
Component/s: Performance, Querying
Affects Version/s: 2.6.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alexander Ivanes Assignee: Ramon Fernandez Marina
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.04 kernel 3.13.0-32.57~precise1-generic 3.13.11.4
Intel SSD S3500


Attachments: PNG File drive2-db1_17-22.09.14.png     File mongodb-drive2-db1-1.log.final.gz     File mongodb-drive2-db1-2.log.final.gz     File mongodb-drive2-db1-3.log.final.gz    
Issue Links:
Duplicate
duplicates SERVER-14961 Plan ranker favors intersection plans... Closed
Operating System: Linux
Participants:

 Description   

Hello, everyone!
We have a standard replica set configuration: primary + 2 secondaries.
After upgrading from 2.6.3 to 2.6.4, strange things happen: the new primary becomes read-overloaded immediately after rs.stepDown() on the previous one.
We worked around the problem temporarily with another rs.stepDown(), but after several hours it returned. This happened several times, until we downgraded MongoDB to 2.6.3.

Additional information:
Here's one of the problem queries:

{ 
	count: "tokens", 
	query: { _id: -2648398055870562304, $or: [ { t_apn: { $exists: true, $ne: [] } }, { t_gcm: { $exists: true, $ne: [] } } ] }, 
	limit: 1 
}
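
As a client-side workaround sketch (not verified against 2.6.4), the _id index can be forced with hint() so the planner cannot pick an intersection plan; itcount() is used instead of count() so the hint is guaranteed to apply to the cursor, and q is just a throwaway name for the predicate:

	// The problematic predicate; _id is wrapped in NumberLong because bare
	// numeric literals are doubles in the shell and lose precision here.
	var q = {
		_id: NumberLong("-2648398055870562304"),
		$or: [
			{ t_apn: { $exists: true, $ne: [] } },
			{ t_gcm: { $exists: true, $ne: [] } }
		]
	};
	// Force the single-point _id lookup; itcount() counts client-side.
	db.tokens.find(q).hint({ _id: 1 }).limit(1).itcount()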

currentOp:

{
       "opid" : 538972133,
       "active" : true,
       "secs_running" : 12,
       "microsecs_running" : NumberLong(12730587),
       "op" : "query",
       "ns" : "d2_apn.tokens",
       "query" : {
               "count" : "tokens",
               "query" : {
                       "_id" : NumberLong("-5091952697439944704"),
                       "$or" : [
                               {
                                       "t_apn" : {
                                               "$exists" : true,
                                               "$ne" : [ ]
                                       }
                               },
                               {
                                       "t_gcm" : {
                                               "$exists" : true,
                                               "$ne" : [ ]
                                       }
                               }
                       ]
               },
               "limit" : 1
       },
       "planSummary" : "IXSCAN { t_apn: 1 }, IXSCAN { t_gcm: 1 }, IXSCAN { _id: 1 }",
       "client" : "10.3.1.143:63290",
       "desc" : "conn1538217",
       "threadId" : "0x7f5e626b5700",
       "connectionId" : 1538217,
       "locks" : {
               "^" : "r",
               "^d2_apn" : "R"
       },
       "waitingForLock" : false,
       "numYields" : 537,
       "lockStats" : {
               "timeLockedMicros" : {
                       "r" : NumberLong(13578592),
                       "w" : NumberLong(0)
               },
               "timeAcquiringMicros" : {
                       "r" : NumberLong(989936),
                       "w" : NumberLong(0)
               }
       }
}
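
The planSummary above (IXSCAN on t_apn, t_gcm, and _id) shows the operation touching all three indexes rather than just the _id point lookup, which is consistent with an index-intersection plan being chosen. A minimal sketch for catching long-running reads on this namespace, filtering on the same fields currentOp reports:

	// Report in-progress operations on d2_apn.tokens running longer than 5 seconds
	db.currentOp({ "ns" : "d2_apn.tokens", "secs_running" : { $gt : 5 } })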

dbtop:

NS             | Total       | Reads       | Writes | Queries | GetMores | Inserts | Updates | Removes
d2_apn.tokens  | 128 9305.4% | 128 9305.4% | 0 0%   | 0 0%    | 0 0%     | 0 0%    | 0 0%    | 0 0%

explain:

db.tokens.find({ _id: -2648398055870562304, $or: [ { t_apn: { $exists: true, $ne: [] } }, { t_gcm: { $exists: true, $ne: [] } } ] }).limit(1).explain()
{
	"cursor" : "BtreeCursor _id_",
	"isMultiKey" : false,
	"n" : 0,
	"nscannedObjects" : 1,
	"nscanned" : 1,
	"nscannedObjectsAllPlans" : 2,
	"nscannedAllPlans" : 17,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 1,
	"indexBounds" : {
		"_id" : [
			[
				-2648398055870562300,
				-2648398055870562300
			]
		]
	},
	"server" : "drive2-db1-1:27017",
	"filterSet" : false
}
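
Note that the indexBounds above show -2648398055870562300 rather than -2648398055870562304: the shell stores bare numeric literals as 64-bit doubles, which cannot represent integers beyond 2^53 exactly. Re-running the explain with the _id wrapped in NumberLong preserves the exact value (reusing q from the hint() sketch above):

	db.tokens.find(q).limit(1).explain()  // exact _id via NumberLong in q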

getIndexes:

[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"ns" : "d2_apn.tokens",
		"name" : "_id_"
	},
	{
		"v" : 1,
		"key" : {
			"t_apn" : 1
		},
		"ns" : "d2_apn.tokens",
		"name" : "t_apn_1"
	},
	{
		"v" : 1,
		"key" : {
			"t_gcm" : 1
		},
		"ns" : "d2_apn.tokens",
		"name" : "t_gcm_1"
	}
]
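
With t_apn_1 and t_gcm_1 present alongside _id_, each $or branch has its own candidate index, which is what makes intersection plans available to the 2.6 planner here. Verbose explain lists every candidate plan that was evaluated, not just the winner (again reusing q from above):

	db.tokens.find(q).limit(1).explain(true)  // explain(true) also reports "allPlans"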

stats:

{
       "ns" : "d2_apn.tokens",
       "count" : 94291,
       "size" : 11569160,
       "avgObjSize" : 122,
       "storageSize" : 22507520,
       "numExtents" : 7,
       "nindexes" : 3,
       "lastExtentSize" : 11325440,
       "paddingFactor" : 1.027000000000001,
       "systemFlags" : 1,
       "userFlags" : 0,
       "totalIndexSize" : 19499760,
       "indexSizes" : {
               "_id_" : 3875424,
               "t_apn_1" : 6197408,
               "t_gcm_1" : 9426928
       },
       "ok" : 1
}

I've attached some charts from the replica set: DB wait times, CPU utilization and load, and disk utilization.
Important: we updated MongoDB to version 2.6.4 at ~13:00 on August 18. The first rs.stepDown() was at ~16:00, and the strange spikes then came back at ~08:00 on August 20 and at ~10:00 on August 21, each requiring a manual rs.stepDown() to resolve.

If you need any additional info or charts, just ask.
Thanks.



 Comments   
Comment by Alexander Ivanes [ 27/Aug/14 ]

Thanks, Ramón.
If we're still suffering from this tomorrow, I'll let you know.

Comment by Ramon Fernandez Marina [ 27/Aug/14 ]

vergil, we're going to close this ticket as a duplicate of SERVER-14961. However, if you're still experiencing problems with these queries in 2.6.4 after disabling index intersection, please feel free to re-open this ticket.

Regards,
Ramón.

Comment by Alexander Ivanes [ 27/Aug/14 ]

I've upgraded all nodes to 2.6.4. Before I turned off index intersection there was another spike; after turning it off, nothing changed (same as always). Then I turned it off on all the remaining nodes and did a stepDown on the primary.

Comment by Ramon Fernandez Marina [ 27/Aug/14 ]

Thanks for uploading the logs, vergil. A colleague points out that this is likely a manifestation of SERVER-14961, in which case turning off index intersection would fix the problem:

> db.adminCommand({setParameter: 1, internalQueryPlannerEnableIndexIntersection: false});
{ "was" : true, "ok" : 1 }

Is it possible for you to upgrade to 2.6.4 and disable index intersection to see if the problem disappears?

Thanks,
Ramón.

Comment by Alexander Ivanes [ 27/Aug/14 ]

By the way, all nodes are running as guest systems on Hyper-V Server, but I don't think that's the problem.

Comment by Alexander Ivanes [ 27/Aug/14 ]

I'm sorry for the delay. I've just attached the log files from the replica set.
Yes, all three nodes are running the same hardware configuration.

Comment by Ramon Fernandez Marina [ 27/Aug/14 ]

vergil, can you please post logs from your three nodes since you upgraded to 2.6.4? Also, are all three of your nodes running the same hardware configuration?
