[SERVER-18148] query picking wrong index? Created: 21/Apr/15  Updated: 14/May/15  Resolved: 13/May/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Paul Nock Assignee: David Storch
Resolution: Duplicate Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-15225 CachedPlanStage should execute for tr... Closed
Related
Operating System: ALL
Participants:

 Description   

I'm seeing something that looks like this with 2.6.9:

2015-04-20T16:26:05.147+0000 [conn61829] query *****.followings query: { $query: { user_id: ObjectId('*********') }, $orderby: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoskip:0 nscanned:125454 nscannedObjects:125454 keyUpdates:0 numYields:0 locks(micros) r:181778 nreturned:0 reslen:20 181ms
db.followings.getIndexes()
[
{
"v" : 1,
"name" : "id",
"key" :
{ "_id" : 1 }
,
"ns" : "*****.followings"
},
{
"v" : 1,
"key" :
{ "user_id" : 1 }
,
"name" : "user_id_1",
"ns" : "***********.followings"
}
]

The query at the console on the primary always uses the 'user_id_1' index, but intermittently, the query will start running very slowly on slaves in the replica set. The behavior started after upgrading from 2.4 last week, and happens on some other simple collections too. If I drop and regenerate the index the problem resets itself, but I assume that's similar to just restarting mongod. What is normally a 1ms query jumps to 100ms or more.



 Comments   
Comment by David Storch [ 13/May/15 ]

After further discussion, we have decided that the fix for this issue is SERVER-15225. Resolving as a duplicate.

Comment by David Storch [ 11/May/15 ]

Hi Paul,

Thanks for the prompt response. Yes, this does appear to confirm my hypothesis. Your proposed workaround in Mongoid seems like an appropriate short-term solution. (I'm not sure the details of why Mongoid would be adding a sort or whether there is an opportunity for a fix in Mongoid itself. The appropriate forum for this conversation would be the Mongoid Github issue tracker.)

As far as whether there is a code fix to make in the server codebase, I will have to discuss with my team. I am moving this ticket into the "Needs Triage" state. Note that SERVER-15225, resolved for development version 3.1.2, represents a general fix for plan selection issues like this one. If we cannot find a more targeted fix, we may decide to resolve as a duplicate of SERVER-15225. Please continue to watch for updates.

Best,
Dave

Comment by Paul Nock [ 11/May/15 ]

Thanks - here some more info...

PRIMARY> db.teams.find({user_id: ObjectId('4f3457deff95ae61d700afd1'), context: 'aut'}).sort({_id: 1}).limit(1).explain();
{
	"cursor" : "BtreeCursor _id_",
	"isMultiKey" : false,
	"n" : 1,
	"nscannedObjects" : 2,
	"nscanned" : 2,
	"nscannedObjectsAllPlans" : 4,
	"nscannedAllPlans" : 4,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"_id" : [
			[
				{
					"$minElement" : 1
				},
				{
					"$maxElement" : 1
				}
			]
		]
	},
	"server" : "******:27017",
	"filterSet" : false
}

I think this confirms your hypothesis? The sort clause is actually generated by mongoid - I'm using this with my rails app. So the application code is actually something like:
> Team.where(context: 'aut', user_id: '53d0a9eaf1499538a2000068')

Looks similar to this thread for mongoid: https://github.com/mongoid/mongoid/issues/3768

In this case, I think can probably switch the query to:
> Team.desc(:user_id).where(context: 'aut', user_id: '53d0a9eaf1499538a2000068')

In this case, the

{user_id, context}

index is selected:

PRIMARY> db.teams.find({user_id: ObjectId('4f3457deff95ae61d700afd1'), context: 'aut'}).sort({user_id: 1}).limit(1).explain();
	"cursor" : "BtreeCursor user_id_1_context_1",
	"isMultiKey" : false,
	"n" : 1,
	"nscannedObjects" : 1,
	"nscanned" : 1,
	"nscannedObjectsAllPlans" : 1,
	"nscannedAllPlans" : 1,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"user_id" : [
			[
				ObjectId("4f3457deff95ae61d700afd1"),
				ObjectId("4f3457deff95ae61d700afd1")
			]
		],
		"context" : [
			[
				"aut",
				"aut"
			]
		]
	},
	"server" : "******:27017",
	"filterSet" : false

I'd be interested if you have other solutions though - it would be nice to let mongoid just do its thing without having to remember to add the explicit sort.

Paul

Comment by David Storch [ 11/May/15 ]

Hi Paul,

Apologies for the delay in my response, and thanks for providing the additional information.

I've looked through the output from getPlansByQuery, and I think there is enough information now to see what is going on. It looks like this query shape was executed with a limit of 1. The query plan using index { _id: 1 } ended up winning in this case, because, when scanning by ascending _id one of the very first documents happens to match the query. The trial execution period for plan selection ends because the plan has generated the single desired result; the plan using { _id: 1 } is selected as the winner and cached. In general, however, { _id: 1 } is not the best plan, and you end up with slow queries when the plan cache entry gets used on subsequent executions of the same query shape.

In order to confirm this analysis, could you please run the following command and paste its output:

db.teams.find({user_id: ObjectId('4f3457deff95ae61d700afd1'), context: 'aut'}).sort({_id: 1}).limit(1).explain();

I am going to put this ticket back into "Waiting for User" state pending this additional information. Once we confirm the root cause of this behavior my team can discuss potential solutions on our end.

Best,
Dave

Comment by Paul Nock [ 27/Apr/15 ]

Ok, happening again today on a different collection - db.teams. It started about 3 hours ago, which is when our user traffic starts to pick up for the day, but other than that nothing different happening to any other day or time.

Here's some of the requested output from a secondary where the query is happening:

The logs don't seem much different to logLevel 0 ( I executed db.runCommand(

{getParameter: 1, logLevel: 1}

) on the secondary ):

2015-04-27T16:08:35.237+0000 [conn92899] query ****.posts query: { $query: { _id: ObjectId('553db3601fcaa39585000dea') }, $orderby: { _id: 1 } } planSummary: IDHACK ntoskip:0 nscanned:1 nscannedObjects:1 idhack:1 keyUpdates:0 numYields:0 locks(micros) r:34 nreturned:1 reslen:407 0ms
2015-04-27T16:08:35.239+0000 [conn108638] command ****.$cmd command: count { count: "patients", query: { user_id: ObjectId('54fddc8b6bcd61b52a000716') } } planSummary: COUNT { user_id: 1 } keyUpdates:0 numYields:0 locks(micros) r:53 reslen:48 0ms
2015-04-27T16:08:35.240+0000 [conn108638] query ****.patients query: { $query: { user_id: ObjectId('54fddc8b6bcd61b52a000716') }, $orderby: { _id: 1 } } planSummary: IXSCAN { user_id: 1 }, IXSCAN { user_id: 1 } ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:124 nreturned:1 reslen:526 0ms
2015-04-27T16:08:35.430+0000 [conn108638] query ****.teams query: { $query: { user_id: ObjectId('54fddc8b6bcd61b52a000716'), context: "epl" }, $orderby: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoskip:0 nscanned:148667 nscannedObjects:148667 keyUpdates:0 numYields:0 locks(micros) r:183957 nreturned:1 reslen:564 183ms
2015-04-27T16:08:35.436+0000 [conn108638] command ****.$cmd command: count { count: "patients", query: { user_id: ObjectId('551d6bb56bcd617b52000b74') } } planSummary: COUNT { user_id: 1 } keyUpdates:0 numYields:0 locks(micros) r:90 reslen:48 0ms
2015-04-27T16:08:35.436+0000 [conn108638] query ****.patients query: { $query: { user_id: ObjectId('551d6bb56bcd617b52000b74') }, $orderby: { _id: 1 } } planSummary: IXSCAN { user_id: 1 }, IXSCAN { user_id: 1 } ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:155 nreturned:1 reslen:547 0ms

rs:SECONDARY> db.teams.find({user_id: ObjectId('54fddc8b6bcd61b52a000716'), context: 'epl'}).sort({_id: 1}).explain(true)
{
	"cursor" : "BtreeCursor user_id_1_context_1",
	"isMultiKey" : false,
	"n" : 1,
	"nscannedObjects" : 1,
	"nscanned" : 1,
	"nscannedObjectsAllPlans" : 10,
	"nscannedAllPlans" : 10,
	"scanAndOrder" : true,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"user_id" : [
			[
				ObjectId("54fddc8b6bcd61b52a000716"),
				ObjectId("54fddc8b6bcd61b52a000716")
			]
		],
		"context" : [
			[
				"epl",
				"epl"
			]
		]
	},
	"allPlans" : [
		{
			"cursor" : "BtreeCursor user_id_1_context_1",
			"isMultiKey" : false,
			"n" : 1,
			"nscannedObjects" : 1,
			"nscanned" : 1,
			"scanAndOrder" : true,
			"indexOnly" : false,
			"nChunkSkips" : 0,
			"indexBounds" : {
				"user_id" : [
					[
						ObjectId("54fddc8b6bcd61b52a000716"),
						ObjectId("54fddc8b6bcd61b52a000716")
					]
				],
				"context" : [
					[
						"epl",
						"epl"
					]
				]
			}
		},
		{
			"cursor" : "BtreeCursor _id_",
			"isMultiKey" : false,
			"n" : 0,
			"nscannedObjects" : 5,
			"nscanned" : 5,
			"scanAndOrder" : false,
			"indexOnly" : false,
			"nChunkSkips" : 0,
			"indexBounds" : {
				"_id" : [
					[
						{
							"$minElement" : 1
						},
						{
							"$maxElement" : 1
						}
					]
				]
			}
		},
		{
			"cursor" : "BtreeCursor context_1_provider_ids_1",
			"isMultiKey" : true,
			"n" : 0,
			"nscannedObjects" : 4,
			"nscanned" : 4,
			"scanAndOrder" : true,
			"indexOnly" : false,
			"nChunkSkips" : 0,
			"indexBounds" : {
				"context" : [
					[
						"epl",
						"epl"
					]
				],
				"provider_ids" : [
					[
						{
							"$minElement" : 1
						},
						{
							"$maxElement" : 1
						}
					]
				]
			}
		}
	],
	"server" : "****:27017",
	"filterSet" : false,
	"stats" : {
		"type" : "SORT",
		"works" : 6,
		"yields" : 0,
		"unyields" : 0,
		"invalidates" : 0,
		"advanced" : 1,
		"needTime" : 2,
		"needFetch" : 0,
		"isEOF" : 1,
		"forcedFetches" : 0,
		"memUsage" : 552,
		"memLimit" : 33554432,
		"children" : [
			{
				"type" : "KEEP_MUTATIONS",
				"works" : 2,
				"yields" : 0,
				"unyields" : 0,
				"invalidates" : 0,
				"advanced" : 1,
					"needTime" : 0,
				"needFetch" : 0,
				"isEOF" : 1,
				"children" : [
					{
						"type" : "FETCH",
						"works" : 2,
						"yields" : 0,
						"unyields" : 0,
						"invalidates" : 0,
						"advanced" : 1,
						"needTime" : 0,
						"needFetch" : 0,
						"isEOF" : 1,
						"alreadyHasObj" : 0,
						"forcedFetches" : 0,
						"matchTested" : 0,
						"children" : [
							{
								"type" : "IXSCAN",
								"works" : 2,
								"yields" : 0,
								"unyields" : 0,
								"invalidates" : 0,
								"advanced" : 1,
								"needTime" : 0,
								"needFetch" : 0,
								"isEOF" : 1,
								"keyPattern" : "{ user_id: 1.0, context: 1.0 }",
								"isMultiKey" : 0,
								"boundsVerbose" : "field #0['user_id']: [ObjectId('54fddc8b6bcd61b52a000716'), ObjectId('54fddc8b6bcd61b52a000716')], field #1['context']: [\"epl\", \"epl\"]",
								"yieldMovedCursor" : 0,
								"dupsTested" : 0,
								"dupsDropped" : 0,
								"seenInvalidated" : 0,
								"matchTested" : 0,
								"keysExamined" : 1,
								"children" : [ ]
							}
						]
					}
				]
			}
		]
	}
}

rs:SECONDARY> db.teams.getPlanCache().getPlansByQuery({user_id: ObjectId('54fddc8b6bcd61b52a000716'), context: 'epl'}, {}, {_id: 1})
[
	{
		"details" : {
			"solution" : "(whole index scan solution: dir=1; tree=Leaf { _id: 1 }, pos: 0\n)"
		},
		"reason" : {
			"score" : 1.5003000000000002,
			"stats" : {
				"type" : "LIMIT",
				"works" : 2,
				"yields" : 0,
				"unyields" : 0,
				"invalidates" : 0,
				"advanced" : 1,
				"needTime" : 1,
				"needFetch" : 0,
				"isEOF" : 1,
				"children" : [
					{
						"type" : "FETCH",
						"works" : 2,
						"yields" : 0,
						"unyields" : 0,
						"invalidates" : 0,
						"advanced" : 1,
						"needTime" : 1,
						"needFetch" : 0,
						"isEOF" : 0,
						"alreadyHasObj" : 0,
						"forcedFetches" : 0,
						"matchTested" : 1,
						"children" : [
							{
								"type" : "IXSCAN",
								"works" : 2,
								"yields" : 0,
								"unyields" : 0,
								"invalidates" : 0,
								"advanced" : 2,
								"needTime" : 0,
								"needFetch" : 0,
								"isEOF" : 0,
								"keyPattern" : "{ _id: 1 }",
								"isMultiKey" : 0,
								"boundsVerbose" : "field #0['_id']: [MinKey, MaxKey]",
								"yieldMovedCursor" : 0,
								"dupsTested" : 0,
								"dupsDropped" : 0,
								"seenInvalidated" : 0,
								"matchTested" : 0,
								"keysExamined" : 2,
								"children" : [ ]
							}
						]
					}
				]
			}
		},
		"feedback" : {
			"nfeedback" : 20,
			"averageScore" : 1.0003116784495691,
			"stdDevScore" : 0.000009274428678544542,
			"scores" : [
				{
					"score" : 1.0003061120958376
				},
				{
					"score" : 1.0003057959926507
				},
				{
					"score" : 1.0003099475762731
				},
				{
					"score" : 1.000311135857461
				},
				{
					"score" : 1.0003058139872907
				},
				{
					"score" : 1.0003078347187728
				},
				{
					"score" : 1.0003078347187728
				},
				{
					"score" : 1.0003061585076705
				},
				{
					"score" : 1.0003100615768503
				},
				{
					"score" : 1.000330536215952
				},
				{
					"score" : 1.0003063062040436
				},
				{
					"score" : 1.0003094269365285
				},
				{
					"score" : 1.0003157649136083
				},
				{
					"score" : 1.0003063062835809
				},
				{
					"score" : 1.000309425070688
				},
				{
					"score" : 1.0003104884469756
				},
				{
					"score" : 1.0003063062040436
				},
				{
					"score" : 1.000342890842805
				},
				{
					"score" : 1.0003094269365285
				},
				{
					"score" : 1.0003159959050483
				}
			]
		},
		"filterSet" : false
	},
	{
		"details" : {
			"solution" : "(index-tagged expression tree: tree=Node\n---Leaf { context: 1.0, provider_ids: 1.0 }, pos: 0\n---Leaf \n)"
		},
		"reason" : {
			"score" : 1.0002,
			"stats" : {
				"type" : "KEEP_MUTATIONS",
				"works" : 2,
				"yields" : 0,
				"unyields" : 0,
				"invalidates" : 0,
				"advanced" : 0,
				"needTime" : 2,
				"needFetch" : 0,
				"isEOF" : 0,
				"children" : [
					{
						"type" : "OR",
						"works" : 2,
						"yields" : 0,
						"unyields" : 0,
						"invalidates" : 0,
						"advanced" : 0,
						"needTime" : 2,
						"needFetch" : 0,
						"isEOF" : 0,
						"dupsTested" : 0,
						"dupsDropped" : 0,
						"locsForgotten" : 0,
						"matchTested_0" : 0,
						"matchTested_1" : 0,
						"children" : [
							{
								"type" : "SORT",
								"works" : 2,
								"yields" : 0,
								"unyields" : 0,
								"invalidates" : 0,
								"advanced" : 0,
								"needTime" : 1,
								"needFetch" : 0,
								"isEOF" : 0,
								"forcedFetches" : 0,
								"memUsage" : 0,
								"memLimit" : 33554432,
								"children" : [
									{
										"type" : "FETCH",
										"works" : 1,
										"yields" : 0,
										"unyields" : 0,
										"invalidates" : 0,
										"advanced" : 0,
										"needTime" : 1,
										"needFetch" : 0,
										"isEOF" : 0,
										"alreadyHasObj" : 0,
										"forcedFetches" : 0,
										"matchTested" : 0,
										"children" : [
											{
												"type" : "IXSCAN",
												"works" : 1,
												"yields" : 0,
												"unyields" : 0,
												"invalidates" : 0,
												"advanced" : 1,
												"needTime" : 0,
												"needFetch" : 0,
												"isEOF" : 0,
												"keyPattern" : "{ context: 1.0, provider_ids: 1.0 }",
												"isMultiKey" : 1,
												"boundsVerbose" : "field #0['context']: [\"aut\", \"aut\"], field #1['provider_ids']: [MinKey, MaxKey]",
												"yieldMovedCursor" : 0,
												"dupsTested" : 1,
												"dupsDropped" : 0,
												"seenInvalidated" : 0,
												"matchTested" : 0,
												"keysExamined" : 1,
												"children" : [ ]
											}
										]
									}
								]
							},
							{
								"type" : "SORT",
								"works" : 0,
								"yields" : 0,
								"unyields" : 0,
								"invalidates" : 0,
								"advanced" : 0,
								"needTime" : 0,
								"needFetch" : 0,
								"isEOF" : 0,
								"forcedFetches" : 0,
								"memUsage" : 0,
								"memLimit" : 33554432,
								"children" : [
									{
										"type" : "FETCH",
										"works" : 0,
										"yields" : 0,
										"unyields" : 0,
										"invalidates" : 0,
										"advanced" : 0,
										"needTime" : 0,
										"needFetch" : 0,
										"isEOF" : 0,
										"alreadyHasObj" : 0,
										"forcedFetches" : 0,
										"matchTested" : 0,
										"children" : [
											{
												"type" : "IXSCAN",
												"works" : 0,
												"yields" : 0,
												"unyields" : 0,
												"invalidates" : 0,
												"advanced" : 0,
												"needTime" : 0,
												"needFetch" : 0,
												"isEOF" : 0,
												"keyPattern" : "{ context: 1.0, provider_ids: 1.0 }",
												"isMultiKey" : 1,
												"boundsVerbose" : "field #0['context']: [\"aut\", \"aut\"], field #1['provider_ids']: [MinKey, MaxKey]",
												"yieldMovedCursor" : 0,
												"dupsTested" : 0,
												"dupsDropped" : 0,
												"seenInvalidated" : 0,
												"matchTested" : 0,
												"keysExamined" : 0,
												"children" : [ ]
											}
										]
									}
								]
							}
						]
					}
				]
			}
		},
		"feedback" : {
			
		},
		"filterSet" : false
	},
	{
		"details" : {
			"solution" : "(index-tagged expression tree: tree=Node\n---Leaf { user_id: 1.0, context: 1.0 }, pos: 1\n---Leaf { user_id: 1.0, context: 1.0 }, pos: 0\n)"
		},
		"reason" : {
			"score" : 1.0002,
			"stats" : {
				"type" : "KEEP_MUTATIONS",
				"works" : 2,
				"yields" : 0,
				"unyields" : 0,
				"invalidates" : 0,
				"advanced" : 0,
				"needTime" : 2,
				"needFetch" : 0,
				"isEOF" : 0,
				"children" : [
					{
						"type" : "OR",
						"works" : 2,
						"yields" : 0,
						"unyields" : 0,
						"invalidates" : 0,
						"advanced" : 0,
						"needTime" : 2,
						"needFetch" : 0,
						"isEOF" : 0,
						"dupsTested" : 0,
						"dupsDropped" : 0,
						"locsForgotten" : 0,
						"matchTested_0" : 0,
						"matchTested_1" : 0,
						"children" : [
							{
								"type" : "SORT",
								"works" : 2,
								"yields" : 0,
								"unyields" : 0,
								"invalidates" : 0,
								"advanced" : 0,
								"needTime" : 1,
								"needFetch" : 0,
								"isEOF" : 0,
								"forcedFetches" : 0,
								"memUsage" : 4007,
								"memLimit" : 33554432,
								"children" : [
									{
										"type" : "FETCH",
										"works" : 1,
										"yields" : 0,
										"unyields" : 0,
										"invalidates" : 0,
										"advanced" : 1,
										"needTime" : 0,
										"needFetch" : 0,
										"isEOF" : 0,
										"alreadyHasObj" : 0,
										"forcedFetches" : 0,
										"matchTested" : 0,
										"children" : [
											{
												"type" : "IXSCAN",
												"works" : 1,
												"yields" : 0,
												"unyields" : 0,
												"invalidates" : 0,
												"advanced" : 1,
												"needTime" : 0,
												"needFetch" : 0,
												"isEOF" : 0,
												"keyPattern" : "{ user_id: 1.0, context: 1.0 }",
												"isMultiKey" : 0,
												"boundsVerbose" : "field #0['user_id']: [ObjectId('4f3457deff95ae61d700afd1'), ObjectId('4f3457deff95ae61d700afd1')], field #1['context']: [\"aut\", \"aut\"]",
												"yieldMovedCursor" : 0,
												"dupsTested" : 0,
												"dupsDropped" : 0,
												"seenInvalidated" : 0,
												"matchTested" : 0,
												"keysExamined" : 1,
												"children" : [ ]
											}
										]
									}
								]
							},
							{
								"type" : "SORT",
								"works" : 0,
								"yields" : 0,
								"unyields" : 0,
								"invalidates" : 0,
								"advanced" : 0,
								"needTime" : 0,
								"needFetch" : 0,
								"isEOF" : 0,
								"forcedFetches" : 0,
								"memUsage" : 0,
								"memLimit" : 33554432,
								"children" : [
									{
										"type" : "FETCH",
										"works" : 0,
										"yields" : 0,
										"unyields" : 0,
										"invalidates" : 0,
										"advanced" : 0,
										"needTime" : 0,
										"needFetch" : 0,
										"isEOF" : 0,
										"alreadyHasObj" : 0,
										"forcedFetches" : 0,
										"matchTested" : 0,
										"children" : [
											{
												"type" : "IXSCAN",
												"works" : 0,
												"yields" : 0,
												"unyields" : 0,
												"invalidates" : 0,
												"advanced" : 0,
												"needTime" : 0,
												"needFetch" : 0,
												"isEOF" : 0,
												"keyPattern" : "{ user_id: 1.0, context: 1.0 }",
												"isMultiKey" : 0,
												"boundsVerbose" : "field #0['user_id']: [ObjectId('4f3457deff95ae61d700afd1'), ObjectId('4f3457deff95ae61d700afd1')], field #1['context']: [\"aut\", \"aut\"]",
												"yieldMovedCursor" : 0,
												"dupsTested" : 0,
												"dupsDropped" : 0,
												"seenInvalidated" : 0,
												"matchTested" : 0,
												"keysExamined" : 0,
												"children" : [ ]
											}
										]
									}
								]
							}
						]
					}
				]
			}
		},
		"feedback" : {
			
		},
		"filterSet" : false
	}
]

Thanks for any advice!
Paul

Comment by David Storch [ 22/Apr/15 ]

Thanks Paul. Unfortunately, these logs don't have enough information to give any more hints about what's going on. I was hoping that we might capture logging of the query that is causing a bad plan to enter the plan cache. I'm going to put this ticket back into "Waiting for User" state pending more detailed logs / query plan cache output.

Comment by Paul Nock [ 22/Apr/15 ]

It was happening a few hours ago, for about 2.5 hours, but stopped before I could get the plan cache. For now, here's some logs from this, though it isn't very verbose. The problematic query here is at 2015-04-22T13:09:36.590+0000, the query on the teams collection...

2015-04-22T13:09:34.763+0000 [initandlisten] connection accepted from *.*.*.66:53059 #28167 (21 connections now open)
2015-04-22T13:09:34.966+0000 [conn28166] end connection *.*.*.66:53049 (20 connections now open)
2015-04-22T13:09:35.026+0000 [conn28157] query *.posts query: { $query: { created_by_id: ObjectId('53fd140eff95aee1c300002f') }, $orderby: { updated_at: -1 } } planSummary: IXSCAN { created_by_id: 1.0, created_at: -1.0 }, IXSCAN { created_by_id: 1.0, created_at: -1.0 } ntoreturn:1 ntoskip:0 nscanned:21617 nscannedObjects:21617 keyUpdates:0 numYields:0 locks(micros) r:119144 nreturned:1 reslen:516 119ms
2015-04-22T13:09:35.094+0000 [conn28147] query *.posts query: { $query: { created_by_id: ObjectId('53c43233ff95ae9779000007') }, $orderby: { updated_at: -1 } } planSummary: IXSCAN { created_by_id: 1.0, created_at: -1.0 }, IXSCAN { created_by_id: 1.0, created_at: -1.0 } ntoreturn:1 ntoskip:0 nscanned:56698 nscannedObjects:56698 keyUpdates:0 numYields:5 locks(micros) r:485668 nreturned:1 reslen:508 248ms
2015-04-22T13:09:35.096+0000 [initandlisten] connection accepted from *.*.*.66:53068 #28168 (21 connections now open)
2015-04-22T13:09:35.126+0000 [conn28167] end connection *.*.*.66:53059 (20 connections now open)
2015-04-22T13:09:35.145+0000 [conn28164] end connection *.*.*.66:53035 (19 connections now open)
2015-04-22T13:09:35.241+0000 [initandlisten] connection accepted from *.*.*.66:53073 #28169 (20 connections now open)
2015-04-22T13:09:35.278+0000 [initandlisten] connection accepted from *.*.*.66:53077 #28170 (21 connections now open)
2015-04-22T13:09:35.335+0000 [conn28147] query *.posts query: { $query: { created_by_id: ObjectId('53c4329aff95ae977900000f') }, $orderby: { updated_at: -1 } } planSummary: IXSCAN { created_by_id: 1.0, created_at: -1.0 }, IXSCAN { created_by_id: 1.0, created_at: -1.0 } ntoreturn:1 ntoskip:0 nscanned:24165 nscannedObjects:24165 keyUpdates:0 numYields:2 locks(micros) r:138107 nreturned:1 reslen:514 113ms
2015-04-22T13:09:35.512+0000 [conn28169] end connection *.*.*.66:53073 (20 connections now open)
2015-04-22T13:09:35.626+0000 [initandlisten] connection accepted from *.*.*.66:53086 #28171 (21 connections now open)
2015-04-22T13:09:35.807+0000 [conn28168] end connection *.*.*.66:53068 (20 connections now open)
2015-04-22T13:09:35.929+0000 [initandlisten] connection accepted from *.*.*.66:53096 #28172 (21 connections now open)
2015-04-22T13:09:36.263+0000 [conn28171] end connection *.*.*.66:53086 (20 connections now open)
2015-04-22T13:09:36.357+0000 [initandlisten] connection accepted from *.*.*.66:53108 #28173 (21 connections now open)
2015-04-22T13:09:36.370+0000 [conn28172] end connection *.*.*.66:53096 (20 connections now open)
2015-04-22T13:09:36.469+0000 [initandlisten] connection accepted from *.*.*.66:53112 #28174 (21 connections now open)
2015-04-22T13:09:36.590+0000 [conn28174] query *.teams query: { $query: { user_id: ObjectId('502a46f76bcd612fde0012c1'), context: "aut" }, $orderby: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoskip:0 nscanned:27552 nscannedObjects:27552 keyUpdates:0 numYields:1 locks(micros) r:182798 nreturned:1 reslen:189 102ms

So far, it does seem to keep coming back so I think I'll probably have chance to get more detailed info while it's happening.

thanks, Paul

Comment by David Storch [ 22/Apr/15 ]

Hi pnock,

In the absence of a consistent way to reproduce this problem, the following are things that may help lead to a diagnosis:

  • Pasting the output of running the problematic query with .explain(true). I expect this to show normal plan selection and query execution, but it will be good to have this information recorded.
  • A snippet of the logs leading up and including the slow query. If it is possible to provide the logs at logLevel 1, that would be even better, although I understand this may not be possible if you are running a production system at logLevel 0.
  • Capturing the plan cache output while the bad query plan is in use. In the shell, you can get plan cache output for the problematic query shape using db.followings.getPlanCache().getPlansByQuery(<query>, <projection>, <sort>). Assuming that you have no projection, this would be db.followings.getPlanCache().getPlansByQuery({ user_id: ObjectId('*********') }, {}, { _id: 1 }).

Of course, a repro script would be far and away the best. Let me know if you have any luck figuring out how to reproduce.

Best,
Dave

Comment by David Storch [ 21/Apr/15 ]

Per discussion on SERVER-14071, this is a possible dup of SERVER-17815, a fix for which will be available in the upcoming 2.6.10 release. Keeping this ticket open in order to see if we can confirm or disconfirm this as a duplicate before 2.6.10 comes out.

Generated at Thu Feb 08 03:46:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.