[SERVER-16895] Users should be able to request that explain not bypass the plan cache Created: 16/Jan/15  Updated: 30/Jan/24

Status: Open
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: J Rassi Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 4
Labels: qexec-team, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-37423 query does not use same execution pla... Closed
Related
related to SERVER-77859 Add parameter is explain with 'execut... Closed
is related to SERVER-40361 Reduce memory footprint of plan cache... Closed
is related to SERVER-82238 Add field to explain to indicate if w... Closed
Assigned Teams:
Query Execution
Sprint: Query 2020-11-30, Query 2020-12-14, Query 2020-12-28, Query 2021-01-11, Query 2021-01-25, Query Execution 2021-02-22, Query Execution 2021-03-08, Query Execution 2021-03-22, QE 2023-08-21, QE 2023-09-04, QE 2023-09-18, QE 2023-10-02, QE 2023-10-16, QE 2023-10-30
Participants:
Case:

 Description   

Users should be able to request that explain not bypass the plan cache. This would enable users to more easily retrieve stats information about queries run against with the shape's current cached plan, to aid in introspection when diagnosing query performance issues.

Users can get similar functionality today by using getPlansByQuery() to look up the currently cached plan and then requesting an explain of a hinted query against that plan. However, not all plans can be hinted (for example, multi-index plans cannot be hinted). Also, making this a 1-step process would improve usability.



 Comments   
Comment by Xiaochen Wu [ 08/Jun/23 ]

If I'm correct, we need to decide how users will be issuing the request (not bypass plan cache) and also the output format when the explain results are from plan cache. I'd recommend us to move this to quick win candidates first and get into the voting process. Once we decide to pick it up in the next quarter, the assignee and the product can work together on a mini design doc for the syntax changes. Make sense? FYI kateryna.kamenieva@mongodb.com 

Comment by Chris Harris [ 12/Feb/21 ]

Hi david.storch,

Thanks for the update and extra detail regarding the impact that SERVER-40361 could have related to this request. Placing this ticket into the backlog seems reasonable at this time. I only have one comment related to your note that is worth mentioning.

Somewhat by coincidence, we do plan on attempting to leverage the plan cache information more frequently going forward than we have historically. The introduction of the $planCacheStats aggregation operator combined with the queryHash field make retrieving the data much easier than it was previously.  As correctly forming the explain command is also sometimes difficult for larger expressions, it seems that gathering the information via an aggregation pipeline will provide a superior experience.  The data from the plan cache also helps answer the "what happened" question (as opposed to "what is/can happen" that explain provides more broadly).

Given the above, I can't immediately think of a situation where we would need explain output directly using the cached plan where the associated plan cache information itself would not suffice.  Perhaps the one edge case would be the combination where debug information has been stripped and we are not able to hint the cached plan.  It is probably "to be determined" how commonly this happens in practice.  Perhaps that's what we can track in this ticket to help make a prioritization decision in the future.

Comment by David Storch [ 10/Feb/21 ]

christopher.harris and alex.bevilacqua, after discussing this more with kateryna.kamenieva we have decided to return this ticket to the backlog. Please let us know if you find a need for this improvement on versions that have the fix for related ticket SERVER-40361.

To elaborate, SERVER-40361 introduces logic to strip debug information out of the plan cache in order to prevent the plan cache from consuming an excessive amount of memory. When the debug info has been stripped, the output from $planCacheStats (as well as planCacheListQueryShapes and planCacheListPlans in older versions) will be less useful. I have an example here that shows what the output of planCacheListQueryShapes and planCacheListPlans will look like in the 4.2 branch when debug info is stripped. Below I've provided an additional example, this time showing $planCacheStats output on the master branch both with and without debug info:

// Plan cache entry from $planCacheStats without debug info stripped.
{
	"queryHash" : "43CAB4C5",
	"planCacheKey" : "CEC1F6AF",
	"isActive" : false,
	"works" : NumberLong(1),
	"timeOfCreation" : ISODate("2021-02-10T19:51:32.960Z"),
	"createdFromQuery" : {
		"query" : {
			"a" : 1,
			"b" : 1
		},
		"sort" : {
 
		},
		"projection" : {
 
		}
	},
	"cachedPlan" : {
		"stage" : "FETCH",
		"filter" : {
			"b" : {
				"$eq" : 1
			}
		},
		"inputStage" : {
			"stage" : "IXSCAN",
			"keyPattern" : {
				"a" : 1
			},
			"indexName" : "a_1",
			"isMultiKey" : false,
			"multiKeyPaths" : {
				"a" : [ ]
			},
			"isUnique" : false,
			"isSparse" : false,
			"isPartial" : false,
			"indexVersion" : 2,
			"direction" : "forward",
			"indexBounds" : {
				"a" : [
					"[1.0, 1.0]"
				]
			}
		}
	},
	"creationExecStats" : [
		{
			"nReturned" : 0,
			"executionTimeMillisEstimate" : 0,
			"totalKeysExamined" : 0,
			"totalDocsExamined" : 0,
			"executionStages" : {
				"stage" : "FETCH",
				"filter" : {
					"b" : {
						"$eq" : 1
					}
				},
				"nReturned" : 0,
				"executionTimeMillisEstimate" : 0,
				"works" : 1,
				"advanced" : 0,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"docsExamined" : 0,
				"alreadyHasObj" : 0,
				"inputStage" : {
					"stage" : "IXSCAN",
					"nReturned" : 0,
					"executionTimeMillisEstimate" : 0,
					"works" : 1,
					"advanced" : 0,
					"needTime" : 0,
					"needYield" : 0,
					"saveState" : 0,
					"restoreState" : 0,
					"isEOF" : 1,
					"keyPattern" : {
						"a" : 1
					},
					"indexName" : "a_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"a" : [ ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"a" : [
							"[1.0, 1.0]"
						]
					},
					"keysExamined" : 0,
					"seeks" : 1,
					"dupsTested" : 0,
					"dupsDropped" : 0
				}
			}
		},
		{
			"nReturned" : 0,
			"executionTimeMillisEstimate" : 0,
			"totalKeysExamined" : 0,
			"totalDocsExamined" : 0,
			"executionStages" : {
				"stage" : "FETCH",
				"filter" : {
					"a" : {
						"$eq" : 1
					}
				},
				"nReturned" : 0,
				"executionTimeMillisEstimate" : 0,
				"works" : 1,
				"advanced" : 0,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"docsExamined" : 0,
				"alreadyHasObj" : 0,
				"inputStage" : {
					"stage" : "IXSCAN",
					"nReturned" : 0,
					"executionTimeMillisEstimate" : 0,
					"works" : 1,
					"advanced" : 0,
					"needTime" : 0,
					"needYield" : 0,
					"saveState" : 0,
					"restoreState" : 0,
					"isEOF" : 1,
					"keyPattern" : {
						"b" : 1
					},
					"indexName" : "b_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"b" : [ ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"b" : [
							"[1.0, 1.0]"
						]
					},
					"keysExamined" : 0,
					"seeks" : 1,
					"dupsTested" : 0,
					"dupsDropped" : 0
				}
			}
		},
		{
			"nReturned" : 0,
			"executionTimeMillisEstimate" : 0,
			"totalKeysExamined" : 0,
			"totalDocsExamined" : 0,
			"executionStages" : {
				"stage" : "FETCH",
				"filter" : {
					"$and" : [
						{
							"a" : {
								"$eq" : 1
							}
						},
						{
							"b" : {
								"$eq" : 1
							}
						}
					]
				},
				"nReturned" : 0,
				"executionTimeMillisEstimate" : 0,
				"works" : 1,
				"advanced" : 0,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"docsExamined" : 0,
				"alreadyHasObj" : 0,
				"inputStage" : {
					"stage" : "AND_SORTED",
					"nReturned" : 0,
					"executionTimeMillisEstimate" : 0,
					"works" : 1,
					"advanced" : 0,
					"needTime" : 0,
					"needYield" : 0,
					"saveState" : 0,
					"restoreState" : 0,
					"isEOF" : 1,
					"failedAnd_0" : 0,
					"failedAnd_1" : 0,
					"inputStages" : [
						{
							"stage" : "IXSCAN",
							"nReturned" : 0,
							"executionTimeMillisEstimate" : 0,
							"works" : 1,
							"advanced" : 0,
							"needTime" : 0,
							"needYield" : 0,
							"saveState" : 0,
							"restoreState" : 0,
							"isEOF" : 1,
							"keyPattern" : {
								"a" : 1
							},
							"indexName" : "a_1",
							"isMultiKey" : false,
							"multiKeyPaths" : {
								"a" : [ ]
							},
							"isUnique" : false,
							"isSparse" : false,
							"isPartial" : false,
							"indexVersion" : 2,
							"direction" : "forward",
							"indexBounds" : {
								"a" : [
									"[1.0, 1.0]"
								]
							},
							"keysExamined" : 0,
							"seeks" : 1,
							"dupsTested" : 0,
							"dupsDropped" : 0
						},
						{
							"stage" : "IXSCAN",
							"nReturned" : 0,
							"executionTimeMillisEstimate" : 0,
							"works" : 0,
							"advanced" : 0,
							"needTime" : 0,
							"needYield" : 0,
							"saveState" : 0,
							"restoreState" : 0,
							"isEOF" : 0,
							"keyPattern" : {
								"b" : 1
							},
							"indexName" : "b_1",
							"isMultiKey" : false,
							"multiKeyPaths" : {
								"b" : [ ]
							},
							"isUnique" : false,
							"isSparse" : false,
							"isPartial" : false,
							"indexVersion" : 2,
							"direction" : "forward",
							"indexBounds" : {
								"b" : [
									"[1.0, 1.0]"
								]
							},
							"keysExamined" : 0,
							"seeks" : 0,
							"dupsTested" : 0,
							"dupsDropped" : 0
						}
					]
				}
			}
		}
	],
	"candidatePlanScores" : [
		2.0002,
		2.0002,
		2.0000999999999998
	],
	"indexFilterSet" : false,
	"estimatedSizeBytes" : NumberLong(4797),
	"host" : "storchbox:27017"
}
 
// Plan cache entry from $planCacheStats with debug info stripped.
{
	"queryHash" : "43CAB4C5",
	"planCacheKey" : "CEC1F6AF",
	"isActive" : false,
	"works" : NumberLong(1),
	"timeOfCreation" : ISODate("2021-02-10T19:53:15.496Z"),
	"indexFilterSet" : false,
	"estimatedSizeBytes" : NumberLong(1244),
	"host" : "storchbox:27017"
}

As you can see, the amount of information available from the plan cache in the case of stripped debug info is dramatically reduced. I could imagine that this could be a problem for support scenarios, e.g. if a customer is experiencing a performance problem due to a bad plan cache entry. The suggestion in this ticket relates in that it would allow the user to discover details about the cached plan using explain. By explaining a poorly performing query, and instructing explain to make use of the plan cache, the user could see details about the cached plan even when the debug info has been stripped. I'd be to discuss this more if you have any thoughts, but for now this ticket is headed back to the backlog.

Comment by David Storch [ 13/Oct/20 ]

Flagging for scheduling. This came up in the context of SERVER-40361, which will strip debug info out of the plan cache in some contexts. Having this feature implemented could aid diagnosis in the event that plan cache debug info is absent.

Generated at Thu Feb 08 03:42:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.