[SERVER-61497] Improve the readability of explain plans in 5.X and above Created: 15/Nov/21  Updated: 24/Oct/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: 5.1.0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Harshad Dhavale Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 2
Labels: explain-plan, query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-82427 Improve the mongo shell so that objec... Open
Assigned Teams:
Query Execution
Participants:

 Description   

MongoDB version 5.X introduced the slot-based execution (SBE) query engine. Running an explain() plan on a query that is using SBE provides an output like this:

PRIMARY> db.test1.explain().find({a:1})
{
	"explainVersion" : "2",
	"queryPlanner" : {
		"namespace" : "test.test1",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"a" : {
				"$eq" : 1
			}
		},
		"queryHash" : "4B53BE76",
		"planCacheKey" : "2DEF680C",
		"maxIndexedOrSolutionsReached" : false,
		"maxIndexedAndSolutionsReached" : false,
		"maxScansToExplodeReached" : false,
		"winningPlan" : {
			"queryPlan" : {
				"stage" : "FETCH",
				"planNodeId" : 2,
				"inputStage" : {
					"stage" : "IXSCAN",
					"planNodeId" : 1,
					"keyPattern" : {
						"a" : 1
					},
					"indexName" : "a_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"a" : [ ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"a" : [
							"[1.0, 1.0]"
						]
					}
				}
			},
			"slotBasedPlan" : {
				"slots" : "$$RESULT=s13 $$RID=s14 env: { s2 = Nothing (SEARCH_META), s1 = TimeZoneDatabase(Canada/Yukon...America/Rainy_River) (timeZoneDB), s4 = 1637002207166 (NOW), s3 = Timestamp(1637002203, 1) (CLUSTER_TIME) }",
				"stages" : "[2] nlj [] [s9, s5, s6, s7, s8] \n    left \n        [1] nlj [s6, s8] [s10, s11] \n            left \n                [1] project [s6 = \"a_1\", s8 = {\"a\" : 1}, s10 = KS(2B020104), s11 = KS(2B02FE04)] \n                [1] limit 1 \n                [1] coscan \n            right \n                [1] project [s5 = s12] \n                [1] ixseek s10 s11 s7 s9 s12 [] @\"2c41ddb0-4c30-488c-a2dd-11240fa088e8\" @\"a_1\" true \n            \n        \n    right \n        [2] limit 1 \n        [2] seek s9 s13 s14 s5 s6 s7 s8 [] @\"2c41ddb0-4c30-488c-a2dd-11240fa088e8\" true false \n    \n"
			}
		},
		"rejectedPlans" : [ ]
	},
	"command" : {
		"find" : "test1",
		"filter" : {
			"a" : 1
		},
		"$db" : "test"
	},
	"serverInfo" : {
		"host" : "atlas-9ev3lx-shard-00-02.oz4jm.mongodb.net",
		"port" : 27017,
		"version" : "5.1.0",
		"gitVersion" : "2454bbc8c20c8d6d6d9590a66e939d568b9e7222"
	},
	"serverParameters" : {
		"internalQueryFacetBufferSizeBytes" : 104857600,
		"internalQueryFacetMaxOutputDocSizeBytes" : 104857600,
		"internalLookupStageIntermediateDocumentMaxSizeBytes" : 104857600,
		"internalDocumentSourceGroupMaxMemoryBytes" : 104857600,
		"internalQueryMaxBlockingSortMemoryUsageBytes" : 104857600,
		"internalQueryProhibitBlockingMergeOnMongoS" : 0,
		"internalQueryMaxAddToSetBytes" : 104857600,
		"internalDocumentSourceSetWindowFieldsMaxMemoryBytes" : 104857600
	},
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1637002203, 1),
		"signature" : {
			"hash" : BinData(0,"qMUDrkJ1eQV3Y+0cDewpDaiM+s0="),
			"keyId" : NumberLong("7029021327763701764")
		}
	},
	"operationTime" : Timestamp(1637002203, 1)
}

While the "winningPlan" section as well as other sections above are useful, the "slotBasedPlan" section cannot be comprehended easily:

"slotBasedPlan" : {
				"slots" : "$$RESULT=s13 $$RID=s14 env: { s2 = Nothing (SEARCH_META), s1 = TimeZoneDatabase(Canada/Yukon...America/Rainy_River) (timeZoneDB), s4 = 1637002207166 (NOW), s3 = Timestamp(1637002203, 1) (CLUSTER_TIME) }",
				"stages" : "[2] nlj [] [s9, s5, s6, s7, s8] \n    left \n        [1] nlj [s6, s8] [s10, s11] \n            left \n                [1] project [s6 = \"a_1\", s8 = {\"a\" : 1}, s10 = KS(2B020104), s11 = KS(2B02FE04)] \n                [1] limit 1 \n                [1] coscan \n            right \n                [1] project [s5 = s12] \n                [1] ixseek s10 s11 s7 s9 s12 [] @\"2c41ddb0-4c30-488c-a2dd-11240fa088e8\" @\"a_1\" true \n            \n        \n    right \n        [2] limit 1 \n        [2] seek s9 s13 s14 s5 s6 s7 s8 [] @\"2c41ddb0-4c30-488c-a2dd-11240fa088e8\" true false \n    \n"
			}

Hence, this request is for improving the readability of the explain plans in 5.X and above that use the slot-based execution query engine.



 Comments   
Comment by Irina Yatsenko (Inactive) [ 16/Feb/22 ]

Would it be possible/useful to replace the leading node id ([N]) with the name of the node from the query plan?

Comment by Harshad Dhavale [ 01/Dec/21 ]

Hi david.storch - thanks for the detailed explanation of the current structure for the slotBasedPlan section, how to better read-and-interpret it, and the reasoning for keeping that section in the explain() output! That makes sense, and answers most of the concerns. I understand why there's not much that can be done to change the format of the slotBasedPlan section. I like the idea of providing field names alongside each of the parameters to a stage, while still keeping it in the string format, for enhancing the understanding/readability of that section.

Comment by David Storch [ 01/Dec/21 ]

harshad.dhavale thanks for the additional info! I think it is valuable to display the SBE plan, so I doubt that we would just remove the slotBasedPlan section – for instance, this could help us determine if there is a bug in the code which constructs the slotBasedPlan based on the tree displayed in the queryPlan section.

The SBE plan is arguably more difficult to interpret because its constituent stages and expressions are lower level than their counterparts from the classic query engine, and thus bear less resemblance to the input MQL query. This is a fundamental property of the SBE architecture, and there is not really much we can do to change it. That said, there are probably some tweaks we could make to aid readability. I think one thing that would help a lot is providing field names alongside each of the parameters to a stage. For instance, "seek s9 s13 s14 s5 s6 s7 s8 [] @\"2c41ddb0-4c30-488c-a2dd-11240fa088e8\" true false" could instead be written more like so:

seek recordSlot=s9 recordIdSlot=s13 snapshotIdSlot=s14 indexIdSlot=s5 indexKeySlot=s6 indexKeyPatternSlot=s7 oplogTsSlot=s8 collectionUuid=2c41ddb0-4c30-488c-a2dd-11240fa088e8 forward=true needOplogSlotForTs=false

That at least gives some hint of what the seek stage's parameters are. Another idea is to abandon the string format and always report the SBE plan as JSON; this would result in a bunch more braces, but might aid readability compared the string format.

The other important thing to note that may be useful is the leading "[1]" or "[2]" that comes before each stage in the SBE plan. This is displaying the so-called "planNodeId", and it can be used to correlate the SBE plan with the corresponding tree in the queryPlan section. In this example, the IXSCAN stage has planNodeId=1 and the FETCH stage has planNodeId=2. The SBE subtree which is tagged with planNodeId=1 is implementing the IXSCAN stage:

        [1] nlj [s6, s8] [s10, s11] 
            left 
                [1] project [s6 = \"a_1\", s8 = {\"a\" : 1}, s10 = KS(2B020104), s11 = KS(2B02FE04)] 
                [1] limit 1 
                [1] coscan 
            right 
                [1] project [s5 = s12] 
                [1] ixseek s10 s11 s7 s9 s12 [] @\"2c41ddb0-4c30-488c-a2dd-11240fa088e8\" @\"a_1\" true 

In short, this is an index scan of the index named "a_1" whose bounds are from KeyString value KS(2B020104) to KeyString value KS(2B02FE04). It would definitely be helpful if we provided decoded versions of these KeyString values so that you can understand what the index bounds are, but you can see from the corresponding IXSCAN stage that the bounds are in fact "[1.0, 1.0]".

Similarly, the part of the tree tagged with "[2]" corresponds to the FETCH operation:

"[2] nlj [] [s9, s5, s6, s7, s8] 
    left 
       ...        
    right 
        [2] limit 1 
        [2] seek s9 s13 s14 s5 s6 s7 s8 [] @\"2c41ddb0-4c30-488c-a2dd-11240fa088e8\" true false"

This is essentially just feeding the record id values produced by the underlying index scan to the seek stage, which is SBE's equivalent of FETCH.

Comment by Harshad Dhavale [ 30/Nov/21 ]

Hi david.storch - thanks for looking into this!

While pretty-printing the contents of the "slotBasedPlan" would be nice-to-have, my request was actually more to do with fundamentally changing how the "slotBasedPlan" is represented in the explain() output. Specifically, my concern was that there is no way to decipher the contents of the "slotBasedPlan" section. For instance, when we look at these stages (pretty-printed):

"stages" : 
"[2] nlj [] [s9, s5, s6, s7, s8] 
    left 
        [1] nlj [s6, s8] [s10, s11] 
            left 
                [1] project [s6 = \"a_1\", s8 = {\"a\" : 1}, s10 = KS(2B020104), s11 = KS(2B02FE04)] 
                [1] limit 1 
                [1] coscan 
            right 
                [1] project [s5 = s12] 
                [1] ixseek s10 s11 s7 s9 s12 [] @\"2c41ddb0-4c30-488c-a2dd-11240fa088e8\" @\"a_1\" true             
    right 
        [2] limit 1 
        [2] seek s9 s13 s14 s5 s6 s7 s8 [] @\"2c41ddb0-4c30-488c-a2dd-11240fa088e8\" true false"

I don't know how this info should be interpreted. AFAIK, "slots" refer to "places" where the data lives. However, for instance, it's hard for users to understand what is s11 = KS(2B02FE04), or what is seek s9 s13 s14 s5 s6 s7 s8 [] @\"2c41ddb0-4c30-488c-a2dd-11240fa088e8\" true false", etc., and how each slot and the relevant stages fit in the execution plan. If the representation cannot be changed, then from a user-perspective it doesn't make sense to include the "slotBasedPlan" section in the explain() output, because it cannot be understood anyway.

Comment by David Storch [ 29/Nov/21 ]

harshad.dhavale I'd like to better understand this request. Are you simply asking for the SBE plan string to be pretty-printed so that the tree structure is visible, and newlines appear as actual newlines instead of "\n"? Or are you asking for a fundamental change in how the SBE plan is represented? If the latter, could you be more specific about how the same information could be presented in a better way?

Generated at Thu Feb 08 05:52:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.