[SERVER-53762] Report aggregate execution stats in explain for the inner side of $lookup Created: 13/Jan/21  Updated: 29/Oct/23  Resolved: 01/Apr/21

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Mohammad Dashti (Inactive)
Resolution: Fixed Votes: 1
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Documented
is documented by DOCS-14453 Investigate changes in SERVER-53762: ... Closed
Related
is related to SERVER-47640 $lookup should increment the serverSt... Closed
is related to SERVER-22622 Improve $lookup explain to indicate q... Needs Scheduling
Backwards Compatibility: Fully Compatible
Sprint: Query Execution 2021-03-08, Query Execution 2021-03-22, Query Execution 2021-04-05
Participants:

 Description   

As part of SERVER-47640, we enhanced the implementation of $lookup to keep track of the PlanSummaryStats associated with the subpipeline. These stats include metrics such as the number of keys examined and the number of documents examined by the subpipeline. Note that the subpipeline may be executed many times, since it is typically executed once per document from the outer (or "local") side. The PlanSummaryStats are tracked in aggregate across all such executions of the subpipeline. So if, for instance, the subpipeline scans a 10 document collection every time it runs, and it runs 5 times, then the result will be a "docsExamined" quantity of 50.

Now that we are tracking these values, it would be valuable for debugging/diagnostics to expose them in "executionStats" or "allPlansExecution" level explain output. We will need to choose which metrics are exposed, but I think it would at the very least include docs examined and keys examined. This would be especially helpful in understanding the performance properties of the subpipeline, since explain currently does not display the execution plan used on the inner side (see SERVER-22622). Any information in explain as to whether the $lookup subpipeline is cheap or expensive and why would be immensely helpful for query performance debugging scenarios.



 Comments   
Comment by jing xu [ 03/Aug/23 ]

hi:
why server 6.0.5 still is null .
totalDocsExamined: Long("0"),
totalKeysExamined: Long("0"),
collectionScans: Long("0"),
indexesUsed: [],
nReturned: Long("0"),
$lookup
{
'$lookup':

{ from: 'xiaoxu', as: 'output', localField: 'waybillno', foreignField: 'waybillno' }

,
totalDocsExamined: Long("0"),
totalKeysExamined: Long("0"),
collectionScans: Long("0"),
indexesUsed: [],
nReturned: Long("0"),
executionTimeMillisEstimate: Long("0")
},
{
'$match': { output: { '$not':

{ '$eq': [] }

} },
nReturned: Long("0"),
executionTimeMillisEstimate: Long("0")
}

Comment by Mohammad Dashti (Inactive) [ 01/Apr/21 ]

Thanks to andrew.paroski and david.storch for their detailed comments and code reviews, this issue is fixed via this commit.

Comment by Githook User [ 01/Apr/21 ]

Author:

{'name': 'Mohammad Dashti', 'email': 'mdashti@gmail.com', 'username': 'mdashti'}

Message: SERVER-53762 Report aggregate execution stats in `explain` for the inner side of `$lookup` (for `explain` verbosity levels equal to or higher than `executionStats`)
Branch: master
https://github.com/mongodb/mongo/commit/58b2dea3e087b5b4c23a4cb9dfd8687f9309e272

Comment by Mohammad Dashti (Inactive) [ 25/Mar/21 ]

Talked to david.storch earlier today. Thanks to david.storch we could clarify some points about this issue:

Let's take a running example from this test:

localColl.aggregate([
        {
            $lookup: {
                from: fromColl.getName(),
                localField: "localField",
                foreignField: "foreignField",
                as: "output"
            }
        },
        {
            $sort: {localField: 1}
        }])

we can enable explain on it via adding explain("executionStats") or explain("allPlansExecution"):

localColl.explain("executionStats").aggregate([
        {
            $lookup: {
                from: fromColl.getName(),
                localField: "localField",
                foreignField: "foreignField",
                as: "output"
            }
        },
        {
            $sort: {localField: 1}
        }])

Then, it will produce this explain output:

{
	"explainVersion" : "1",
	"stages" : [
		{
			"$cursor" : {
				"queryPlanner" : {
					"namespace" : "lookup_query_stats.local",
					"indexFilterSet" : false,
					"parsedQuery" : {
						
					},
					"queryHash" : "DB1486E5",
					"planCacheKey" : "DB1486E5",
					"maxIndexedOrSolutionsReached" : false,
					"maxIndexedAndSolutionsReached" : false,
					"maxScansToExplodeReached" : false,
					"winningPlan" : {
						"stage" : "SORT",
						"sortPattern" : {
							"localField" : 1
						},
						"memLimit" : 104857600,
						"type" : "simple",
						"inputStage" : {
							"stage" : "COLLSCAN",
							"direction" : "forward"
						}
					},
					"rejectedPlans" : [ ]
				},
				"executionStats" : {
					"executionSuccess" : true,
					"nReturned" : 2,
					"executionTimeMillis" : 5,
					"totalKeysExamined" : 0,
					"totalDocsExamined" : 2,
					"executionStages" : {
						"stage" : "SORT",
						"nReturned" : 2,
						"executionTimeMillisEstimate" : 0,
						"works" : 7,
						"advanced" : 2,
						"needTime" : 4,
						"needYield" : 0,
						"saveState" : 1,
						"restoreState" : 1,
						"isEOF" : 1,
						"sortPattern" : {
							"localField" : 1
						},
						"memLimit" : 104857600,
						"type" : "simple",
						"totalDataSizeSorted" : 140,
						"usedDisk" : false,
						"inputStage" : {
							"stage" : "COLLSCAN",
							"nReturned" : 2,
							"executionTimeMillisEstimate" : 0,
							"works" : 4,
							"advanced" : 2,
							"needTime" : 1,
							"needYield" : 0,
							"saveState" : 1,
							"restoreState" : 1,
							"isEOF" : 1,
							"direction" : "forward",
							"docsExamined" : 2
						}
					}
				}
			},
			"nReturned" : NumberLong(2),
			"executionTimeMillisEstimate" : NumberLong(0)
		},
		{
			"$lookup" : {
				"from" : "foreign",
				"as" : "output",
				"localField" : "localField",
				"foreignField" : "foreignField"
			},
			"nReturned" : NumberLong(2),
			"executionTimeMillisEstimate" : NumberLong(3)
		}
	],
	"serverInfo" : {
		"host" : "ip-10-122-9-106",
		"port" : 27017,
		"version" : "4.9.0-alpha7",
		"gitVersion" : "unknown"
	},
	"serverParameters" : {
		"internalQueryFacetBufferSizeBytes" : 104857600,
		"internalQueryFacetMaxOutputDocSizeBytes" : 104857600,
		"internalLookupStageIntermediateDocumentMaxSizeBytes" : 104857600,
		"internalDocumentSourceGroupMaxMemoryBytes" : 104857600,
		"internalQueryMaxBlockingSortMemoryUsageBytes" : 104857600,
		"internalQueryProhibitBlockingMergeOnMongoS" : 0,
		"internalQueryMaxAddToSetBytes" : 104857600,
		"internalDocumentSourceSetWindowFieldsMaxMemoryBytes" : 104857600
	},
	"command" : {
		"aggregate" : "local",
		"pipeline" : [
			{
				"$lookup" : {
					"from" : "foreign",
					"localField" : "localField",
					"foreignField" : "foreignField",
					"as" : "output"
				}
			},
			{
				"$sort" : {
					"localField" : 1
				}
			}
		],
		"cursor" : {
			
		},
		"$db" : "lookup_query_stats"
	},
	"ok" : 1
}

In this issue, we want to extend the stages[1] path (where $lookup stats are located) and add the following statistics from PlanSummaryStats:

  • totalDocsExamined
  • totalKeysExamined
  • collectionScans
  • indexesUsed

Then, the stages[1] path in the above example will look like this:

		{
			"$lookup" : {
				"from" : "foreign",
				"as" : "output",
				"localField" : "localField",
				"foreignField" : "foreignField"
			},
			"nReturned" : NumberLong(2),
			"executionTimeMillisEstimate" : NumberLong(3),
			"totalDocsExamined" : NumberLong(2),
			"totalKeysExamined" : NumberLong(2),
			"collectionScans" : NumberLong(5),
			"indexesUsed" : ["IDX1", "IDX2"]
		}

Comment by Alex Bevilacqua [ 15/Jan/21 ]

david.storch SERVER-22622 covers what I'm looking for pretty well so a new ticket shouldn't be necessary. For this ticket the aggregate docs/keys scanned would still be an improvement.

Comment by David Storch [ 15/Jan/21 ]

alex.bevilacqua we could definitely implement something like you suggest, but I would recommend that be tracked under a separate ticket. The work proposed here would be a very simple and natural extension of work already done under SERVER-47640, so it is something that we could accomplish as a quick win. Your proposal would require a bit more effort, and would maybe be better done via a more holistic solution such as SERVER-22622. Do you want to file a separate ticket to track your idea?

Comment by Alex Bevilacqua [ 15/Jan/21 ]

david.storch this is a great idea. Since each execution of the subpipeline could potentially use a difference plan, and each plan can potentially scan more than one index, maybe the keys/docs examined could be per-index for the $lookup?

Comment by David Storch [ 13/Jan/21 ]

christopher.harris alex.bevilacqua I assume this is something that you would find quite useful, but let me know if you have any further thoughts on priority or regarding the behavior proposed here.

Generated at Thu Feb 08 05:31:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.