[DOCS-13263] Add note to "Multiple Join Conditions with $lookup" on index usage Created: 05/Dec/19  Updated: 30/Oct/23  Due: 04/Sep/20  Resolved: 11/Sep/20

Status: Closed
Project: Documentation
Component/s: manual
Affects Version/s: None
Fix Version/s: Server_Docs_20231030

Type: Improvement Priority: Major - P3
Reporter: Shakir Sadikali Assignee: Jason Price
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 3 years, 21 weeks, 6 days ago
Epic Link: DOCSP-1769

 Description   

Description

When using aggregation with $lookup using a pipeline stage:

https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#specify-multiple-join-conditions-with-lookup

 

We should add a note that $expr will only use indexes on the "from" collection for EQUALITY matches ONLY.  In the example, it will only use an index on "warehouses.stock_item".  It's unable to use it for the range component (warehouses.instock).

 

Details here: https://jira.mongodb.org/browse/HELP-11242?focusedCommentId=2407968&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-2407968

 

I bring this to your attention since it can have profound performance implications when using $lookup.

 

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Githook User [ 10/Sep/20 ]

Author:

{'name': 'Jason Price', 'email': 'jason.price@mongodb.com', 'username': 'jason-price-mongodb'}

Message: DOCS-13263 Added details about multiple join conditions with lookup
Branch: v3.6
https://github.com/mongodb/docs/commit/d7fac5513c038bbe3e32b9db221a2971f4c2ee02

Comment by Githook User [ 10/Sep/20 ]

Author:

{'name': 'Jason Price', 'email': 'jason.price@mongodb.com', 'username': 'jason-price-mongodb'}

Message: DOCS-13263 Added details about multiple join conditions with lookup
Branch: v4.0
https://github.com/mongodb/docs/commit/f69f329a4ed1ae13cd3cc1fc8ac8bc5c5c5e23bc

Comment by Githook User [ 10/Sep/20 ]

Author:

{'name': 'Jason Price', 'email': 'jason.price@mongodb.com', 'username': 'jason-price-mongodb'}

Message: DOCS-13263 Added details about multiple join conditions with lookup
Branch: v4.2
https://github.com/mongodb/docs/commit/b09c51197c67b4dc7cdebb5457930075939f3481

Comment by Githook User [ 09/Sep/20 ]

Author:

{'name': 'Jason Price', 'email': 'jason.price@mongodb.com', 'username': 'jason-price-mongodb'}

Message: DOCS-13263 Added details about multiple join conditions with lookup
Branch: master
https://github.com/mongodb/docs/commit/25bfb9eee539ff3b75f4dc914c419d634b1901a6

Comment by Chris Harris [ 01/Sep/20 ]

Hi jason.price,

Thanks for your work on this so far.  

Mainly just dropping by here to mention a current limitation with explain that you are running into.  Namely, explain does not presently say anything about the plans that are used for the $lookup on the from collections (SERVER-22622).  So in your latest examples the aggregation pipelines are executed against the orders collection and they begin with a $lookup stage against the warehouse collection.  Because you are not filtering any data from the original collection ( orders ), the database will need to perform a full collection scan on it.  This is what you are seeing reflected in the explain output.  The output itself says nothing about the execution plan for the $lookup against warehouse though.  You will need to use other strategies to determine if and and how such indexes are used.  A bit of additional information can be found here

Best,
Chris

P.S. Perhaps consider wrapping your sample output in code blocks for better readabillity of the comment?  Eg:

MongoDB Enterprise > db.warehouses.createIndex({stock_item:1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
 
MongoDB Enterprise > db.warehouses.createIndex({instock:1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 2,
	"numIndexesAfter" : 3,
	"ok" : 1
}
 
MongoDB Enterprise > exp =   db.orders.explain().aggregate([
...       {
...          $lookup:
...             {
...               from: "warehouses",
...               let: { order_item: "$item", order_qty: "$ordered" },
...               pipeline: [
...                  { $match: 
...                     { $expr: 
...                        { $and: 
...                           [ 
...                             { $eq: [ "$stock_item",  "$$order_item" ] }
...                           ] 
...                        }
...                     }
...                  },
...                  { $project: { stock_item: 0, _id: 0 } }
...               ],
...               as: "stockdata"
...             }
...        }
...    ])
{
	"stages" : [
		{
			"$cursor" : {
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.orders",
					"indexFilterSet" : false,
					"parsedQuery" : {
						
					},
					"queryHash" : "8B3D4AB8",
					"planCacheKey" : "8B3D4AB8",
					"winningPlan" : {
						"stage" : "COLLSCAN",
						"direction" : "forward"
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$lookup" : {
				"from" : "warehouses",
				"as" : "stockdata",
				"let" : {
					"order_item" : "$item",
					"order_qty" : "$ordered"
				},
				"pipeline" : [
					{
						"$match" : {
							"$expr" : {
								"$and" : [
									{
										"$eq" : [
											"$stock_item",
											"$$order_item"
										]
									}
								]
							}
						}
					},
					{
						"$project" : {
							"stock_item" : 0,
							"_id" : 0
						}
					}
				]
			}
		}
	],
	"serverInfo" : {
		"host" : "Jasons-MBP",
		"port" : 27017,
		"version" : "4.4.0",
		"gitVersion" : "563487e100c4215e2dce98d0af2a6a5a2d67c5cf"
	},
	"ok" : 1
}
 
MongoDB Enterprise > db.warehouses.createIndex({stock_item:1, order_item:1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 3,
	"numIndexesAfter" : 4,
	"ok" : 1
}
 
MongoDB Enterprise > exp = db.orders.explain().aggregate([
...       {
...          $lookup:
...             {
...               from: "warehouses",
...               let: { order_item: "$item", order_qty: "$ordered" },
...               pipeline: [
...                  { $match: 
...                     { $expr: 
...                        { $and: 
...                           [ 
...                             { $eq: [ "$stock_item",  "$$order_item" ] }
...                           ] 
...                        }
...                     }
...                  },
...                  { $project: { stock_item: 0, _id: 0 } }
...               ],
...               as: "stockdata"
...             }
...        }
...    ])
{
	"stages" : [
		{
			"$cursor" : {
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.orders",
					"indexFilterSet" : false,
					"parsedQuery" : {
						
					},
					"queryHash" : "8B3D4AB8",
					"planCacheKey" : "8B3D4AB8",
					"winningPlan" : {
						"stage" : "COLLSCAN",
						"direction" : "forward"
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$lookup" : {
				"from" : "warehouses",
				"as" : "stockdata",
				"let" : {
					"order_item" : "$item",
					"order_qty" : "$ordered"
				},
				"pipeline" : [
					{
						"$match" : {
							"$expr" : {
								"$and" : [
									{
										"$eq" : [
											"$stock_item",
											"$$order_item"
										]
									}
								]
							}
						}
					},
					{
						"$project" : {
							"stock_item" : 0,
							"_id" : 0
						}
					}
				]
			}
		}
	],
	"serverInfo" : {
		"host" : "Jasons-MBP",
		"port" : 27017,
		"version" : "4.4.0",
		"gitVersion" : "563487e100c4215e2dce98d0af2a6a5a2d67c5cf"
	},
	"ok" : 1
}

Generated at Thu Feb 08 08:07:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.