[SERVER-28140] $lookup & $match performance improvement Created: 28/Feb/17  Updated: 05/Jun/17  Resolved: 06/Mar/17

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: 3.4.2
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: angus Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

I am trying to use mongoDb aggregation pipeline to lookup other table and check if matched some of the attributes.

my query is something like below (p.s. bigtable is sharded):

 db.bigtable.aggregate( [  
    {$lookup:{ from: "user", localField:
               "user_code", foreignField: "code",as: "user"}},   
    {$match: {"user.country":"US" } }] )

I have created index for bigtable.user_code, user.code & user.country. But the query is very very very slow. I tried explain it, it is COLLSCAN.

It seems to me what using $match with foreign table's attribute, index is not supported and therefore, the query speed is very slow.



 Comments   
Comment by QiXinLi [X] [ 05/Jun/17 ]

i have the same issue like this on 3.4.4,anyone have some solution?

Comment by Charlie Swanson [ 06/Mar/17 ]

Hi all,

Two notes:

  1. The query on the 'bigtable' collection is indeed using a collection scan, since there isn't any predicate on that collection. The query on the foreign collection ('user') may well be using an index. There's no great way to tell at the moment, but coming in 3.4.3 you could use $indexStats to determine if the $lookup is using an index (waiting on the fix for SERVER-27253). In general, if the 'user' collection has an index beginning with 'code', I would expect it to choose that index. Currently the $lookup stage invokes the planner on every lookup, so it will behave as if it's executing the query {code: {$eq: <value of user_code>}}. Unfortunately, the plan chosen may depend on the value used, but usually it doesn't. So you can use an explain on a query of this shape to make sure it's using an index.
  2. If you want to ensure that the query on the foreign collection is also incorporating the predicate on 'user.country', you'll need to add an $unwind between the $lookup and the $match. Without an $unwind, the semantics are slightly different, since the document {user: [{country: "US"}, {country: "UK"}]} will match the predicate, but the document {user: {country: "UK"}} will not.

    > db.bigtable.explain().aggregate( [
         {$lookup:{ from: "user", localField: "user_code", foreignField: "code",as: "user"}},
         {$match: {"user.country":"US" } }
    ] )
    {
    	"stages" : [
    		{
    			"$cursor" : {
    				"query" : {
    					
    				},
    				"queryPlanner" : {
    					"plannerVersion" : 1,
    					"namespace" : "test.bigtable",
    					"indexFilterSet" : false,
    					"parsedQuery" : {
    						
    					},
    					"winningPlan" : {
    						"stage" : "COLLSCAN",
    						"direction" : "forward"
    					},
    					"rejectedPlans" : [ ]
    				}
    			}
    		},
    		{
    			"$lookup" : {
    				"from" : "user",
    				"as" : "user",
    				"localField" : "user_code",
    				"foreignField" : "code"
    			}
    		},
    		{
    			"$match" : {
    				"user.country" : "US"
    			}
    		}
    	],
    	"ok" : 1
    }
    > db.bigtable.explain().aggregate( [
         {$lookup:{ from: "user", localField: "user_code", foreignField: "code",as: "user"}},
         {$unwind: "$user"},
         {$match: {"user.country":"US" } }
    ] )
    {
    	"stages" : [
    		{
    			"$cursor" : {
    				"query" : {
    					
    				},
    				"queryPlanner" : {
    					"plannerVersion" : 1,
    					"namespace" : "test.bigtable",
    					"indexFilterSet" : false,
    					"parsedQuery" : {
    						
    					},
    					"winningPlan" : {
    						"stage" : "COLLSCAN",
    						"direction" : "forward"
    					},
    					"rejectedPlans" : [ ]
    				}
    			}
    		},
    		{
    			"$lookup" : {
    				"from" : "user",
    				"as" : "user",
    				"localField" : "user_code",
    				"foreignField" : "code",
    				"unwinding" : {
    					"preserveNullAndEmptyArrays" : false
    				},
    				"matching" : {
    					"country" : {
    						"$eq" : "US"
    					}
    				}
    			}
    		}
    	],
    	"ok" : 1
    }
    

Generated at Thu Feb 08 04:17:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.